com.bigdata.btree
Class AbstractBTree

java.lang.Object
  extended by com.bigdata.btree.AbstractBTree
All Implemented Interfaces:
IAutoboxBTree, IIndex, ILinearList, IRangeQuery, ISimpleBTree
Direct Known Subclasses:
BTree, IndexSegment

public abstract class AbstractBTree
extends Object
implements IIndex, IAutoboxBTree, ILinearList

Base class for mutable and immutable B+-Tree implementations.

The B+-Tree implementation supports variable length unsigned byte[] keys and provides a IKeyBuilder utilities designed to make it possible to generate keys from any combination of primitive data types and Unicode strings. The total ordering imposed by the index is that of a bit-wise comparison of the variable length unsigned byte[] keys. Encoding Unicode keys is support by an integration with ICU4J and applications may choose the locale, strength, and other properties that govern the sort order of sort keys generated from Unicode strings. Sort keys produces by different collator objects are NOT compatible and applications that use Unicode data in their keys MUST make sure that they use a collator that imposes the same sort order each time they provision a KeyBuilder.

The use of variable length unsigned byte[] keys makes it possible for the B+-Tree to perform very fast comparison of a search key with keys in the nodes and leaves of the tree. To support fast search, the leading prefix is factored out each time a node or leaf is made immutable, e.g., directly proceeding serialization. Further, the separator keys are chosen to be the shortest separator key in order to further shorten the keys in the nodes of the tree.

The B+Tree can optionally maintain version metadata (a version timestamp and deletion marker). Version metadata MUST be enabled (a) if the index will be used with transactional isolation; or (b) if the index is part of a scale-out index. In both cases the version timestamps and deletion markers play a critical role when reading from a fused view of an ordered set of indices describing an index or an index partition. Note that the existence of deletion markers means that rangeCount(byte[], byte[]) and getEntryCount() are upper bounds as deleted entries will be reported until they are purged from the index by a compacting merge of the view.

Version:
$Id: AbstractBTree.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
See Also:
KeyBuilder

Field Summary
protected  int branchingFactor
          The branching factor for the btree.
protected  boolean debug
          Flag turns on the use of AbstractNode.assertInvariants() and is automatically enabled when the logger is set to Level.DEBUG.
protected static boolean DEBUG
          True iff the log level is DEBUG or less.
static org.apache.log4j.Logger dumpLog
          Log for dump(PrintStream) and friends.
protected static String ERROR_CLOSED
          The index is already closed.
protected static String ERROR_LESS_THAN_ZERO
          A parameter was less than zero.
protected static String ERROR_READ_ONLY
          The index is read-only but a mutation operation was requested.
protected static String ERROR_TOO_LARGE
          A parameter was too large.
protected static String ERROR_TRANSIENT
          The index is transient (not backed by persistent storage) but an operation that requires persistence was requested.
protected static boolean INFO
          True iff the log level is INFO or less.
protected static org.apache.log4j.Logger log
          Log for btree opeations.
protected  IndexMetadata metadata
          The metadata record for the index.
protected  int ndistinctOnWriteRetentionQueue
          The #of distinct nodes and leaves on the writeRetentionQueue.
protected  NodeSerializer nodeSer
          Used to serialize and de-serialize the nodes and leaves of the tree.
protected  AbstractNode<?> root
          The root of the btree.
protected  IRawStore store
          The persistence store -or- null iff the B+Tree is transient.
protected  IGlobalLRU.ILRUCache<Long,Object> storeCache
          Optional cache for INodeData and ILeafData instances and always null if the B+Tree is transient.
protected  HardReferenceQueue<PO> writeRetentionQueue
          Nodes (that is nodes or leaves) are added to a hard reference queue when they are created or read from the store.
 
Fields inherited from interface com.bigdata.btree.IRangeQuery
ALL, CURSOR, DEFAULT, DELETED, FIXED_LENGTH_SUCCESSOR, KEYS, NONE, PARALLEL, READONLY, REMOVEALL, REVERSE, VALS
 
Constructor Summary
protected AbstractBTree(IRawStore store, INodeFactory nodeFactory, boolean readOnly, IndexMetadata metadata, IRecordCompressorFactory<?> recordCompressorFactory)
           
 
Method Summary
protected abstract  void _reopen()
          This method is responsible for setting up the root leaf (either new or read from the store), the bloom filter, etc.
protected  void assertNotReadOnly()
           
protected  void assertNotTransient()
           
 void close()
          The contract for close() is to reduce the resource burden of the index (by discarding buffers) while not rendering the index inoperative.
 boolean contains(byte[] key)
          Core method to decide whether the index has a (non-deleted) entry under a key.
 boolean contains(Object key)
          Return true iff there is an entry for the key.
 boolean dump(org.apache.log4j.Level level, PrintStream out)
           
 boolean dump(PrintStream out)
          Recursive dump of the tree.
abstract  BloomFilter getBloomFilter()
          Return the optional IBloomFilter, transparently reopen()ing the index if necessary.
 int getBranchingFactor()
          The branching factor for the btree.
 BTreeCounters getBtreeCounters()
          Counters tracking various aspects of the btree.
 Tuple getContainsTuple()
          Return a Tuple that may be used to test for the existence of tuples having a given key within the AbstractBTree.
 ICounterSet getCounters()
          Return some "statistics" about the btree including both the static CounterSet and the BTreeCounterss.
 CounterSet getDynamicCounterSet()
          Return a new CounterSet containing dynamic counters - DO NOT hold onto the returned CounterSet as it contains implicit hard references to the AbstractBTree and will prevent the AbstractBTree from being finalized!
abstract  int getEntryCount()
          The #of entries (aka values) in the AbstractBTree.
abstract  int getHeight()
          The height of the btree.
 IndexMetadata getIndexMetadata()
          Returns the metadata record for this btree.
abstract  long getLastCommitTime()
          The timestamp associated with the last IAtomicStore.commit() in which writes buffered by this index were made restart-safe on the backing store.
abstract  int getLeafCount()
          The #of leaf nodes in the AbstractBTree.
 Tuple getLookupTuple()
          Return a Tuple that may be used to copy the value associated with a key out of the AbstractBTree.
abstract  int getNodeCount()
          The #of non-leaf nodes in the AbstractBTree.
 NodeSerializer getNodeSerializer()
          The object responsible for (de-)serializing the nodes and leaves of the IIndex.
 IResourceMetadata[] getResourceMetadata()
          The description of the resources comprising the index view.
 AbstractNode getRightMostNode(boolean nodesOnly)
          Utility method returns the right-most node in the B+Tree.
protected  AbstractNode<?> getRoot()
          The root of the btree.
protected  AbstractNode<?> getRootOrFinger(byte[] key)
          Returns the node or leaf to be used for search.
 CounterSet getStaticCounterSet()
          Returns a new CounterSet containing static counters.
abstract  IRawStore getStore()
          The backing store.
 int[] getUtilization()
          Computes and returns the utilization of the tree.
 Tuple getWriteTuple()
          Private instance used for mutation operations (insert, remove) which are single threaded.
 int indexOf(byte[] key)
          Lookup the index position of the key.
 byte[] insert(byte[] key, byte[] value)
          Insert or update a value under the key.
 Tuple insert(byte[] key, byte[] value, boolean delete, long timestamp, Tuple tuple)
          Core method for inserting or updating a value under a key.
 Object insert(Object key, Object value)
          Insert with auto-magic handling of keys and value objects.
 boolean isOpen()
          An "open" index has its buffers and root node in place rather than having to reallocate buffers or reload the root node from the store.
abstract  boolean isReadOnly()
          Return true iff this B+Tree is read-only.
 boolean isTransient()
          Return true iff this is a transient data structure (no backing store).
 byte[] keyAt(int index)
          Return the key for the identified entry.
 byte[] lookup(byte[] key)
          Lookup a value for a key.
 Tuple lookup(byte[] key, Tuple tuple)
          Core method for retrieving a value under a key.
 Object lookup(Object key)
          Lookup a value for a key.
abstract  ILeafCursor newLeafCursor(byte[] key)
          Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree.
abstract  ILeafCursor newLeafCursor(SeekEnum where)
          Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree.
protected  boolean rangeCheck(byte[] key, boolean allowUpperBound)
          Iff the B+Tree is an index partition then verify that the key lies within the key range of an index partition.
 long rangeCopy(IIndex src, byte[] fromKey, byte[] toKey, boolean overflow)
          Copy all data, including deleted index entry markers and timestamps iff supported by the source and target.
 long rangeCount()
          Return the #of tuples in the index.
 long rangeCount(byte[] fromKey, byte[] toKey)
          This method computes the #of entries in the half-open range using AbstractNode#indexOf(Object).
 long rangeCount(Object fromKey, Object toKey)
          Variant implicitly converts the optional application keys into unsigned byte[]s.
 long rangeCountExact(byte[] fromKey, byte[] toKey)
          Return the exact tuple count for the half-open key range.
 long rangeCountExactWithDeleted(byte[] fromKey, byte[] toKey)
          Note: rangeCount(byte[], byte[]) already reports deleted tuples for an AbstractBTree so this method is just delegated to that one.
 ITupleIterator rangeIterator()
          Visits all tuples in key order.
 ITupleIterator rangeIterator(byte[] fromKey, byte[] toKey)
          Return an iterator that visits the entries in a half-open key range.
 ITupleIterator rangeIterator(byte[] fromKey, byte[] toKey, int capacityIsIgnored, int flags, IFilterConstructor filter)
          Core implementation.
 ITupleIterator rangeIterator(Object fromKey, Object toKey)
          Variant implicitly converts the optional application keys into unsigned byte[]s.
 ITupleIterator rangeIterator(Object fromKey, Object toKey, int capacity, int flags, IFilterConstructor filter)
          Variant implicitly converts the optional application keys into unsigned byte[]s.
protected  AbstractNode<?> readNodeOrLeaf(long addr)
          Read a node or leaf from the store.
 byte[] remove(byte[] key)
          Remove the tuple under that key (will write a delete marker if delete markers are enabled).
 Tuple remove(byte[] key, Tuple tuple)
          Core method for deleting a value under a key.
 Object remove(Object key)
          Remove the key and its associated value.
abstract  void removeAll()
          Remove all entries in the B+Tree.
protected  void reopen()
          This is part of a close()/reopen() protocol that may be used to reduce the resource burden of an AbstractBTree.
 void setBTreeCounters(BTreeCounters btreeCounters)
          Replace the BTreeCounters.
 void submit(byte[] fromKey, byte[] toKey, IKeyRangeIndexProcedure proc, IResultHandler handler)
          The procedure will be transparently applied against each index partition spanned by the given key range.
 Object submit(byte[] key, ISimpleIndexProcedure proc)
          Submits an index procedure that operations on a single key to the appropriate index partition returning the result of that procedure.
 void submit(int fromIndex, int toIndex, byte[][] keys, byte[][] vals, AbstractKeyArrayIndexProcedureConstructor ctor, IResultHandler aggregator)
          Runs a procedure against an index.
 String toString()
          Fast summary information about the B+Tree.
protected  void touch(AbstractNode<?> node)
           Touch the node or leaf on the writeRetentionQueue.
 byte[] valueAt(int index)
          Return the value for the identified entry.
 Tuple valueAt(int index, Tuple tuple)
           
protected  long writeNodeOrLeaf(AbstractNode<?> node)
          Codes the node and writes the coded record on the store (non-recursive).
protected  void writeNodeRecursive(AbstractNode<?> node)
          Write a dirty node and its children using a post-order traversal that first writes any dirty leaves and then (recursively) their parent nodes.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface com.bigdata.btree.IIndex
getCounter
 

Field Detail

ERROR_CLOSED

protected static final String ERROR_CLOSED
The index is already closed.

See Also:
Constant Field Values

ERROR_LESS_THAN_ZERO

protected static final String ERROR_LESS_THAN_ZERO
A parameter was less than zero.

See Also:
Constant Field Values

ERROR_TOO_LARGE

protected static final String ERROR_TOO_LARGE
A parameter was too large.

See Also:
Constant Field Values

ERROR_READ_ONLY

protected static final String ERROR_READ_ONLY
The index is read-only but a mutation operation was requested.

See Also:
Constant Field Values

ERROR_TRANSIENT

protected static final String ERROR_TRANSIENT
The index is transient (not backed by persistent storage) but an operation that requires persistence was requested.

See Also:
Constant Field Values

log

protected static final org.apache.log4j.Logger log
Log for btree opeations.


INFO

protected static final boolean INFO
True iff the log level is INFO or less.


DEBUG

protected static final boolean DEBUG
True iff the log level is DEBUG or less.


dumpLog

public static final org.apache.log4j.Logger dumpLog
Log for dump(PrintStream) and friends.


debug

protected final boolean debug
Flag turns on the use of AbstractNode.assertInvariants() and is automatically enabled when the logger is set to Level.DEBUG.


store

protected final IRawStore store
The persistence store -or- null iff the B+Tree is transient.


storeCache

protected final IGlobalLRU.ILRUCache<Long,Object> storeCache
Optional cache for INodeData and ILeafData instances and always null if the B+Tree is transient.


branchingFactor

protected final int branchingFactor
The branching factor for the btree.


root

protected volatile AbstractNode<?> root
The root of the btree. This is initially a leaf until the leaf is split, at which point it is replaced by a node. The root is also replaced each time copy-on-write triggers a cascade of updates.

This hard reference is cleared to null if an index is closed. getRoot() automatically uses reopen() to reload the root so that closed indices may be transparently made ready for further use (indices are closed to reduce their resource burden, not to make their references invalid). The AbstractNode and derived classes assume that the root is non-null. This assumption is valid if close() is invoked by the application in a manner consistent with the single-threaded contract for the AbstractBTree.

Note: This field MUST be marked as [volatile] in order to guarantee correct semantics for double-checked locking in reopen() and reopen().

See Also:
http://en.wikipedia.org/wiki/Double-checked_locking

nodeSer

protected final NodeSerializer nodeSer
Used to serialize and de-serialize the nodes and leaves of the tree.


writeRetentionQueue

protected final HardReferenceQueue<PO> writeRetentionQueue
Nodes (that is nodes or leaves) are added to a hard reference queue when they are created or read from the store. On eviction from the queue a dirty node is serialized by a listener against the IRawStore. The nodes and leaves refer to their parent with a WeakReferences. Likewise, nodes refer to their children with a WeakReference. The hard reference queue in combination with touch(AbstractNode) and with hard references held on the stack ensures that the parent and/or children remain reachable during operations. Once the node is no longer strongly reachable weak references to that node may be cleared by the VM - in this manner the node will become unreachable by navigation from its ancestors in the btree. The special role of the hard reference queue is to further ensure that dirty nodes remain dirty by defering persistence until the reference count for the node is zero during an eviction from the queue.

Note that nodes are evicted as new nodes are added to the hard reference queue. This occurs in two situations: (1) when a new node is created during a split of an existing node; and (2) when a node is read in from the store. Inserts on this hard reference queue always drive evictions. Incremental writes basically make it impossible for the commit set to get "too large" - the maximum #of nodes to be written is bounded by the size of the hard reference queue. This helps to ensure fast commit operations on the store.

The minimum capacity for the hard reference queue is two (2) so that a split may occur without forcing eviction of either node participating in the split.

Note: The code in Node.postOrderNodeIterator(boolean) and DirtyChildIterator MUST NOT touch the hard reference queue since those iterators are used when persisting a node using a post-order traversal. If a hard reference queue eviction drives the serialization of a node and we touch the hard reference queue during the post-order traversal then we break down the semantics of HardReferenceQueue.add(Object) as the eviction does not necessarily cause the queue to reduce in length. Another way to handle this is to have HardReferenceQueue.add(Object) begin to evict objects before is is actually at capacity, but that is also a bit fragile.

Note: The writeRetentionQueue uses a HardReferenceQueue. This is based on a RingBuffer and is very fast. It does not use a HashMap because we can resolve the WeakReference to the child Node or Leaf using top-down navigation as long as the Node or Leaf remains strongly reachable (that is, as long as it is on the writeRetentionQueue or otherwise strongly held). This means that lookup in a map is not required for top-down navigation.

The LRUNexus provides an INodeData / ILeafData data record cache based on a hash map with lookup by the address of the node or leaf. This is tested when the child WeakReference was never set or has been cleared. This cache is also used by the IndexSegment for the linked-leaf traversal pattern, which does not use top-down navigation.

TODO:
consider a policy that dynamically adjusts the queue capacities based on the height of the btree in order to maintain a cache that can contain a fixed percentage, e.g., 5% or 10%, of the nodes in the btree. The minimum and maximum size of the cache should be bounded. Bounding the minimum size gives better performance for small trees. Bounding the maximum size is necessary when the trees grow very large. (Partitioned indices may be used for very large indices and they can be distributed across a cluster of machines.)

There is a discussion of some issues regarding such a policy in the code inside of Node.Node(BTree btree, AbstractNode oldRoot, int nentries).


ndistinctOnWriteRetentionQueue

protected int ndistinctOnWriteRetentionQueue
The #of distinct nodes and leaves on the writeRetentionQueue.


metadata

protected IndexMetadata metadata
The metadata record for the index. This data rarely changes during the life of the BTree object, but it CAN be changed.

Constructor Detail

AbstractBTree

protected AbstractBTree(IRawStore store,
                        INodeFactory nodeFactory,
                        boolean readOnly,
                        IndexMetadata metadata,
                        IRecordCompressorFactory<?> recordCompressorFactory)
Parameters:
store - The persistence store.
nodeFactory - Object that provides a factory for node and leaf objects.
addrSer - Object that knows how to (de-)serialize the child addresses in an INodeData.
readOnly - true IFF it is known that the AbstractBTree is read-only.
metadata - The IndexMetadata object for this B+Tree.
Method Detail

getBtreeCounters

public final BTreeCounters getBtreeCounters()
Counters tracking various aspects of the btree.


setBTreeCounters

public final void setBTreeCounters(BTreeCounters btreeCounters)
Replace the BTreeCounters.

Note: This is used by the IndexManager to ensure that an index loaded from its backing store uses the BTreeCounters associated with that index since the DataService was last (re-)started.

Parameters:
btreeCounters - The counters to be used.
Throws:
IllegalArgumentException - if the argument is null.

getBloomFilter

public abstract BloomFilter getBloomFilter()
Return the optional IBloomFilter, transparently reopen()ing the index if necessary.

Returns:
The bloom filter -or- null if there is no bloom filter (including the case where there is a bloom filter but it has been disabled since the BTree has grown too large and the expected error rate of the bloom filter would be too high).

getCounters

public ICounterSet getCounters()
Return some "statistics" about the btree including both the static CounterSet and the BTreeCounterss.

Note: Since this DOES NOT include the getDynamicCounterSet(), holding a reference to the returned ICounterSet WILL NOT cause the AbstractBTree to remain strongly reachable.

Specified by:
getCounters in interface IIndex
See Also:
getStaticCounterSet(), getDynamicCounterSet(), BTreeCounters.getCounters()

getDynamicCounterSet

public CounterSet getDynamicCounterSet()
Return a new CounterSet containing dynamic counters - DO NOT hold onto the returned CounterSet as it contains implicit hard references to the AbstractBTree and will prevent the AbstractBTree from being finalized!

TODO:
factor counter names into interface.

getStaticCounterSet

public CounterSet getStaticCounterSet()
Returns a new CounterSet containing static counters. These counters are all OneShotInstruments and DO NOT contain implicit references to the AbstractBTree. They may be safely held and will not cause the AbstractBTree to remain strongly reachable.


close

public void close()
The contract for close() is to reduce the resource burden of the index (by discarding buffers) while not rendering the index inoperative. Unless the AbstractBTree isTransient(), a B+Tree that has been closed MAY be reopened at any time (conditional on the continued availability of the backing store). Such an index reference remains valid after a close(). A closed index is transparently restored by either getRoot() or reopen().

Note: A close() on a dirty index MUST discard writes rather than flushing them to the store and MUST NOT update its Checkpoint record - (close() is used to discard indices with partial writes when an AbstractTask fails). If you are seeking to close() a mutable BTree that it state can be recovered by reopen() then you MUST write a new Checkpoint record before closing the index.

Note: CLOSING A TRANSIENT INDEX WILL DISCARD ALL DATA!

This implementation clears the hard reference queue (releasing all node references), releases the hard reference to the root node, and releases the buffers on the NodeSerializer (they will be naturally reallocated on reuse).

Note: AbstractBTree is NOT thread-safe and close() MUST be invoked in a context in which there will not be concurrent threads -- the natural choice being the single-threaded commit service on the journal.

Throws:
IllegalStateException - if the root is null, indicating that the index is already closed.
IllegalStateException - if the root is dirty (this implies that this is a mutable btree and there are mutations that have not been written through to the store)

reopen

protected final void reopen()
This is part of a close()/reopen() protocol that may be used to reduce the resource burden of an AbstractBTree. The method delegates to _reopen() if double-checked locking demonstrates that the root is null (indicating that the index has been closed). This method is automatically invoked by a variety of methods that need to ensure that the index is available for use.

See Also:
close(), isOpen(), getRoot()

_reopen

protected abstract void _reopen()
This method is responsible for setting up the root leaf (either new or read from the store), the bloom filter, etc. It is invoked by reopen() once root has been show to be null with double-checked locking. When invoked in this context, the caller is guaranteed to hold a lock on this. This is done to ensure that at most one thread gets to re-open the index from the backing store.


isOpen

public final boolean isOpen()
An "open" index has its buffers and root node in place rather than having to reallocate buffers or reload the root node from the store.

Returns:
If the index is "open".
See Also:
close(), reopen(), getRoot()

isTransient

public final boolean isTransient()
Return true iff this is a transient data structure (no backing store).


assertNotTransient

protected final void assertNotTransient()

isReadOnly

public abstract boolean isReadOnly()
Return true iff this B+Tree is read-only.


assertNotReadOnly

protected final void assertNotReadOnly()
Throws:
UnsupportedOperationException - if the B+Tree is read-only.
See Also:
isReadOnly()

getLastCommitTime

public abstract long getLastCommitTime()
The timestamp associated with the last IAtomicStore.commit() in which writes buffered by this index were made restart-safe on the backing store. The lastCommitTime is set when the index is loaded from the backing store and updated after each commit. It is ZERO (0L) when BTree is first created and will remain ZERO (0L) until the BTree is committed. If the backing store does not support atomic commits, then this value will always be ZERO (0L).

Note: This is fixed for an IndexSegment.


getStore

public abstract IRawStore getStore()
The backing store.


getResourceMetadata

public final IResourceMetadata[] getResourceMetadata()
Description copied from interface: IIndex
The description of the resources comprising the index view.

Specified by:
getResourceMetadata in interface IIndex

getIndexMetadata

public IndexMetadata getIndexMetadata()
Returns the metadata record for this btree.

Note: If the B+Tree is read-only then the metadata object will be cloned to avoid potential modification. However, only a single cloned copy of the metadata record will be shared between all callers for a given instance of this class.

Specified by:
getIndexMetadata in interface IIndex
Returns:
The metadata record for this btree and never null.
TODO:
the clone once policy is a compromise between driving the read only semantics into the IndexMetadata object, cloning each time, and leaving it to the caller to clone as required. Either the former or the latter might be a better choice. FIXME if the metadata record is updated (and it can be) then we really need to invalidate the cloned metadata record also.

toString

public String toString()
Fast summary information about the B+Tree.

Overrides:
toString in class Object

rangeCheck

protected boolean rangeCheck(byte[] key,
                             boolean allowUpperBound)
Iff the B+Tree is an index partition then verify that the key lies within the key range of an index partition.

Note: An index partition is identified by IndexMetadata.getPartitionMetadata() returning non-null.

Note: This method is used liberally in asserts to detect errors that can arise is client code when an index partition is split, joined, or moved.

Parameters:
key - The key.
allowUpperBound - true iff the key represents an inclusive upper bound and thus must be allowed to be LTE to the right separator key for the index partition. For example, this would be true for the toKey parameter on rangeCount or rangeIterator methods.
Returns:
true always.
Throws:
IllegalArgumentException - if the key is null
KeyOutOfRangeException - if the key does not lie within the index partition.

getBranchingFactor

public final int getBranchingFactor()
The branching factor for the btree.


getHeight

public abstract int getHeight()
The height of the btree. The height is the #of levels minus one. A btree with only a root leaf has height := 0. A btree with a root node and one level of leaves under it has height := 1. Note that all leaves of a btree are at the same height (this is what is means for the btree to be "balanced"). Also note that the height only changes when we split or join the root node (a btree maintains balance by growing and shrinking in levels from the top rather than the leaves).


getNodeCount

public abstract int getNodeCount()
The #of non-leaf nodes in the AbstractBTree. This is zero (0) for a new btree.


getLeafCount

public abstract int getLeafCount()
The #of leaf nodes in the AbstractBTree. This is one (1) for a new btree.


getEntryCount

public abstract int getEntryCount()
The #of entries (aka values) in the AbstractBTree. This is zero (0) for a new B+Tree. Note that this value is tracked explicitly so it requires no IOs.

TODO:
this could be re-defined as the exact entry count if we tracked the #of deleted index entries and subtracted that from the total #of index entries before returning the result. The #of deleted index entries would be stored in the index Checkpoint record.

Since getEntryCount() is also used to give the total #of index enties (and we need that feature) so we need to either add another method with the appropriate semantics, add a boolean flag, or add a method returning the #of deleted entries, etc.


getNodeSerializer

public final NodeSerializer getNodeSerializer()
The object responsible for (de-)serializing the nodes and leaves of the IIndex.


getRoot

protected final AbstractNode<?> getRoot()
The root of the btree. This is initially a leaf until the leaf is split, at which point it is replaced by a node. The root is also replaced each time copy-on-write triggers a cascade of updates.

The hard reference to the root node is cleared if the index is closed. This method automatically reopen()s the index if it is closed, making it available for use.


getRootOrFinger

protected AbstractNode<?> getRootOrFinger(byte[] key)
Returns the node or leaf to be used for search. This implementation is aware of the #finger and will return it if the key lies within the finger.

Parameters:
key - The key.
Returns:
Either the root node of the tree or a recently touched leaf that is known to span the key.
TODO:
This is a place holder for finger(s) for hot spots in the B+Tree, but the finger(s) are currently disabled.

getWriteTuple

public final Tuple getWriteTuple()
Private instance used for mutation operations (insert, remove) which are single threaded. Both IRangeQuery.KEYS and IRangeQuery.VALS are requested so that indices which encode part of the application object within the key can recover the application object in insert(Object, Object) and remove(Object).

While the Tuple is not safe for use by concurrent threads, the mutation API is not safe for concurrent threads either.


getLookupTuple

public final Tuple getLookupTuple()
Return a Tuple that may be used to copy the value associated with a key out of the AbstractBTree.

While the returned Tuple is not safe for use by concurrent threads, each instance returned by this method is safe for use by the thread in which it was obtained.

See Also:
lookup(byte[], Tuple)

getContainsTuple

public final Tuple getContainsTuple()
Return a Tuple that may be used to test for the existence of tuples having a given key within the AbstractBTree.

The tuple does not copy either the keys or the values. Contains is implemented as a lookup operation that either return this tuple or null. When isolation is supported, the version metadata is examined to determine if the matching entry is flagged as deleted in which case contains() will report "false".

While the returned Tuple is not safe for use by concurrent threads, each instance returned by this method is safe for use by the thread in which it was obtained.

See Also:
contains(byte[])

insert

public final Object insert(Object key,
                           Object value)
Description copied from interface: IAutoboxBTree
Insert with auto-magic handling of keys and value objects.

Specified by:
insert in interface IAutoboxBTree
Parameters:
key - The key is implicitly converted to an unsigned byte[].
value - The value is implicitly converted to a byte[].
Returns:
The de-serialized old value -or- null if there was no value stored under that key.

insert

public final byte[] insert(byte[] key,
                           byte[] value)
Description copied from interface: ISimpleBTree
Insert or update a value under the key.

Specified by:
insert in interface ISimpleBTree
Parameters:
key - The key.
value - The value (may be null).
Returns:
The previous value under that key or null if the key was not found or if the previous entry for that key was marked as deleted.

insert

public final Tuple insert(byte[] key,
                          byte[] value,
                          boolean delete,
                          long timestamp,
                          Tuple tuple)
Core method for inserting or updating a value under a key.

Parameters:
key - The variable length unsigned byte[] key.
value - The variable length byte[] value (MAY be null).
delete - true iff the index entry should be marked as deleted (this behavior is supported iff the btree supports delete markers).
timestamp - The timestamp to be associated with the new or updated index entry (required iff the btree supports transactional isolation and otherwise 0L).
tuple - Data and metadata for the old value under the key will be copied onto this object (optional).
Returns:
tuple -or- null if there was no entry under that key. See ITuple.isDeletedVersion() to determine whether or not the entry is marked as deleted.
Throws:
UnsupportedOperationException - if the index is read-only.
TODO:
add putIfAbsent() variant for insert methods?

remove

public final Object remove(Object key)
Description copied from interface: IAutoboxBTree
Remove the key and its associated value.

Specified by:
remove in interface IAutoboxBTree
Parameters:
key - The key is implicitly converted to an unsigned byte[].
Returns:
The de-serialized value stored under that key or null if the key was not found.

remove

public final byte[] remove(byte[] key)
Remove the tuple under that key (will write a delete marker if delete markers are enabled).

Specified by:
remove in interface ISimpleBTree
Parameters:
key - The key.
Returns:
The value stored under that key or null if the key was not found or if the previous entry under that key was marked as deleted.

remove

public final Tuple remove(byte[] key,
                          Tuple tuple)
Core method for deleting a value under a key. If there is an entry under the key then it is removed from the index. It is an error to use this method if delete markers are being maintained. remove(byte[]) uses insert(byte[], byte[], boolean, long, Tuple) instead of this method to mark the index entry as deleted when delete markers are being maintained.

Note: removing a key has no effect on the optional bloom filter. If a key is removed from the index by this method then the bloom filter will report a false positive for that key, which will be detected when we test the index itself. This works out fine in the scale-out design since the bloom filter is per AbstractBTree instance and split/join/move operations all result in new mutable BTrees with new bloom filters to absorb new writes. For a non-scale-out deployement, this can cause the performance of the bloom filter to degrade if you are removing a lot of keys. However, in the special case of a BTree that does NOT use delete markers, BTree.removeAll() will create a new root leaf and a new (empty) bloom filter as well.

Parameters:
key - The search key.
tuple - An object that will be used to report data and metadata for the pre-existing version (optional).
Returns:
tuple or null if there was no version under that key.
Throws:
UnsupportedOperationException - if delete markers are being maintained.
UnsupportedOperationException - if the index is read-only.

removeAll

public abstract void removeAll()
Remove all entries in the B+Tree.

Note: The IIndexManager defines methods for registering (adding) and dropping indices vs removing the entries in an individual AbstractBTree.


lookup

public Object lookup(Object key)
Description copied from interface: IAutoboxBTree
Lookup a value for a key.

Specified by:
lookup in interface IAutoboxBTree
Parameters:
key - The key is implicitly converted to an unsigned byte[].
Returns:
The de-serialized value or null if there is no entry for that key.

lookup

public byte[] lookup(byte[] key)
Description copied from interface: ISimpleBTree
Lookup a value for a key.

Specified by:
lookup in interface ISimpleBTree
Returns:
The value stored under that key or null if there is no entry for that key or if the entry under that key is marked as deleted.

lookup

public Tuple lookup(byte[] key,
                    Tuple tuple)
Core method for retrieving a value under a key. This method allows you to differentiate an index entry whose value is null from a missing index entry or (when delete markers are enabled) from a deleted index entry. Applies the optional bloom filter if it exists. If the bloom filter exists and reports true, then looks up the value for the key in the index (note that the key might not exist in the index since a bloom filter allows false positives, further the key might exist for a deleted entry).

Parameters:
key - The search key.
tuple - An object that will be used to report data and metadata for the pre-existing version (required).
Returns:
tuple or null if there is no entry in the index under the key.

contains

public boolean contains(Object key)
Description copied from interface: IAutoboxBTree
Return true iff there is an entry for the key.

Specified by:
contains in interface IAutoboxBTree
Parameters:
key - The key is implicitly converted to an unsigned byte[].
Returns:
True if the btree contains an entry for that key.

contains

public boolean contains(byte[] key)
Core method to decide whether the index has a (non-deleted) entry under a key. Applies the optional bloom filter if it exists. If the bloom filter reports true, then verifies that the key does in fact exist in the index.

Specified by:
contains in interface ISimpleBTree
Parameters:
key - The key.
Returns:
true iff the key does not exist. Or, if the btree supports isolation, if the key exists but it is marked as "deleted".
TODO:
add unit test to btree suite w/ and w/o delete markers.

indexOf

public int indexOf(byte[] key)
Description copied from interface: ILinearList
Lookup the index position of the key.

Note that ILinearList.indexOf(byte[]) is the basis for implementing the IRangeQuery interface.

Specified by:
indexOf in interface ILinearList
Parameters:
key - The key.
Returns:
The index of the search key, if found; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be found it it were inserted into the btree without intervening mutations. Note that this guarantees that the return value will be >= 0 if and only if the key is found. When found the index will be in [0:nentries). Adding or removing entries in the tree may invalidate the index.

pos = -(pos+1) will convert an insertion point to the index at which the key would be found if it were inserted - this is also the index of the predecessor of key in the index.

See Also:
ILinearList.keyAt(int), ILinearList.valueAt(int)

keyAt

public byte[] keyAt(int index)
Description copied from interface: ILinearList
Return the key for the identified entry. This performs an efficient search whose cost is essentially the same as ISimpleBTree.lookup(byte[]).

Specified by:
keyAt in interface ILinearList
Parameters:
index - The index position of the entry (origin zero).
Returns:
The key at that index position (not a copy).
See Also:
#indexOf(Object), ILinearList.valueAt(int)

valueAt

public byte[] valueAt(int index)
Description copied from interface: ILinearList
Return the value for the identified entry. This performs an efficient search whose cost is essentially the same as ISimpleBTree.lookup(byte[]).

Specified by:
valueAt in interface ILinearList
Parameters:
index - The index position of the entry (origin zero).
Returns:
The value at that index position -or- null if there is a deleted entry at that index position then
See Also:
#indexOf(Object), ILinearList.keyAt(int)

valueAt

public final Tuple valueAt(int index,
                           Tuple tuple)

rangeCountExact

public final long rangeCountExact(byte[] fromKey,
                                  byte[] toKey)
Return the exact tuple count for the half-open key range.

Note: When the index uses delete markers this requires a key-range scan. If delete markers are not being used, then the cost is equal to the cost of rangeCount(byte[], byte[]).

Specified by:
rangeCountExact in interface IRangeQuery
Parameters:
fromKey - The lowest key that will be counted (inclusive). When null there is no lower bound.
toKey - The first key that will not be counted (exclusive). When null there is no upper bound.
Returns:
The exact #of tuples in the half-open key range.

rangeCount

public final long rangeCount()
Description copied from interface: IRangeQuery
Return the #of tuples in the index.

Note: If the index supports deletion markers then the range count will be an upper bound and may double count tuples which have been overwritten, including the special case where the overwrite is a delete.

Specified by:
rangeCount in interface IRangeQuery
Returns:
The #of tuples in the index.

rangeCount

public final long rangeCount(Object fromKey,
                             Object toKey)
Variant implicitly converts the optional application keys into unsigned byte[]s.

Parameters:
fromKey -
toKey -
Returns:

rangeCount

public final long rangeCount(byte[] fromKey,
                             byte[] toKey)
This method computes the #of entries in the half-open range using AbstractNode#indexOf(Object). Since it does not scan the tuples it can not differentiate between deleted and undeleted tuples for an index that supports delete markers, but the result will be exact if delete markers are not being used. The cost is equal to the cost of lookup of the both keys. If both keys are null, then the cost is zero (no IOs).

Specified by:
rangeCount in interface IRangeQuery
Parameters:
fromKey - The lowest key that will be counted (inclusive). When null there is no lower bound.
toKey - The first key that will not be counted (exclusive). When null there is no upper bound.
Returns:
The #of tuples in the half-open key range.

rangeCountExactWithDeleted

public long rangeCountExactWithDeleted(byte[] fromKey,
                                       byte[] toKey)
Note: rangeCount(byte[], byte[]) already reports deleted tuples for an AbstractBTree so this method is just delegated to that one. However, FusedView.rangeCountExactWithDeleted(byte[], byte[]) treats this case differently since it must report the exact range count when considering all sources at once. It uses a range iterator scan visiting both deleted and undeleted tuples for that.

Specified by:
rangeCountExactWithDeleted in interface IRangeQuery
Parameters:
fromKey - The lowest key that will be counted (inclusive). When null there is no lower bound.
toKey - The first key that will not be counted (exclusive). When null there is no upper bound.
Returns:
The exact #of deleted and undeleted tuples in the half-open key range.
See Also:
IRangeQuery.rangeCountExact(byte[], byte[])

rangeIterator

public final ITupleIterator rangeIterator()
Description copied from interface: IRangeQuery
Visits all tuples in key order. This is identical to
 rangeIterator(null, null)
 

Specified by:
rangeIterator in interface IRangeQuery
Returns:
An iterator that will visit all entries in key order.

rangeIterator

public final ITupleIterator rangeIterator(Object fromKey,
                                          Object toKey)
Variant implicitly converts the optional application keys into unsigned byte[]s.

Parameters:
fromKey -
toKey -
Returns:

rangeIterator

public final ITupleIterator rangeIterator(byte[] fromKey,
                                          byte[] toKey)
Description copied from interface: IRangeQuery
Return an iterator that visits the entries in a half-open key range. When toKey EQ fromKey nothing will be visited. It is an error if toKey LT fromKey.

Specified by:
rangeIterator in interface IRangeQuery
Parameters:
fromKey - The first key that will be visited (inclusive lower bound). When null there is no lower bound.
toKey - The first key that will NOT be visited (exclusive upper bound). When null there is no upper bound.
See Also:
SuccessorUtil, which may be used to compute the successor of a value before encoding it as a component of a key., BytesUtil#successor(byte[]), which may be used to compute the successor of an encoded key., EntryFilter, which may be used to filter the entries visited by the iterator.

rangeIterator

public final ITupleIterator rangeIterator(Object fromKey,
                                          Object toKey,
                                          int capacity,
                                          int flags,
                                          IFilterConstructor filter)
Variant implicitly converts the optional application keys into unsigned byte[]s.

Parameters:
fromKey -
toKey -
Returns:

rangeIterator

public ITupleIterator rangeIterator(byte[] fromKey,
                                    byte[] toKey,
                                    int capacityIsIgnored,
                                    int flags,
                                    IFilterConstructor filter)
Core implementation.

Note: If IRangeQuery.CURSOR is specified the returned iterator supports traversal with concurrent modification by a single-threaded process (the BTree is NOT thread-safe for writers). Write are permitted iff AbstractBTree allows writes.

Note: IRangeQuery.REVERSE is handled here by wrapping the underlying ITupleCursor.

Note: IRangeQuery.REMOVEALL is handled here by wrapping the iterator.

Note: FusedView.rangeIterator(byte[], byte[], int, int, IFilterConstructor) is also responsible for constructing an ITupleIterator in a manner similar to this method. If you are updating the logic here, then check the logic in that method as well!

Specified by:
rangeIterator in interface IRangeQuery
Parameters:
fromKey - The first key that will be visited (inclusive lower bound). When null there is no lower bound.
toKey - The first key that will NOT be visited (exclusive upper bound). When null there is no upper bound.
capacityIsIgnored - The #of entries to buffer at a time. This is a hint and MAY be zero (0) to use an implementation specific default capacity. A non-zero value may be used if you know that you want at most N results or if you want to override the default #of results to be buffered before sending them across a network interface. (Note that you can control the default value using IBigdataClient.Options#DEFAULT_CLIENT_RANGE_QUERY_CAPACITY).
flags - A bitwise OR of IRangeQuery.KEYS, IRangeQuery.VALS, etc.
filter - An optional object used to construct a stacked iterator. When IRangeQuery.CURSOR is specified in flags, the base iterator will implement ITupleCursor and the first filter in the stack can safely cast the source iterator to an ITupleCursor. If the outermost filter in the stack does not implement ITupleIterator, then it will be wrapped an ITupleIterator.
See Also:
SuccessorUtil, which may be used to compute the successor of a value before encoding it as a component of a key., BytesUtil#successor(byte[]), which may be used to compute the successor of an encoded key., IFilterConstructor, which may be used to construct an iterator stack performing filtering or other operations.
TODO:
add support to the iterator construct for filtering by a tuple revision timestamp range.

rangeCopy

public long rangeCopy(IIndex src,
                      byte[] fromKey,
                      byte[] toKey,
                      boolean overflow)
Copy all data, including deleted index entry markers and timestamps iff supported by the source and target. The goal is an exact copy of the data in the source btree.

Parameters:
src - The source index.
fromKey - The first key that will be copied (inclusive). When null there is no lower bound.
toKey - The first key that will NOT be copied (exclusive). When null there is no upper bound.
overflow - When true the IOverflowHandler defined for the source index (if any) will be applied to copy raw records onto the backing store for this index. This value SHOULD be false if the two indices have the same backing store and true otherwise.
Returns:
The #of index entries that were copied.
Throws:
UnsupportedOperationException - if the src and this do not have the same setting for IndexMetadata.getDeleteMarkers()
UnsupportedOperationException - if the src and this do not have the same setting for IndexMetadata.getVersionTimestamps()
See Also:
CompactTask, OverflowManager
TODO:
this would be more efficient if we could reuse the same buffer for keys and values in and out. As it stands it does a lot of byte[] allocation., write tests for all variations (delete markers, timestamps, overflow handler, etc).

submit

public Object submit(byte[] key,
                     ISimpleIndexProcedure proc)
Description copied from interface: IIndex
Submits an index procedure that operations on a single key to the appropriate index partition returning the result of that procedure.

Specified by:
submit in interface IIndex
Parameters:
key - The key.
proc - The procedure.
Returns:
The value returned by IIndexProcedure.apply(IIndex)

submit

public void submit(byte[] fromKey,
                   byte[] toKey,
                   IKeyRangeIndexProcedure proc,
                   IResultHandler handler)
Description copied from interface: IIndex
The procedure will be transparently applied against each index partition spanned by the given key range.

Note: Since this variant of submit() does not split keys the fromIndex and toIndex in the Splits reported to the IResultHandler will be zero (0).

Specified by:
submit in interface IIndex
Parameters:
fromKey - The lower bound (inclusive) -or- null if there is no lower bound.
toKey - The upper bound (exclusive) -or- null if there is no upper bound.
proc - The procedure. If the procedure implements the IParallelizableIndexProcedure marker interface then it MAY be executed in parallel against the relevant index partition(s).

submit

public void submit(int fromIndex,
                   int toIndex,
                   byte[][] keys,
                   byte[][] vals,
                   AbstractKeyArrayIndexProcedureConstructor ctor,
                   IResultHandler aggregator)
Description copied from interface: IIndex
Runs a procedure against an index.

Note: This may be used to send custom logic together with the data to a remote index or index partition. When the index is remote both the procedure and the return value MUST be Serializable.

Note: The scale-out indices add support for auto-split of the procedure such that it runs locally against each relevant index partition.

Specified by:
submit in interface IIndex
Parameters:
fromIndex - The index of the first key to be used (inclusive).
toIndex - The index of the last key to be used (exclusive).
keys - The keys (required).
vals - The values (optional depending on the procedure).
ctor - An object that can create instances of the procedure.
aggregator - When defined, results from each procedure application will be reported to this object.

getRightMostNode

public AbstractNode getRightMostNode(boolean nodesOnly)
Utility method returns the right-most node in the B+Tree.

Parameters:
nodesOnly - when true the search will halt at the right-most non-leaf. Otherwise it will return the right-most leaf.
Returns:
The right-most child in the B+Tree -or- null if the root of the B+Tree is a leaf and nodesOnly == true.

newLeafCursor

public abstract ILeafCursor newLeafCursor(SeekEnum where)
Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree. The cursor will be initially positioned on the leaf identified by the symbolic constant.


newLeafCursor

public abstract ILeafCursor newLeafCursor(byte[] key)
Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree. The cursor will be initially positioned on the leaf that spans the given key.

Parameters:
key - The key (required).
Throws:
IllegalArgumentException - if key is null.

getUtilization

public int[] getUtilization()
Computes and returns the utilization of the tree. The utilization figures do not factor in the space requirements of nodes and leaves.

Returns:
An array whose elements are:
  • 0 - the leaf utilization percentage [0:100]. The leaf utilization is computed as the #of values stored in the tree divided by the #of values that could be stored in the #of allocated leaves.
  • 1 - the node utilization percentage [0:100]. The node utilization is computed as the #of non-root nodes divided by the #of non-root nodes that could be addressed by the tree.
  • 2 - the total utilization percentage [0:100]. This is the average of the leaf utilization and the node utilization.

dump

public boolean dump(PrintStream out)
Recursive dump of the tree.

Parameters:
out - The dump is written on this stream.
Returns:
true unless an inconsistency is detected.

dump

public boolean dump(org.apache.log4j.Level level,
                    PrintStream out)

touch

protected void touch(AbstractNode<?> node)

Touch the node or leaf on the writeRetentionQueue. If the node is not found on a scan of the head of the queue, then it is appended to the queue and its AbstractNode.referenceCount is incremented. If a node is being appended to the queue and the queue is at capacity, then this will cause a reference to be evicted from the queue. If the reference counter for the evicted node or leaf is zero and the evicted node or leaf is dirty, then the evicted node or leaf will be coded and written onto the backing store. A subsequent attempt to modify the node or leaf will force copy-on-write for that node or leaf. Regardless of whether or not the node or leaf is dirty, it is touched on the LRUNexus#getGlobalLRU() when it is evicted from the write retention queue.

This method guarantees that the specified node will NOT be synchronously persisted as a side effect and thereby made immutable. (Of course, the node may be already immutable.)

In conjunction with DefaultEvictionListener, this method guarentees that the reference counter for the node will reflect the #of times that the node is actually present on the writeRetentionQueue.

Parameters:
node - The node or leaf.
TODO:
review the guarentees offered by this method under the assumption of concurrent readers. See http://www.ibm.com/developerworks/java/library/j-jtp06197.html for some related issues. Among them ++ and -- are not atomic unless you are single-threaded, so this code probably needs to use synchronized(){} or the reference counts could be wrong. Since the reference counts only effect when a node is made persistent (assuming its dirty) and since we already require single threaded access for writes on the btree, there may not actually be a problem here. The reference counts could only be wrong for concurrent readers, and nodes are never dirty and hence will never be made persistent on eviction so it probably does not matter if the reference counts are over or under for this case.

Note: I have since seen an exception where the evicted node reference was [null]. The problem is that the HardReferenceQueue is NOT thread-safe. Concurrent readers driving HardReferenceQueue.add(Object) can cause the data structure to become inconsistent. This is not much of a problem for readers since the queue is being used to retain hard references - effectively a cache and the null reference could be ignored. For writers, we are always single threaded.

Still, it seems best to either make this method synchronized or to replace the HardReferenceQueue with an ArrayBlockingQueue since there could well be side-effects of concurrent appends on the queue with concurrent writers. The latency for synchronization will be short for readers since eviction is a NOP while the latency of append can be high for a single threaded writer since IO may result.

The use of an ArrayBlockingQueue opens up some interesting possibilities since a concurrent thread could now handle serialization and eviction of nodes while the caller was blocked and the caller would not need to wait for serialization and IO unless the queue was full. This really sounds like the same old concept of a write retention queue (we do not want to evict nodes too quickly or we will make them persistent while they are still likely to be touched) and a read retention queue (keeping nodes around for a while after they have been made persistent) with the new twist being that serialization and IO could be asynchronous in the zone of action available between those queues in order to keep the write reference queue at a minimium capacity).


writeNodeRecursive

protected final void writeNodeRecursive(AbstractNode<?> node)
Write a dirty node and its children using a post-order traversal that first writes any dirty leaves and then (recursively) their parent nodes. The parent nodes are guarenteed to be dirty if there is a dirty child so this never triggers copy-on-write. This is used as part of the commit protocol where it is invoked with the root of the tree, but it may also be used to incrementally flush dirty non-root Nodes. Note: This will throw an exception if the backing store is read-only.

Parameters:
node - The root of the hierarchy of nodes to be written. The node MUST be dirty. The node this does NOT have to be the root of the tree and it does NOT have to be a Node.

writeNodeOrLeaf

protected long writeNodeOrLeaf(AbstractNode<?> node)
Codes the node and writes the coded record on the store (non-recursive). The node MUST be dirty. If the node has a parent, then the parent is notified of the persistent identity assigned to the node by the store. This method is NOT recursive and dirty children of a node will NOT be visited. By coding the nodes and leaves as they are evicted from the writeRetentionQueue, the B+Tree continuously converts nodes and leaves to their more compact coded record forms which results in a smaller in memory footprint.

Note: For a transient B+Tree, this merely codes the node but does not write the node on the store (there is none).

Returns:
The persistent identity assigned by the store.
Throws:
UnsupportedOperationException - if the B+Tree (or the backing store) is read-only.

readNodeOrLeaf

protected AbstractNode<?> readNodeOrLeaf(long addr)
Read a node or leaf from the store.

Note: Callers SHOULD be synchronized in order to ensures that only one thread will read the desired node or leaf in from the store and attach the reference to the newly read node or leaf as appropriate into the existing data structures (e.g., as the root reference or as a child of a node or leaf already live in memory).

Parameters:
addr - The address in the store.
Returns:
The node or leaf.


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.