com.bigdata.btree
Class Node

java.lang.Object
  extended by com.bigdata.btree.PO
      extended by com.bigdata.btree.AbstractNode<Node>
          extended by com.bigdata.btree.Node
All Implemented Interfaces:
IAbstractNodeData, IChildData, IKeysData, INodeData, ISpannedTupleCountData, ITreeNodeData, IAbstractNode, IDirty, IIdentityAccess, IDataRecordAccess
Direct Known Subclasses:
IndexSegment.ImmutableNodeFactory.ImmutableNode

public class Node
extends AbstractNode<Node>
implements INodeData

A non-leaf node.

Per-child min/max revision timestamps and timestamp revision filtering

In order to track the min/max timestamp on the Node we must also track the min/max timestamp for each direct child of that Node. While this inflates the size of the INodeData data record considerably, we are required to track those per-child data in order to avoid a scan of the children when we need to recompute the min/max timestamp for the Node . The IO latency costs of that scan are simply not acceptable, especially for large branching factors. The min/max timestamp on the Node is ONLY used for filtering iterators based on a desired tuple revision range. This is why the choice to support tuple revision filters is its own configuration option. FIXME An alternative to per-child min/max tuple revision timestamps would be the concurrent materialization of the direct children. These data are only mutable for B+Tree instances with relatively small branching factors. They are immutable for the IndexSegment. However, the per-Node min/max timestamp also make the tuple revision filtering more efficient since we can prune the search before we materialize the child.

Version:
$Id: Node.java 4610 2011-06-02 19:58:46Z thompsonbry $
Author:
Bryan Thompson

Field Summary
 
Fields inherited from class com.bigdata.btree.AbstractNode
btree, INFO, log, parent, referenceCount, self
 
Fields inherited from class com.bigdata.btree.PO
deleted, dirty, identity
 
Fields inherited from interface com.bigdata.btree.IIdentityAccess
NULL
 
Constructor Summary
protected Node(AbstractBTree btree, long addr, INodeData data)
          De-serialization constructor.
protected Node(BTree btree)
          Used to create a new node when a node is split.
protected Node(BTree btree, AbstractNode oldRoot, long nentries)
          This constructor is used when splitting the a root Leaf or a root Node.
protected Node(Node src, long triggeredByChildId)
          Copy constructor.
 
Method Summary
 Iterator<AbstractNode> childIterator(boolean dirtyNodesOnly)
          Iterator visits the direct child nodes in the external key ordering.
 Iterator<AbstractNode> childIterator(byte[] fromKey, byte[] toKey)
          Iterator visits the direct child nodes in the external key ordering.
 AbstractFixedByteArrayBuffer data()
          The coded (aka compressed) data.
 void delete()
          Deletes the persistence capable object.
 boolean dump(org.apache.log4j.Level level, PrintStream out, int height, boolean recursive)
          Dump the data onto the PrintStream.
protected  int findChild(byte[] searchKey)
          Return the index of the child to be searched.
 AbstractNode getChild(int index)
          Return the child node or leaf at the specified index in this node.
 long getChildAddr(int index)
          Return the persistent addresses of the specified child node.
 int getChildCount()
          The #of children of this node.
 long getChildEntryCount(int index)
          Return the #of tuples spanned by the indicated child of this node.
 Reference<AbstractNode<?>> getChildRef(int index)
          Return the Reference for the child.
 INodeData getDelegate()
          Return the delegate IAbstractNodeData object.
protected  int getIndexOf(AbstractNode child)
          Return the index of the child among the direct children of this node.
 int getKeyCount()
          Return the #of keys in the node or leaf.
 IRaba getKeys()
          The object used to contain and manage the keys.
protected  AbstractNode getLeftSibling(AbstractNode child, boolean materialize)
          Return the left sibling.
 long getMaximumVersionTimestamp()
          The most recent tuple revision timestamp associated with any tuple spanned by this node or leaf.
 long getMinimumVersionTimestamp()
          The earliest tuple revision timestamp associated with any tuple spanned by this node or leaf.
protected  AbstractNode getRightMostChild(boolean nodesOnly)
          Return the right-most child of this node.
protected  AbstractNode getRightSibling(AbstractNode child, boolean materialize)
          Return the right sibling of the specified child of a common parent.
 long getSpannedTupleCount()
          The #of tuples spanned by this node.
 boolean hasVersionTimestamps()
          Return true iff the leaves maintain tuple revision timestamps.
 long indexOf(byte[] key)
          Recursive search locates the appropriate leaf and returns the index position of the entry.
 Tuple insert(byte[] key, byte[] value, boolean delete, long timestamp, Tuple tuple)
          Insert or update a value.
protected  void insertChild(byte[] key, AbstractNode child)
          Invoked by AbstractNode.split() to insert a key and reference for a child created when another child of this node is split.
 boolean isCoded()
          true iff this is a coded data structure.
 boolean isLeaf()
          Always returns false.
 boolean isReadOnly()
          The result depends on the backing INodeData implementation.
 byte[] keyAt(long entryIndex)
          Recursive search for the key at the specified entry index.
 Tuple lookup(byte[] key, Tuple tuple)
          Lookup a key.
protected  int maxKeys()
          Return branchingFactor - 1, which is the maximum #of keys for a Node.
protected  void merge(AbstractNode sibling, boolean isRightSibling)
          Merge the keys and values from the sibling into this node, delete the sibling from the store and remove the sibling from the parent.
protected  int minKeys()
          Return ((branchingFactor + 1) << 1) - 1
 Iterator<AbstractNode> postOrderIterator(byte[] fromKey, byte[] toKey)
          Iterator visits children in the specified half-open key range, recursively expanding each child with a post-order traversal of its children and finally visits this node itself.
 Iterator<AbstractNode> postOrderNodeIterator(boolean dirtyNodesOnly, boolean nodesOnly)
          Iterator visits children, recursively expanding each child with a post-order traversal of its children and finally visits this node itself.
protected  void prefetchChildLeaves(Node node, byte[] fromKey, byte[] toKey)
          When we visit a node whose children are leaves, schedule memoization of those leaves whose separator key in the node is LT the toKey (non-blocking).
protected  void prefetchRightSibling(Node node, byte[] toKey)
          If the caller's toKey is GT the separator keys for the children of this node then the iterator will need to visit the rightSibling of the node and this method will schedule the memoization of the node's rightSibling in order to reduce the IO latency when the iterator visit that rightSibling (non-blocking).
protected  boolean rangeCheckChildIndex(int index)
          Range check a child index.
protected  boolean rangeCheckSpannedTupleIndex(long entryIndex)
          Range check an index into the keys of the node.
protected  void redistributeKeys(AbstractNode sibling, boolean isRightSibling)
          Redistributes a key from the specified sibling into this node in order to bring this node up to the minimum #of keys.
 Tuple remove(byte[] key, Tuple tuple)
          Recursive search locates the appropriate leaf and removes the entry for the key.
protected  void removeChild(AbstractNode child)
          Invoked when a non-root node or leaf has no more keys to detach the child from its parent.
protected  IAbstractNode split()
           Split an over-capacity node (a node with maxKeys+1 keys), creating a new rightSibling.
 String toString()
          Human readable representation of the Node.
protected  void updateEntryCount(AbstractNode<?> child, long delta)
          Apply the delta to the per-child count for this node and then recursively ascend up the tree applying the delta to all ancestors of this node.
protected  void updateMinMaxVersionTimestamp(AbstractNode<?> child)
          Update the getMinimumVersionTimestamp() and getMaximumVersionTimestamp().
 void valueAt(long entryIndex, Tuple tuple)
          Recursive search for the value at the specified entry index.
 
Methods inherited from class com.bigdata.btree.AbstractNode
assertInvariants, assertKeysMonotonic, copyKey, copyOnWrite, copyOnWrite, dump, dump, entryIterator, getBranchingFactor, getParent, isLeftMostNode, isRightMostNode, join, keyAsString, postOrderNodeIterator, postOrderNodeIterator, rangeIterator
 
Methods inherited from class com.bigdata.btree.PO
getIdentity, indent, isDeleted, isDirty, isPersistent, setDirty, setIdentity, toShortString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Node

protected Node(AbstractBTree btree,
               long addr,
               INodeData data)
De-serialization constructor.

Note: The de-serialization constructor (and ONLY the de-serialization constructor) ALWAYS creates a clean node. Therefore the PO.dirty flag passed up from this constructor has the value false.

Parameters:
btree - The tree to which the node belongs.
addr - The persistent identity of the node.
data - The data record.

Node

protected Node(BTree btree)
Used to create a new node when a node is split.


Node

protected Node(BTree btree,
               AbstractNode oldRoot,
               long nentries)
This constructor is used when splitting the a root Leaf or a root Node. The resulting node has a single child reference and NO keys. The #of entries allocated to the child is the #of remaining in that child after the split.

Parameters:
btree - A mutable btree.
oldRoot - The node that was previously the root of the tree (either a node or a leaf).
nentries - The #of entries spanned by the oldRoot before the split.

Node

protected Node(Node src,
               long triggeredByChildId)
Copy constructor.

Parameters:
src - The source node (must be immutable).
triggeredByChildId - The persistent identity of the child that triggered the copy constructor. This should be the immutable child NOT the one that was already cloned. This information is used to avoid stealing the original child since we already made a copy of it. It is IIdentityAccess.NULL when this information is not available, e.g., when the copyOnWrite action is triggered by a join() and we are cloning the sibling before we redistribute a key to the node/leaf on which the join was invoked.
TODO:
We could perhaps replace this with the conversion of the INodeData:data field to a mutable field since the code which invokes copyOnWrite() no longer needs to operate on a new Node reference. However, I need to verify that nothing else depends on the new Node, e.g., the dirty flag, addr, etc., Can't we just test to see if the child already has this node as its parent reference and then skip it? If so, then that would remove a troublesome parameter from the API.
Method Detail

minKeys

protected final int minKeys()
Return ((branchingFactor + 1) << 1) - 1

Specified by:
minKeys in class AbstractNode<Node>

maxKeys

protected final int maxKeys()
Return branchingFactor - 1, which is the maximum #of keys for a Node.

Specified by:
maxKeys in class AbstractNode<Node>

rangeCheckChildIndex

protected final boolean rangeCheckChildIndex(int index)
Range check a child index.

Parameters:
index - The index of a child in [0:nkeys+1].
Returns:
true
Throws:
IndexOutOfBoundsException - if the index is not in the legal range.

getChildRef

public final Reference<AbstractNode<?>> getChildRef(int index)
Return the Reference for the child. This is part of the internal API.

Parameters:
index - The index of the child.
Returns:
The Reference for that child. This will be null if the child is not buffered. Note that a non- null return MAY still return null from Reference.get().

getDelegate

public final INodeData getDelegate()
Description copied from class: AbstractNode
Return the delegate IAbstractNodeData object.


isLeaf

public final boolean isLeaf()
Always returns false.

Specified by:
isLeaf in interface IAbstractNodeData
Specified by:
isLeaf in class AbstractNode<Node>

isReadOnly

public final boolean isReadOnly()
The result depends on the backing INodeData implementation. The Node will be mutable when it is first created and is made immutable when it is persisted. If there is a mutation operation, the backing INodeData is automatically converted into a mutable instance.

Specified by:
isReadOnly in interface IAbstractNodeData

isCoded

public final boolean isCoded()
Description copied from interface: IAbstractNodeData
true iff this is a coded data structure.

Specified by:
isCoded in interface IAbstractNodeData

data

public final AbstractFixedByteArrayBuffer data()
Description copied from interface: IAbstractNodeData
The coded (aka compressed) data.

Specified by:
data in interface IAbstractNodeData
Specified by:
data in interface IDataRecordAccess

getKeyCount

public final int getKeyCount()
Description copied from interface: IKeysData
Return the #of keys in the node or leaf. A node has nkeys+1 children. A leaf has nkeys keys and values. The maximum #of keys for a node is one less than the branching factor of the B+Tree. The maximum #of keys for a leaf is the branching factor of the B+Tree. For a hash bucket, this is the #of entries in the bucket.

Specified by:
getKeyCount in interface IKeysData
Returns:
The #of defined keys.

getKeys

public final IRaba getKeys()
Description copied from interface: IKeysData
The object used to contain and manage the keys.

Specified by:
getKeys in interface IKeysData

getChildCount

public final int getChildCount()
Description copied from interface: IChildData
The #of children of this node. Either all children will be nodes or all children will be leaves. The #of children of a node MUST be IAbstractNodeData#getKeyCount()+1

Specified by:
getChildCount in interface IChildData
Returns:
The #of children of this node.

getSpannedTupleCount

public final long getSpannedTupleCount()
Description copied from interface: ISpannedTupleCountData
The #of tuples spanned by this node. For a leaf, the corresponding value is reported by IKeysData.getKeyCount() or ILeafData.getValueCount().

Specified by:
getSpannedTupleCount in interface ISpannedTupleCountData

getChildAddr

public final long getChildAddr(int index)
Description copied from interface: IChildData
Return the persistent addresses of the specified child node.

Specified by:
getChildAddr in interface IChildData
Parameters:
index - The index of the child in [0:nkeys].
Returns:
The persistent child address -or- zero(0L) if the child is not persistent.

getChildEntryCount

public final long getChildEntryCount(int index)
Description copied from interface: ISpannedTupleCountData
Return the #of tuples spanned by the indicated child of this node. The sum of the values returned by this method across the children of the node should always equal the value returned by ISpannedTupleCountData.getSpannedTupleCount() . These data are used to support fast computation of the index at which a key occurs and the #of entries in a given key range.

Specified by:
getChildEntryCount in interface ISpannedTupleCountData
Parameters:
index - The index of the child in [0:nkeys].
Returns:
The #of tuples spanned by that child.

getMaximumVersionTimestamp

public final long getMaximumVersionTimestamp()
Description copied from interface: IAbstractNodeData
The most recent tuple revision timestamp associated with any tuple spanned by this node or leaf. If there are NO tuples for the leaf, then this MUST return Long.MIN_VALUE since the initial value of the maximum version timestamp is always the smallest possible long integer.

Specified by:
getMaximumVersionTimestamp in interface IAbstractNodeData

getMinimumVersionTimestamp

public long getMinimumVersionTimestamp()
Description copied from interface: IAbstractNodeData
The earliest tuple revision timestamp associated with any tuple spanned by this node or leaf. If there are NO tuples for the leaf, then this MUST return Long.MAX_VALUE since the initial value of the minimum version timestamp is always the largest possible long integer.

Specified by:
getMinimumVersionTimestamp in interface IAbstractNodeData

hasVersionTimestamps

public final boolean hasVersionTimestamps()
Description copied from interface: IAbstractNodeData
Return true iff the leaves maintain tuple revision timestamps. When true, the minimum and maximum tuple revision timestamp for a node or leaf are available from IAbstractNodeData.getMinimumVersionTimestamp() and IAbstractNodeData.getMaximumVersionTimestamp().

Specified by:
hasVersionTimestamps in interface IAbstractNodeData

updateEntryCount

protected final void updateEntryCount(AbstractNode<?> child,
                                      long delta)
Apply the delta to the per-child count for this node and then recursively ascend up the tree applying the delta to all ancestors of this node. This is invoked solely by the methods that add and remove entries from a leaf as those are the only methods that change the #of entries spanned by a parent node. Methods that split, merge, or redistribute keys have a net zero effect on the #of entries spanned by the parent.

This also updates getMinimumVersionTimestamp() and getMaximumVersionTimestamp() iff version timestamps are enabled and the child has a minimum (maximum) version timestamp GE the minimum (maximum) version timestamp of this node.

Parameters:
child - The direct child.
delta - The change in the #of spanned children.

updateMinMaxVersionTimestamp

protected final void updateMinMaxVersionTimestamp(AbstractNode<?> child)
Update the getMinimumVersionTimestamp() and getMaximumVersionTimestamp(). This is invoked when the min/max in the child has changed without a corresponding change to the #of spanned tuples. E.g., when an insert() causes a tuple to be updated rather than added.

Parameters:
child - The direct child.

delete

public void delete()
Description copied from interface: IIdentityAccess
Deletes the persistence capable object. Both transient and persistent objects may be logically deleted. If the object is persistent then its space on the store is deallocated.

Specified by:
delete in interface IIdentityAccess
Overrides:
delete in class AbstractNode<Node>

insert

public Tuple insert(byte[] key,
                    byte[] value,
                    boolean delete,
                    long timestamp,
                    Tuple tuple)
Description copied from class: AbstractNode
Insert or update a value.

Specified by:
insert in class AbstractNode<Node>
Parameters:
key - The key (non-null).
value - The value (may be null).
delete - true iff the entry is to marked as deleted (delete markers must be supported for if this is true).
timestamp - The timestamp associated with the version (the value is ignored unless version metadata is being maintained).
tuple - A tuple that may be used to obtain the data and metadata for the pre-existing index entry overwritten by the insert operation (optional).
Returns:
The tuple iff there was a pre-existing entry under that key and null otherwise.

lookup

public Tuple lookup(byte[] key,
                    Tuple tuple)
Description copied from class: AbstractNode
Lookup a key.

Specified by:
lookup in class AbstractNode<Node>
Parameters:
key - The search key.
tuple - A tuple that may be used to obtain the data and metadata for the pre-existing index entry (required).
Returns:
The tuple iff there was a pre-existing entry under that key and null otherwise.

remove

public Tuple remove(byte[] key,
                    Tuple tuple)
Description copied from class: AbstractNode
Recursive search locates the appropriate leaf and removes the entry for the key.

Note: It is an error to call this method if delete markers are in use.

Specified by:
remove in class AbstractNode<Node>
Parameters:
key - The search key.
tuple - A tuple that may be used to obtain the data and metadata for the pre-existing index entry that was either removed by the remove operation (optional).
Returns:
The tuple iff there was a pre-existing entry under that key and null otherwise.

indexOf

public long indexOf(byte[] key)
Description copied from class: AbstractNode
Recursive search locates the appropriate leaf and returns the index position of the entry.

Specified by:
indexOf in class AbstractNode<Node>
Parameters:
key - The search key.
Returns:
the index of the search key, if found; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be found it it were inserted into the btree without intervening mutations. Note that this guarantees that the return value will be >= 0 if and only if the key is found.

rangeCheckSpannedTupleIndex

protected boolean rangeCheckSpannedTupleIndex(long entryIndex)
Range check an index into the keys of the node.

Parameters:
entryIndex - The key index.
Returns:
true
Throws:
IndexOutOfBoundsException - if the index is LT ZERO (0) -or- GTE the getSpannedTupleCount()

keyAt

public final byte[] keyAt(long entryIndex)
Recursive search for the key at the specified entry index.

Specified by:
keyAt in class AbstractNode<Node>
Parameters:
entryIndex - The index of the entry (relative to the first entry spanned by this node).
Returns:
The key at that entry index.

valueAt

public final void valueAt(long entryIndex,
                          Tuple tuple)
Recursive search for the value at the specified entry index.

Specified by:
valueAt in class AbstractNode<Node>
Parameters:
entryIndex - The index of the entry (relative to the first entry spanned by this node).
tuple - A tuple that may be used to obtain the data and metadata for the pre-existing index entry (required).

findChild

protected final int findChild(byte[] searchKey)
Return the index of the child to be searched.

The interpretation of the key index for a node is as follows. When searching nodes of the tree, we search for the index in keys[] of the first key value greater than or equal (GTE) to the probe key. If the match is equal, then we choose the child at index + 1. Otherwise we choose the child having the same index as the GTE key match. For example,

 keys[]  : [ 5 9 12 ]
 child[] : [ a b  c  d ]
 
A probe with keys up to 4 matches at index zero (0) and we choose the 1st child, a, which is at index zero (0).

A probe whose key is 5 matches at index zero (0) exactly and we choose the child at index + 1, b, which is at index one (1).

A probe with keys in [6:8] matches at index one (1) and we choose the 2nd child, b, which is at index one (1). A probe with 9 also matches at index one (1), but we choose index+1 equals two (2) since this is an exact key match.

A probe with keys in [10:11] matches at index two (2) and we choose the 3rd child, c, which is at index two (2). A probe with 12 also matches at index two (2), but we choose index+1 equals three (3) since this is an exact key match.

A probe with keys greater than 12 exceeds all keys in the node and always matches the last child in that node. In this case, d, which is at index three (3).

Note that we never stop a search on a node, even when there is an exact match on a key. All values are stored in the leaves and we always descend until we reach the leaf in which a value for the key would be stored. A test on the keys of that leaf is then conclusive - either a value is stored in the leaf for that key or it is not stored in the tree.

Parameters:
searchkey - The probe key.
Returns:
The child to be searched next for that key.

split

protected IAbstractNode split()

Split an over-capacity node (a node with maxKeys+1 keys), creating a new rightSibling. The splitIndex is (maxKeys+1)/2 . The key at the splitIndex is the separatorKey. Unlike when we split a Leaf, the separatorKey is lifted into the parent and does not appear in either this node or the rightSibling after the split. All keys and child references from splitIndex+1 (inclusive) are moved to the new rightSibling. The child reference at splitIndex remains in this node.

If this node is the root of the tree (no parent), then a new root Node is created without any keys and is made the parent of this node.

In any case, we then insert( separatorKey, rightSibling ) into the parent node, which may cause the parent node itself to split.

Note: splitting a node causes entry counts for the relocated childen to be relocated to the new rightSibling but it does NOT change the #of entries on the parent.

Specified by:
split in class AbstractNode<Node>
Returns:
The high node (or leaf) created by the split.

redistributeKeys

protected void redistributeKeys(AbstractNode sibling,
                                boolean isRightSibling)
Redistributes a key from the specified sibling into this node in order to bring this node up to the minimum #of keys. This also updates a separator key in the parent for the right most of (this, sibling). When a key is redistributed from a sibling, the key in the sibling is rotated into the parent where it replaces the current separatorKey and that separatorKey is brought down into this node. The child corresponding to the key is simply moved from the sibling into this node (rather than rotating it through the parent).

While redistribution changes the #of entries spanned by the node and the sibling and therefore must update #childEntryCounts on the shared parent, it does not change the #of entries spanned by the parent.

Specified by:
redistributeKeys in class AbstractNode<Node>
Parameters:
sibling - A direct sibling of this node (either the left or right sibling). The sibling MUST be mutable.
isRightSibling - True iff the sibling is the rightSibling of this node.
TODO:
change to redistribute keys until the node and the sibling have an equal number of keys. if the other sibling exists and is also materialized then redistribute keys with it as well? This is the B*-Tree variation - it has strengths and weaknesses. I do not think that it will be a big win here since the expected scenario is heavy writes, good retention of nodes, and de-serialization of nodes from a fully buffered journal (hence, no IOs even when we need to materialize a sibling).

merge

protected void merge(AbstractNode sibling,
                     boolean isRightSibling)
Merge the keys and values from the sibling into this node, delete the sibling from the store and remove the sibling from the parent. This will trigger recursive AbstractNode.join() if the parent node is now deficient. While this changes the #of entries spanned by the current node it does NOT effect the #of entries spanned by the parent.

Specified by:
merge in class AbstractNode<Node>
Parameters:
sibling - A direct sibling of this node (does NOT need to be mutable). The sibling MUST have exactly the minimum #of keys.
isRightSibling - True iff the sibling is the rightSibling of this node.

insertChild

protected void insertChild(byte[] key,
                           AbstractNode child)
Invoked by AbstractNode.split() to insert a key and reference for a child created when another child of this node is split. This method has no effect on the #of entries spanned by the parent. *

Note: This operation is invoked only when a node or leaf is split. As such, it can not cause the min/max tuple revision timestamp on this Node to change since no tuples have been added or removed. However, this method does need to record the min/max for the new rightSibling.

Parameters:
key - The key on which the old node was split.
child - The new node. FIXME set min/max for the new child.

getLeftSibling

protected AbstractNode getLeftSibling(AbstractNode child,
                                      boolean materialize)
Return the left sibling. This is used by implementations of AbstractNode.join() to explore their left sibling.

Parameters:
child - The child (must be dirty).
materialize - When true, the left sibling will be materialized if it exists but is not resident.
Returns:
The left sibling or null if it does not exist -or- if it is not materialized and materialized == false. If the sibling is returned, then is NOT guaranteed to be mutable and the caller MUST invoke copy-on-write before attempting to modify the returned sibling.

getRightSibling

protected AbstractNode getRightSibling(AbstractNode child,
                                       boolean materialize)
Return the right sibling of the specified child of a common parent. This method is invoked on the parent, passing in one child and returning its right sibling under that common parent (if any). This is used by implementations of AbstractNode.join() to explore their right sibling.

Parameters:
child - The child (must be dirty).
materialize - When true, the left sibling will be materialized if it exists but is not resident.
Returns:
The left sibling or null if it does not exist -or- if it is not materialized and materialized == false. If the sibling is returned, then is NOT guaranteed to be mutable and the caller MUST invoke copy-on-write before attempting to modify the returned sibling.

getIndexOf

protected int getIndexOf(AbstractNode child)
Return the index of the child among the direct children of this node.

Parameters:
child - The child.
Returns:
The index in childRefs where that child is found. This may also be used as an index into #childAddr and #childEntryCounts.
Throws:
IllegalArgumentException - iff child is not a child of this node.

removeChild

protected void removeChild(AbstractNode child)
Invoked when a non-root node or leaf has no more keys to detach the child from its parent. If the node becomes deficient, then the node is joined with one of its immediate siblings. If the node is the root of the tree, then the root of the tree is also updated. The child is deleted as a post-condition.

Parameters:
child - The child (does NOT need to be mutable).
TODO:
I am a bit suspicious of this method. it appears to be removing the key and child at the same index rather than the key at index-1 and the child at index. This interacts with how the separator key gets updated (or appears to get updated) when a child is removed. That logic occurs in Leaf.merge(AbstractNode, boolean) and in merge(AbstractNode, boolean). It may be that I can simplify things a bit further by making this adjustment here and in those merge() methods. FIXME This should clear the min/max for the child in this node's data record. This is invoked by merge() on a leaf, then recursively if we need to join nodes. The caller should have already updated the min/max for the leaf's own data record and on this node's data record for that leaf. This method only needs to clear the min/max entry associated with the child that is being removed.

getChild

public final AbstractNode getChild(int index)
Return the child node or leaf at the specified index in this node. If the node is not in memory then it is read from the store.

Note: This implementation DOES NOT cause concurrent threads to block unless they are performing IO for the same child. A Memoizer pattern is used to assign each concurrent thread a FutureTask on which it waits for the result. Once the result is available, there is a small synchronized block during which the concurrent requests for a child will content to update the appropriate element in childRefs.

I believe the contention to update childRefs is unavoidable. If this object was made into an AtomicReferenceArray then we would have difficulty when inserting and removing tuples since the backing array is not visible. An array of AtomicReference objects would not help since it would not ensure "publication" when the element was changed from a null to an AtomicReference, only when AtomicReference.compareAndSet(Object, Object) was used. Thus it could only help if we pre-populated the array with AtomicReference objects, which seems wasteful.

As always, the mutable B+Tree is single threaded so there are not added synchronization costs. Concurrent readers can only arise for read-only BTrees and for IndexSegments.

Parameters:
index - The index of the child to be read from the store (in [0:nkeys]).
Returns:
The child node or leaf and never null.
Throws:
IndexOutOfBoundsException - if index is negative.
IndexOutOfBoundsException - if index is GT the #of keys in the node (there is one more child than keys in a node).

getRightMostChild

protected AbstractNode getRightMostChild(boolean nodesOnly)
Return the right-most child of this node.

Parameters:
nodesOnly - when true the search will halt at the right-most non-leaf. Otherwise it will return the right-most leaf.
Returns:
The right-most child of this node.

postOrderNodeIterator

public Iterator<AbstractNode> postOrderNodeIterator(boolean dirtyNodesOnly,
                                                    boolean nodesOnly)
Iterator visits children, recursively expanding each child with a post-order traversal of its children and finally visits this node itself.

Specified by:
postOrderNodeIterator in class AbstractNode<Node>
Parameters:
dirtyNodesOnly - When true, only dirty nodes and leaves will be visited
nodesOnly - When true, the leaves will not be visited.
Returns:
Iterator visiting AbstractNodes.

postOrderIterator

public Iterator<AbstractNode> postOrderIterator(byte[] fromKey,
                                                byte[] toKey)
Iterator visits children in the specified half-open key range, recursively expanding each child with a post-order traversal of its children and finally visits this node itself.

Specified by:
postOrderIterator in class AbstractNode<Node>
Parameters:
fromKey - The first key that will be visited (inclusive). When null there is no lower bound.
toKey - The first key that will NOT be visited (exclusive). When null there is no upper bound.
Returns:
Iterator visiting AbstractNodes.

prefetchChildLeaves

protected void prefetchChildLeaves(Node node,
                                   byte[] fromKey,
                                   byte[] toKey)
When we visit a node whose children are leaves, schedule memoization of those leaves whose separator key in the node is LT the toKey (non-blocking). If the rightSibling of the node would be visited by the iterator, then prefetch of the rightSibling is also scheduled.

Parameters:
node - A node whose children are leaves.
toKey - The exclusive upper bound of some iterator.
TODO:
All memoization should go through the same thread pool in order to bound the #of threads actually reading on the disk. Right now, the memoizer runs in the caller's thread. If we modified it to submit a task to the readService to handle the memoization then we would bound the #of threads involved., PREFETCH : JSR 166 Fork/join would also be a good choice here., PREFETCH : Prefetch will not break if the iterator is closed. This is not a bug per se, but it might be something we want to support in the future., PREFETCH : Prefetch should be cancelled if the btree is closed. We could do this with an Executor wrapping the LatchedExecutor that kept track of the work queue for that executor and allowed us to cancel just the tasks submitted against it. This would also give us a means to report on the prefetch work queue length., PREFETCH : Only journal is supported right now., PREFETCH : The IRangeQuery.CURSOR mode is not supported yet.

prefetchRightSibling

protected void prefetchRightSibling(Node node,
                                    byte[] toKey)
If the caller's toKey is GT the separator keys for the children of this node then the iterator will need to visit the rightSibling of the node and this method will schedule the memoization of the node's rightSibling in order to reduce the IO latency when the iterator visit that rightSibling (non-blocking).

Parameters:
node - A node.
toKey - The exclusive upper bound of some iterator.

childIterator

public Iterator<AbstractNode> childIterator(boolean dirtyNodesOnly)
Iterator visits the direct child nodes in the external key ordering.

Parameters:
dirtyNodesOnly - When true, only the direct dirty child nodes will be visited.

childIterator

public Iterator<AbstractNode> childIterator(byte[] fromKey,
                                            byte[] toKey)
Iterator visits the direct child nodes in the external key ordering.


dump

public boolean dump(org.apache.log4j.Level level,
                    PrintStream out,
                    int height,
                    boolean recursive)
Description copied from class: AbstractNode
Dump the data onto the PrintStream.

Specified by:
dump in class AbstractNode<Node>
Parameters:
level - The logging level.
out - Where to write the dump.
height - The height of this node in the tree or -1 iff you need to invoke this method on a node or leaf whose height in the tree is not known.
recursive - When true, the node will be dumped recursively using a pre-order traversal.
Returns:
True unless an inconsistency was detected.

toString

public String toString()
Human readable representation of the Node.

Overrides:
toString in class PO


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.