com.bigdata.btree
Class BTree

java.lang.Object
  extended by com.bigdata.btree.AbstractBTree
      extended by com.bigdata.btree.BTree
All Implemented Interfaces:
IAutoboxBTree, IBTreeStatistics, ICheckpointProtocol, IIndex, IIndexLocalCounter, ILinearList, ILocalBTreeView, IRangeQuery, ISimpleBTree, ICounterSetAccess, ICommitter
Direct Known Subclasses:
CommitRecordIndex, CommitTimeIndex, CounterSetBTree, EventReceiver.EventBTree, IndexSegmentIndex, JournalIndex, MetadataIndex, Name2Addr, TxId2CommitTimeIndex

public class BTree
extends AbstractBTree
implements ICommitter, ICheckpointProtocol

This class implements a variant of a B+Tree in which all values are stored in leaves, but the leaves are not connected with prior-next links. This constraint arises from the requirement to support a copy-on-write policy.

Note: No mechanism is exposed for fetching a node or leaf of the tree other than the root by its key. This is because the parent reference on the node (or leaf) can only be set when it is read from the store in context by its parent node.

Note: the leaves can not be stitched together with prior and next references without forming cycles that make it impossible to write out the leaves of the btree. This restriction arises because each time we write out a node or leaf it is assigned a persistent identifier as an unavoidable artifact of providing isolation for the object index.

Note: This implementation is thread-safe for concurrent readers BUT NOT for concurrent writers. If a writer has access to a BTree then there MUST NOT be any other reader -or- writer operating on the BTree at the same time.

Version:
$Id: BTree.java 6372 2012-06-29 20:39:41Z thompsonbry $
Author:
Bryan Thompson
TODO:
consider also tracking the #of deleted entries in a key range (parallel to how we track the #of entries in a key range) so that we can report the exact #of non-deleted entries when the btree supports isolation (right now you have to do a key range scan and count the #of non-deleted entries)., Choose the split point in the branch nodes within a parameterized region such that we choose the shortest separator keys to promote to the parent. This is a bit more tricky for the IndexSegmentBuilder., reuse a buffer when copying keys and values out of the index, e.g., for the ITupleIterator, in order to minimize heap churn., The B+Tree implementation does not support limits on the serialized size of a node or leaf. The design strategy is to allow flexible sizes for serialized node and leaf records since larger branching factors and continuous IO are cheaper and nothing in particular requires the use of fixed size records when accessing disk. However, it would still be useful to be able to place an upper bound on the serialized size of a node or leaf. Nodes and leaves are only serialized on eviction, so this would require the ability to split nodes and/or leaves immediately before eviction if they would be likely to overflow the upper bound on the record size. Probably the node or leaf needs to be made immutable, the serialized size checked, and then a split operation invoked if the record would exceed the upper bound. If this happens a lot, then the branching factor is too high for the data and would have to be lowered to regain performance.

Another use case for limits on the size of a node/leaf is maintaining the data in a serialized form. This can have several benefits, including reducing heap churn by copying keys and values (not their references) into the record structure, and reducing (eliminating) deserialization costs (which are substantially more expensive than serialization). This can be approached using new implementations of the key buffer and a value buffer implementation (one that can be used for nodes or for leaves). If an object implements both then it is a small additional step towards using it for the entire serialized record (minus some header/tailer).

Adaptive packed memory arrays can be used in this context to minimize the amount of copying that needs to be performed on mutation of a node or leaf.

Leading key compression is intrinsically part of how the key buffer maintains its representation so, for example, a micro-index would have to be maintained as part of the key buffer., we could defer splits by redistributing keys to left/right siblings that are under capacity - this makes the tree a b*-tree. however, this is not critical since the journal is designed to be fully buffered and the index segments are read-only but it would reduce memory by reducing the #of nodes -- and we can expect that the siblings will be either resident or in the direct buffer for the journal, evict subranges by touching the node on the way up so that a node that is evicted from the hard reference cache will span a subrange that can be evicted together. this will help to preserve locality on disk. with this approach leaves might be evicted independently, but also as part of a node subrange eviction., since forward scans are much more common, change the post-order iterator for commit processing to use a reverse traversal so that we can write the next leaf field whenever we evict a sequence of leaves as part of a sub-range commit (probably not much of an issue since the journal is normally fully buffered and the perfect index segments will not have this problem)., pre-fetch leaves for range scans? if we define a parallel iterator flag?, Note that efficient support for large branching factors requires a more sophisticated approach to maintaining the key order within a node or leaf. E.g., using a red-black tree or adaptive packed memory array. However, we can use smaller branching factors for btrees in the journal and use a separate implementation for bulk generating and reading "perfect" read-only key range segments., derive a string index that uses patricia trees in the leaves per several published papers.


Nested Class Summary
static class BTree.Counter
          Mutable counter.
 class BTree.LeafCursor
          A cursor that may be used to traversal Leafs.
protected static class BTree.NodeFactory
          Factory for mutable nodes and leaves used by the NodeSerializer.
static class BTree.PartitionedCounter
          Places the counter values into a namespace formed by the partition identifier.
protected static class BTree.Stack
          A simple stack based on an array used to maintain hard references for the parent Nodes in the BTree.LeafCursor.
 
Nested classes/interfaces inherited from class com.bigdata.btree.AbstractBTree
AbstractBTree.IBTreeCounters
 
Field Summary
protected  AtomicLong counter
          The mutable counter exposed by #getCounter()}.
protected  int height
          The height of the btree.
protected  long nentries
          The #of entries in the btree.
protected  long nleaves
          The #of leaf nodes in the btree.
protected  long nnodes
          The #of non-leaf nodes in the btree.
protected  long recordVersion
          The value of the record version number that will be assigned to the next node or leaf written onto the backing store.
 
Fields inherited from class com.bigdata.btree.AbstractBTree
branchingFactor, debug, DEBUG, dumpLog, ERROR_CLOSED, ERROR_LESS_THAN_ZERO, ERROR_READ_ONLY, ERROR_TOO_LARGE, ERROR_TRANSIENT, INFO, log, metadata, ndistinctOnWriteRetentionQueue, nodeSer, readOnly, root, store, storeCache, writeRetentionQueue
 
Fields inherited from interface com.bigdata.btree.IRangeQuery
ALL, CURSOR, DEFAULT, DELETED, FIXED_LENGTH_SUCCESSOR, KEYS, NONE, PARALLEL, READONLY, REMOVEALL, REVERSE, VALS
 
Constructor Summary
BTree(IRawStore store, Checkpoint checkpoint, IndexMetadata metadata, boolean readOnly)
          Required constructor form for BTree and any derived subclasses.
 
Method Summary
protected  void _reopen()
          This method is responsible for setting up the root leaf (either new or read from the store), the bloom filter, etc.
 BTree asReadOnly()
          Returns an immutable view of this BTree.
static BTree create(IRawStore store, IndexMetadata metadata)
          Create a new BTree or derived class.
static BTree createTransient(IndexMetadata metadata)
          Create a new BTree or derived class that is fully transient (NO backing IRawStore).
 long createViewCheckpoint()
          Create a new checkpoint for a mutable BTree in which the view is redefined to include the previous view of the BTree (the one from which this BTree instance was loaded) plus the current view of the BTree.
protected  void fireDirtyEvent()
          Fire an event to the listener (iff set).
 BloomFilter getBloomFilter()
          Lazily reads the bloom filter from the backing store if it exists and is not already in memory.
 Checkpoint getCheckpoint()
          Returns the most recent Checkpoint record.
 ICounter getCounter()
          Returns an ICounter.
 IDirtyListener getDirtyListener()
          Return the IDirtyListener.
 long getEntryCount()
          The #of entries (aka tuples) in the AbstractBTree.
 int getHeight()
          The height of the btree.
 long getLastCommitTime()
          The timestamp associated with the last IAtomicStore.commit() in which writes buffered by this index were made restart-safe on the backing store.
 long getLeafCount()
          The #of leaf nodes in the AbstractBTree.
 long getMetadataAddr()
          The address at which the most recent IndexMetadata record was written.
 BTree getMutableBTree()
          The BTree that is absorbing writes for the view.
 long getNodeCount()
          The #of non-leaf nodes in the AbstractBTree.
 long getRecordVersion()
          The value of the record version number that will be assigned to the next node or leaf written onto the backing store.
 long getRevisionTimestamp()
          The timestamp associated with unisolated writes on this index.
 long getRootAddr()
          The address of the last written root of the persistent data structure -or- 0L if there is no root.
 int getSourceCount()
          Returns ONE (1).
 AbstractBTree[] getSources()
          An array containing this BTree.
 IRawStore getStore()
          The backing store.
 long handleCommit(long commitTime)
          Handle request for a commit by writeCheckpoint()ing dirty nodes and leaves onto the store, writing a new metadata record, and returning the address of that metadata record.
static BTree load(IRawStore store, long addrCheckpoint, boolean readOnly)
          Load an instance of a BTree or derived class from the store.
 boolean needsCheckpoint()
          Return true iff changes would be lost unless the B+Tree is flushed to the backing store using writeCheckpoint().
 BTree.LeafCursor newLeafCursor(byte[] key)
          Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree.
 BTree.LeafCursor newLeafCursor(SeekEnum where)
          Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree.
protected  BloomFilter readBloomFilter()
          Read the bloom filter from the backing store using the address stored in the last checkpoint record.
 void removeAll()
          Remove all entries in the B+Tree.
 ICloseableIterator<?> scan()
          Visit all entries in the index in the natural order of the index.
 void setDirtyListener(IDirtyListener listener)
          Set or clear the listener (there can be only one).
 void setIndexMetadata(IndexMetadata indexMetadata)
          Method updates the index metadata associated with this BTree.
 void setLastCommitTime(long lastCommitTime)
          Sets the lastCommitTime.
 long writeCheckpoint()
          Checkpoint operation must #flush() dirty nodes, dirty persistent data structures, etc, write a new Checkpoint record on the backing store, save a reference to the current Checkpoint, and return the address of that Checkpoint record.
 Checkpoint writeCheckpoint2()
          Checkpoint operation must #flush() dirty nodes, dirty persistent data structures, etc, write a new Checkpoint record on the backing store, save a reference to the current Checkpoint, and return the address of that Checkpoint record.
 
Methods inherited from class com.bigdata.btree.AbstractBTree
assertNotReadOnly, assertNotTransient, close, contains, contains, decodeRecordAddr, dump, dump, encodeRecordAddr, getBranchingFactor, getBtreeCounters, getContainsTuple, getCounters, getIndexMetadata, getLookupTuple, getNodeSerializer, getResourceMetadata, getRightMostNode, getRoot, getRootOrFinger, getStatistics, getUtilization, getWriteTuple, indexOf, insert, insert, insert, isOpen, isReadOnly, isTransient, keyAt, lookup, lookup, lookup, rangeCheck, rangeCopy, rangeCount, rangeCount, rangeCount, rangeCountExact, rangeCountExactWithDeleted, rangeIterator, rangeIterator, rangeIterator, rangeIterator, rangeIterator, readNodeOrLeaf, recycle, remove, remove, remove, reopen, setBTreeCounters, submit, submit, submit, toString, touch, valueAt, valueAt, writeNodeOrLeaf, writeNodeRecursive
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface com.bigdata.btree.ICheckpointProtocol
close, getIndexMetadata, isOpen, rangeCount, reopen
 
Methods inherited from interface com.bigdata.counters.ICounterSetAccess
getCounters
 

Field Detail

height

protected int height
The height of the btree. The height is the #of leaves minus one. A btree with only a root leaf is said to have height := 0. Note that the height only changes when we split the root node.


nnodes

protected long nnodes
The #of non-leaf nodes in the btree. The is zero (0) for a new btree.


nleaves

protected long nleaves
The #of leaf nodes in the btree. This is one (1) for a new btree.


nentries

protected long nentries
The #of entries in the btree. This is zero (0) for a new btree.


recordVersion

protected long recordVersion
The value of the record version number that will be assigned to the next node or leaf written onto the backing store. This number is incremented each time a node or leaf is written onto the backing store. The initial value is ZERO (0). The first value assigned to a node or leaf will be ZERO (0).


counter

protected AtomicLong counter
The mutable counter exposed by #getCounter()}.

Note: This is protected so that it will be visible to Checkpoint which needs to write the actual value store in this counter into its serialized record (without the partition identifier).

Constructor Detail

BTree

public BTree(IRawStore store,
             Checkpoint checkpoint,
             IndexMetadata metadata,
             boolean readOnly)
Required constructor form for BTree and any derived subclasses. This constructor is used both to create a new BTree, and to load a BTree from the store using a Checkpoint record.

Parameters:
store - The store.
checkpoint - A Checkpoint record for that BTree.
metadata - The metadata record for that BTree.
readOnly - When true the BTree will be immutable.
See Also:
create(IRawStore, IndexMetadata), load(IRawStore, long, boolean)
Method Detail

getHeight

public final int getHeight()
Description copied from interface: IBTreeStatistics
The height of the btree. The height is the #of levels minus one. A btree with only a root leaf has height := 0. A btree with a root node and one level of leaves under it has height := 1. Note that all leaves of a btree are at the same height (this is what is means for the btree to be "balanced"). Also note that the height only changes when we split or join the root node (a btree maintains balance by growing and shrinking in levels from the top rather than the leaves).

Specified by:
getHeight in interface IBTreeStatistics
Specified by:
getHeight in class AbstractBTree

getNodeCount

public final long getNodeCount()
Description copied from interface: IBTreeStatistics
The #of non-leaf nodes in the AbstractBTree. This is zero (0) for a new btree.

Specified by:
getNodeCount in interface IBTreeStatistics
Specified by:
getNodeCount in class AbstractBTree

getLeafCount

public final long getLeafCount()
Description copied from interface: IBTreeStatistics
The #of leaf nodes in the AbstractBTree. This is one (1) for a new btree.

Specified by:
getLeafCount in interface IBTreeStatistics
Specified by:
getLeafCount in class AbstractBTree

getEntryCount

public final long getEntryCount()
Description copied from interface: IBTreeStatistics
The #of entries (aka tuples) in the AbstractBTree. This is zero (0) for a new B+Tree. When the B+Tree supports delete markers, this value also includes tuples which have been marked as deleted.

Specified by:
getEntryCount in interface IBTreeStatistics
Specified by:
getEntryCount in class AbstractBTree

getStore

public final IRawStore getStore()
The backing store.

Specified by:
getStore in class AbstractBTree

getCounter

public ICounter getCounter()
Returns an ICounter. The ICounter is mutable iff the BTree is mutable. All ICounters returned by this method report and increment the same underlying counter.

Note: When the BTree is part of a scale-out index then the counter will assign values within a namespace defined by the partition identifier.

Specified by:
getCounter in interface IIndexLocalCounter

_reopen

protected void _reopen()
Description copied from class: AbstractBTree
This method is responsible for setting up the root leaf (either new or read from the store), the bloom filter, etc. It is invoked by AbstractBTree.reopen() once AbstractBTree.root has been show to be null with double-checked locking. When invoked in this context, the caller is guaranteed to hold a lock on this. This is done to ensure that at most one thread gets to re-open the index from the backing store.

Specified by:
_reopen in class AbstractBTree

getBloomFilter

public final BloomFilter getBloomFilter()
Lazily reads the bloom filter from the backing store if it exists and is not already in memory.

Specified by:
getBloomFilter in interface ILocalBTreeView
Specified by:
getBloomFilter in class AbstractBTree
Returns:
The bloom filter -or- null if there is no bloom filter (including the case where there is a bloom filter but it has been disabled since the BTree has grown too large and the expected error rate of the bloom filter would be too high).

readBloomFilter

protected final BloomFilter readBloomFilter()
Read the bloom filter from the backing store using the address stored in the last checkpoint record. This method will be invoked by getBloomFilter() when the bloom filter reference is null but the bloom filter is known to exist and the bloom filter object is requested.

Note: A bloom filter can be relatively large. The bit length of a bloom filter is approximately one byte per index entry, so a filter for an index with 10M index entries will be on the order of 10mb. Therefore this method will typically have high latency.

Note: the Checkpoint record initially stores 0L for the bloom filter address. newRootLeaf() is responsible for allocating the bloom filter (if one is to be used) when the root leaf is (re-)created. The address then gets stored in the Checkpoint record by writeCheckpoint() (if invoked and once the bloom filter is dirty).

Throws:
IllegalStateException - if the bloom filter does not exist (the caller should check this first to avoid obtaining a lock).

getLastCommitTime

public final long getLastCommitTime()
Description copied from class: AbstractBTree
The timestamp associated with the last IAtomicStore.commit() in which writes buffered by this index were made restart-safe on the backing store. The lastCommitTime is set when the index is loaded from the backing store and updated after each commit. It is ZERO (0L) when BTree is first created and will remain ZERO (0L) until the BTree is committed. If the backing store does not support atomic commits, then this value will always be ZERO (0L).

Note: This is fixed for an IndexSegment.

Specified by:
getLastCommitTime in class AbstractBTree

getRevisionTimestamp

public final long getRevisionTimestamp()
Description copied from class: AbstractBTree
The timestamp associated with unisolated writes on this index. This timestamp is designed to allow the interleaving of full transactions (whose revision timestamp is assigned by the transaction service) with unisolated operations on the same indices.

The revision timestamp assigned by this method is lastCommitTime+1. The reasoning is as follows. Revision timestamps are assigned by the transaction manager when the transaction is validated as part of its commit protocol. Therefore, revision timestamps are assigned after the transaction write set is complete. Further, the assigned revisionTimestamp will be strictly LT the commitTime for that transaction. By using lastCommitTime+1 we are guaranteed that the revisionTimestamp for new writes (which will be part of some future commit point) will always be strictly GT the revisionTimestamp of historical writes (which were part of some prior commit point).

Note: Unisolated operations using this timestamp ARE NOT validated. The timestamp is simply applied to the tuple when it is inserted or updated and will become part of the restart safe state of the B+Tree once the unisolated operation participates in a commit.

Note: If an unisolated operation were to execute concurrent with a transaction commit for the same index then that could produce inconsistent results in the index and could trigger concurrent modification errors. In order to avoid such concurrent modification errors, unisolated operations which are to be mixed with full transactions MUST ensure that they have exclusive access to the unisolated index before proceeding. There are two ways to do this: (1) take the application off line for transactions; (2) submit your unisolated operations to the IConcurrencyManager which will automatically impose the necessary constraints on concurrent access to the unisolated indices.

Specified by:
getRevisionTimestamp in class AbstractBTree
Returns:
The revision timestamp to be assigned to an unisolated write.

setLastCommitTime

public final void setLastCommitTime(long lastCommitTime)
Description copied from interface: ICheckpointProtocol
Sets the lastCommitTime.

Note: The lastCommitTime is set by a combination of the AbstractJournal and Name2Addr based on the actual commitTime of the commit during which an Name2Addr.Entry for that index was last committed. It is set for both historical index reads and unisolated index reads using Name2Addr.Entry.commitTime. The lastCommitTime for an unisolated index will advance as commits are performed with that index.

Specified by:
setLastCommitTime in interface ICheckpointProtocol
Parameters:
lastCommitTime - The timestamp of the last committed state of this index.

getDirtyListener

public final IDirtyListener getDirtyListener()
Return the IDirtyListener.

Specified by:
getDirtyListener in interface ICheckpointProtocol

setDirtyListener

public final void setDirtyListener(IDirtyListener listener)
Set or clear the listener (there can be only one).

Specified by:
setDirtyListener in interface ICheckpointProtocol
Parameters:
listener - The listener.

fireDirtyEvent

protected final void fireDirtyEvent()
Fire an event to the listener (iff set).


asReadOnly

public BTree asReadOnly()
Returns an immutable view of this BTree. If BTree is already read-only, then this instance is returned. Otherwise, a read-only BTree is loaded from the last checkpoint and returned.

Throws:
IllegalStateException - If the BTree is dirty.
TODO:
The Checkpoint could hold a WeakReference singleton to the read-only view loaded from that checkpoint.

writeCheckpoint

public final long writeCheckpoint()
Checkpoint operation must #flush() dirty nodes, dirty persistent data structures, etc, write a new Checkpoint record on the backing store, save a reference to the current Checkpoint, and return the address of that Checkpoint record.

Note: A checkpoint by itself is NOT an atomic commit. The commit protocol is at the store level and uses Checkpoints to ensure that the state of the persistence capable data structure is current on the backing store.

This checkpoint operation flush()es dirty nodes, the optional IBloomFilter (if dirty), the IndexMetadata (if dirty), and then writes a new Checkpoint record on the backing store, saves a reference to the current Checkpoint and returns the address of that Checkpoint record.

Specified by:
writeCheckpoint in interface ICheckpointProtocol
Returns:
The address at which the Checkpoint record for the persistence capable was written onto the store. The data structure can be reloaded from this Checkpoint record.
See Also:
#writeCheckpoint2(), which returns the {@link Checkpoint} record itself., load(IRawStore, long, boolean)

writeCheckpoint2

public final Checkpoint writeCheckpoint2()
Checkpoint operation must #flush() dirty nodes, dirty persistent data structures, etc, write a new Checkpoint record on the backing store, save a reference to the current Checkpoint, and return the address of that Checkpoint record.

Note: A checkpoint by itself is NOT an atomic commit. The commit protocol is at the store level and uses Checkpoints to ensure that the state of the persistence capable data structure is current on the backing store.

Specified by:
writeCheckpoint2 in interface ICheckpointProtocol
Returns:
The Checkpoint record for the persistent data structure which was written onto the store. The persistent data structure can be reloaded from this Checkpoint record.
See Also:
load(IRawStore, long, boolean)

getCheckpoint

public final Checkpoint getCheckpoint()
Description copied from interface: ICheckpointProtocol
Returns the most recent Checkpoint record.

Specified by:
getCheckpoint in interface ICheckpointProtocol
Returns:
The most recent Checkpoint record and never null.

getRecordVersion

public final long getRecordVersion()
Description copied from interface: ICheckpointProtocol
The value of the record version number that will be assigned to the next node or leaf written onto the backing store. This number is incremented each time a node or leaf is written onto the backing store. The initial value is ZERO (0). The first value assigned to a node or leaf will be ZERO (0). TODO Nobody is actually incrementing this value right now.

Specified by:
getRecordVersion in interface ICheckpointProtocol

getMetadataAddr

public final long getMetadataAddr()
Description copied from interface: ICheckpointProtocol
The address at which the most recent IndexMetadata record was written.

Specified by:
getMetadataAddr in interface ICheckpointProtocol

getRootAddr

public final long getRootAddr()
Description copied from interface: ICheckpointProtocol
The address of the last written root of the persistent data structure -or- 0L if there is no root. A 0L return may be an indication that an empty data structure will be created on demand.

Specified by:
getRootAddr in interface ICheckpointProtocol

needsCheckpoint

public boolean needsCheckpoint()
Return true iff changes would be lost unless the B+Tree is flushed to the backing store using writeCheckpoint().

Note: In order to avoid needless checkpoints this method will return false if:

Returns:
true true iff changes would be lost unless the B+Tree was flushed to the backing store using writeCheckpoint().

setIndexMetadata

public final void setIndexMetadata(IndexMetadata indexMetadata)
Method updates the index metadata associated with this BTree. The new metadata record will be written out as part of the next index writeCheckpoint().

Note: this method should be used with caution.

Parameters:
indexMetadata - The new metadata description for the BTree.
Throws:
IllegalArgumentException - if the new value is null
IllegalArgumentException - if the new IndexMetadata record has already been written on the store - see IndexMetadata.getMetadataAddr()
UnsupportedOperationException - if the index is read-only.

handleCommit

public long handleCommit(long commitTime)
Handle request for a commit by writeCheckpoint()ing dirty nodes and leaves onto the store, writing a new metadata record, and returning the address of that metadata record.

Note: In order to avoid needless writes the existing metadata record is always returned if needsCheckpoint() is false.

Note: The address of the existing Checkpoint record is always returned if autoCommit is disabled.

Specified by:
handleCommit in interface ICommitter
Parameters:
commitTime - The timestamp assigned to the commit.
Returns:
The address of a Checkpoint record from which the btree may be reloaded.

scan

public final ICloseableIterator<?> scan()
Description copied from interface: ICheckpointProtocol
Visit all entries in the index in the natural order of the index.

Specified by:
scan in interface ICheckpointProtocol

removeAll

public final void removeAll()
Remove all entries in the B+Tree.

When delete markers are not enabled this simply replaces the root with a new root leaf and resets the counters (height, #of nodes, #of leaves, etc). to be consistent with that new root. If the btree is then made restart-safe by the commit protocol of the backing store then the effect is as if all entries had been deleted. Old nodes and leaves will be swept from the store eventually when the journal overflows.

When delete markers are enabled all un-deleted entries in the index are overwritten with a delete marker.

Note: The IIndexManager defines methods for registering (adding) and dropping indices vs removing the entries in an individual BTree.

Specified by:
removeAll in interface ICheckpointProtocol
Specified by:
removeAll in class AbstractBTree

createViewCheckpoint

public long createViewCheckpoint()
Create a new checkpoint for a mutable BTree in which the view is redefined to include the previous view of the BTree (the one from which this BTree instance was loaded) plus the current view of the BTree. The root of the BTree is replaced with an empty leaf as part of this operation. The LocalPartitionMetadata is updated to reflect the new view.

Note: This method is used by the scale-out architecture for which it performs a very specific function. It should not be used for any other purpose and should not be invoked by user code. This encapsulates all of the trickery for creating the necessary checkpoint without exposing any methods which could be used to replace the root node with an empty root leaf.

Returns:
The timestamp associated with the checkpoint record from which this BTree was loaded. This is the timestamp that was placed on the 2nd element of the view. Any read-only build or merge task will read from this timestamp.
Throws:
IllegalStateException - if the BTree is read only.
IllegalStateException - if the BTree is dirty.
IllegalStateException - if the BTree has never been committed.

create

public static BTree create(IRawStore store,
                           IndexMetadata metadata)
Create a new BTree or derived class. This method works by writing the IndexMetadata record on the store and then loading the BTree from the IndexMetadata record.

Parameters:
store - The store.
metadata - The metadata record.
Returns:
The newly created BTree.
Throws:
IllegalStateException - If you attempt to create two btree objects from the same metadata record since the metadata address will have already been noted on the IndexMetadata object. You can use IndexMetadata.clone() to obtain a new copy of the metadata object with the metadata address set to 0L.
IllegalStateException - if the IndexTypeEnum in the supplied IndexMetadata object is not IndexTypeEnum.BTree.
See Also:
load(IRawStore, long, boolean)

createTransient

public static BTree createTransient(IndexMetadata metadata)
Create a new BTree or derived class that is fully transient (NO backing IRawStore).

Fully transient BTrees provide the functionality of a B+Tree without a backing persistence store. Internally, reachable nodes and leaves of the transient BTree use hard references to ensure that remain strongly reachable. Deleted nodes and leaves simply clear their references and will be swept by the garbage collector shortly thereafter.

Operations which attempt to write on the backing store will fail.

While nodes and leaves are never persisted, the keys and values of the transient BTree are unsigned byte[]s. This means that application keys and values are always converted into unsigned byte[]s before being stored in the BTree. Hence if an object that is inserted into the BTree and then looked up using the BTree API, you WILL NOT get back the same object reference.

Note: CLOSING A TRANSIENT INDEX WILL DISCARD ALL DATA!

Parameters:
metadata - The metadata record.
Returns:
The transient BTree.

load

public static BTree load(IRawStore store,
                         long addrCheckpoint,
                         boolean readOnly)
Load an instance of a BTree or derived class from the store. The BTree or derived class MUST declare a constructor with the following signature: className(IRawStore store, Checkpoint checkpoint, IndexMetadata metadata, boolean readOnly)

Parameters:
store - The store.
addrCheckpoint - The address of a Checkpoint record for the index.
readOnly - When true the BTree will be marked as read-only. Marking has some advantages relating to the locking scheme used by Node.getChild(int) since the root node is known to be read-only at the time that it is allocated as per-child locking is therefore in place for all nodes in the read-only BTree. It also results in much higher concurrency for AbstractBTree.touch(AbstractNode).
Returns:
The BTree or derived class loaded from that Checkpoint record.
Throws:
IllegalArgumentException - if store is null.

getSourceCount

public final int getSourceCount()
Returns ONE (1).

Specified by:
getSourceCount in interface ILocalBTreeView

getSources

public final AbstractBTree[] getSources()
An array containing this BTree.

Specified by:
getSources in interface ILocalBTreeView

getMutableBTree

public final BTree getMutableBTree()
Description copied from interface: ILocalBTreeView
The BTree that is absorbing writes for the view.

Specified by:
getMutableBTree in interface ILocalBTreeView

newLeafCursor

public BTree.LeafCursor newLeafCursor(SeekEnum where)
Description copied from class: AbstractBTree
Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree. The cursor will be initially positioned on the leaf identified by the symbolic constant.

Specified by:
newLeafCursor in class AbstractBTree

newLeafCursor

public BTree.LeafCursor newLeafCursor(byte[] key)
Description copied from class: AbstractBTree
Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree. The cursor will be initially positioned on the leaf that spans the given key.

Specified by:
newLeafCursor in class AbstractBTree
Parameters:
key - The key (required).


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.