|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||
java.lang.Objectcom.bigdata.btree.AbstractBTree
public abstract class AbstractBTree
Base class for mutable and immutable B+-Tree implementations.
The B+-Tree implementation supports variable length unsigned byte[] keys and
provides a IKeyBuilder utilities designed to make it possible to
generate keys from any combination of primitive data types and Unicode
strings. The total ordering imposed by the index is that of a bit-wise
comparison of the variable length unsigned byte[] keys. Encoding Unicode keys
is support by an integration with ICU4J and applications may choose the
locale, strength, and other properties that govern the sort order of sort
keys generated from Unicode strings. Sort keys produces by different collator
objects are NOT compatible and applications that use Unicode data in their keys
MUST make sure that they use a collator that imposes the same sort order each
time they provision a KeyBuilder.
The use of variable length unsigned byte[] keys makes it possible for the B+-Tree to perform very fast comparison of a search key with keys in the nodes and leaves of the tree. To support fast search, the leading prefix is factored out each time a node or leaf is made immutable, e.g., directly proceeding serialization. Further, the separator keys are chosen to be the shortest separator key in order to further shorten the keys in the nodes of the tree.
The B+Tree can optionally maintain version metadata (a version timestamp and
deletion marker). Version metadata MUST be enabled (a) if the index will be
used with transactional isolation; or (b) if the index is part of a scale-out
index. In both cases the version timestamps and deletion markers play a
critical role when reading from a fused view of an ordered set of indices
describing an index or an index partition. Note that the existence of
deletion markers means that rangeCount(byte[], byte[]) and
getEntryCount() are upper bounds as deleted entries will be reported
until they are purged from the index by a compacting merge of the view.
KeyBuilder| Field Summary | |
|---|---|
protected int |
branchingFactor
The branching factor for the btree. |
protected boolean |
debug
Flag turns on the use of AbstractNode.assertInvariants() and is
automatically enabled when the logger is set to
Level.DEBUG. |
protected static boolean |
DEBUG
True iff the log level is DEBUG or less. |
static org.apache.log4j.Logger |
dumpLog
Log for dump(PrintStream) and friends. |
protected static String |
ERROR_CLOSED
The index is already closed. |
protected static String |
ERROR_LESS_THAN_ZERO
A parameter was less than zero. |
protected static String |
ERROR_READ_ONLY
The index is read-only but a mutation operation was requested. |
protected static String |
ERROR_TOO_LARGE
A parameter was too large. |
protected static String |
ERROR_TRANSIENT
The index is transient (not backed by persistent storage) but an operation that requires persistence was requested. |
protected static boolean |
INFO
True iff the log level is INFO or less. |
protected static org.apache.log4j.Logger |
log
Log for btree opeations. |
protected IndexMetadata |
metadata
The metadata record for the index. |
protected int |
ndistinctOnWriteRetentionQueue
The #of distinct nodes and leaves on the writeRetentionQueue. |
protected NodeSerializer |
nodeSer
Used to serialize and de-serialize the nodes and leaves of the tree. |
protected AbstractNode<?> |
root
The root of the btree. |
protected IRawStore |
store
The persistence store -or- null iff the B+Tree is transient. |
protected IGlobalLRU.ILRUCache<Long,Object> |
storeCache
Optional cache for INodeData and ILeafData instances and
always null if the B+Tree is transient. |
protected HardReferenceQueue<PO> |
writeRetentionQueue
Nodes (that is nodes or leaves) are added to a hard reference queue when they are created or read from the store. |
| Fields inherited from interface com.bigdata.btree.IRangeQuery |
|---|
ALL, CURSOR, DEFAULT, DELETED, FIXED_LENGTH_SUCCESSOR, KEYS, NONE, PARALLEL, READONLY, REMOVEALL, REVERSE, VALS |
| Constructor Summary | |
|---|---|
protected |
AbstractBTree(IRawStore store,
INodeFactory nodeFactory,
boolean readOnly,
IndexMetadata metadata,
IRecordCompressorFactory<?> recordCompressorFactory)
|
| Method Summary | |
|---|---|
protected abstract void |
_reopen()
This method is responsible for setting up the root leaf (either new or read from the store), the bloom filter, etc. |
protected void |
assertNotReadOnly()
|
protected void |
assertNotTransient()
|
void |
close()
The contract for close() is to reduce the resource burden of the
index (by discarding buffers) while not rendering the index inoperative. |
boolean |
contains(byte[] key)
Core method to decide whether the index has a (non-deleted) entry under a key. |
boolean |
contains(Object key)
Return true iff there is an entry for the key. |
boolean |
dump(org.apache.log4j.Level level,
PrintStream out)
|
boolean |
dump(PrintStream out)
Recursive dump of the tree. |
abstract BloomFilter |
getBloomFilter()
Return the optional IBloomFilter, transparently
reopen()ing the index if necessary. |
int |
getBranchingFactor()
The branching factor for the btree. |
BTreeCounters |
getBtreeCounters()
Counters tracking various aspects of the btree. |
Tuple |
getContainsTuple()
Return a Tuple that may be used to test for the existence of
tuples having a given key within the AbstractBTree. |
ICounterSet |
getCounters()
Return some "statistics" about the btree including both the static CounterSet and the BTreeCounterss. |
CounterSet |
getDynamicCounterSet()
Return a new CounterSet containing dynamic counters - DO
NOT hold onto the returned CounterSet as it contains implicit
hard references to the AbstractBTree and will prevent the
AbstractBTree from being finalized! |
abstract int |
getEntryCount()
The #of entries (aka values) in the AbstractBTree. |
abstract int |
getHeight()
The height of the btree. |
IndexMetadata |
getIndexMetadata()
Returns the metadata record for this btree. |
abstract long |
getLastCommitTime()
The timestamp associated with the last IAtomicStore.commit() in
which writes buffered by this index were made restart-safe on the backing
store. |
abstract int |
getLeafCount()
The #of leaf nodes in the AbstractBTree. |
Tuple |
getLookupTuple()
Return a Tuple that may be used to copy the value associated with
a key out of the AbstractBTree. |
abstract int |
getNodeCount()
The #of non-leaf nodes in the AbstractBTree. |
NodeSerializer |
getNodeSerializer()
The object responsible for (de-)serializing the nodes and leaves of the IIndex. |
IResourceMetadata[] |
getResourceMetadata()
The description of the resources comprising the index view. |
AbstractNode |
getRightMostNode(boolean nodesOnly)
Utility method returns the right-most node in the B+Tree. |
protected AbstractNode<?> |
getRoot()
The root of the btree. |
protected AbstractNode<?> |
getRootOrFinger(byte[] key)
Returns the node or leaf to be used for search. |
CounterSet |
getStaticCounterSet()
Returns a new CounterSet containing static
counters. |
abstract IRawStore |
getStore()
The backing store. |
int[] |
getUtilization()
Computes and returns the utilization of the tree. |
Tuple |
getWriteTuple()
Private instance used for mutation operations (insert, remove) which are single threaded. |
int |
indexOf(byte[] key)
Lookup the index position of the key. |
byte[] |
insert(byte[] key,
byte[] value)
Insert or update a value under the key. |
Tuple |
insert(byte[] key,
byte[] value,
boolean delete,
long timestamp,
Tuple tuple)
Core method for inserting or updating a value under a key. |
Object |
insert(Object key,
Object value)
Insert with auto-magic handling of keys and value objects. |
boolean |
isOpen()
An "open" index has its buffers and root node in place rather than having to reallocate buffers or reload the root node from the store. |
abstract boolean |
isReadOnly()
Return true iff this B+Tree is read-only. |
boolean |
isTransient()
Return true iff this is a transient data structure (no
backing store). |
byte[] |
keyAt(int index)
Return the key for the identified entry. |
byte[] |
lookup(byte[] key)
Lookup a value for a key. |
Tuple |
lookup(byte[] key,
Tuple tuple)
Core method for retrieving a value under a key. |
Object |
lookup(Object key)
Lookup a value for a key. |
abstract ILeafCursor |
newLeafCursor(byte[] key)
Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree. |
abstract ILeafCursor |
newLeafCursor(SeekEnum where)
Return a cursor that may be used to efficiently locate and scan the leaves in the B+Tree. |
protected boolean |
rangeCheck(byte[] key,
boolean allowUpperBound)
Iff the B+Tree is an index partition then verify that the key lies within the key range of an index partition. |
long |
rangeCopy(IIndex src,
byte[] fromKey,
byte[] toKey,
boolean overflow)
Copy all data, including deleted index entry markers and timestamps iff supported by the source and target. |
long |
rangeCount()
Return the #of tuples in the index. |
long |
rangeCount(byte[] fromKey,
byte[] toKey)
This method computes the #of entries in the half-open range using AbstractNode#indexOf(Object). |
long |
rangeCount(Object fromKey,
Object toKey)
Variant implicitly converts the optional application keys into unsigned byte[]s. |
long |
rangeCountExact(byte[] fromKey,
byte[] toKey)
Return the exact tuple count for the half-open key range. |
long |
rangeCountExactWithDeleted(byte[] fromKey,
byte[] toKey)
Note: rangeCount(byte[], byte[]) already reports deleted tuples
for an AbstractBTree so this method is just delegated to that
one. |
ITupleIterator |
rangeIterator()
Visits all tuples in key order. |
ITupleIterator |
rangeIterator(byte[] fromKey,
byte[] toKey)
Return an iterator that visits the entries in a half-open key range. |
ITupleIterator |
rangeIterator(byte[] fromKey,
byte[] toKey,
int capacityIsIgnored,
int flags,
IFilterConstructor filter)
Core implementation. |
ITupleIterator |
rangeIterator(Object fromKey,
Object toKey)
Variant implicitly converts the optional application keys into unsigned byte[]s. |
ITupleIterator |
rangeIterator(Object fromKey,
Object toKey,
int capacity,
int flags,
IFilterConstructor filter)
Variant implicitly converts the optional application keys into unsigned byte[]s. |
protected AbstractNode<?> |
readNodeOrLeaf(long addr)
Read a node or leaf from the store. |
byte[] |
remove(byte[] key)
Remove the tuple under that key (will write a delete marker if delete markers are enabled). |
Tuple |
remove(byte[] key,
Tuple tuple)
Core method for deleting a value under a key. |
Object |
remove(Object key)
Remove the key and its associated value. |
abstract void |
removeAll()
Remove all entries in the B+Tree. |
protected void |
reopen()
This is part of a close()/reopen() protocol that may
be used to reduce the resource burden of an AbstractBTree. |
void |
setBTreeCounters(BTreeCounters btreeCounters)
Replace the BTreeCounters. |
void |
submit(byte[] fromKey,
byte[] toKey,
IKeyRangeIndexProcedure proc,
IResultHandler handler)
The procedure will be transparently applied against each index partition spanned by the given key range. |
Object |
submit(byte[] key,
ISimpleIndexProcedure proc)
Submits an index procedure that operations on a single key to the appropriate index partition returning the result of that procedure. |
void |
submit(int fromIndex,
int toIndex,
byte[][] keys,
byte[][] vals,
AbstractKeyArrayIndexProcedureConstructor ctor,
IResultHandler aggregator)
Runs a procedure against an index. |
String |
toString()
Fast summary information about the B+Tree. |
protected void |
touch(AbstractNode<?> node)
Touch the node or leaf on the writeRetentionQueue. |
byte[] |
valueAt(int index)
Return the value for the identified entry. |
Tuple |
valueAt(int index,
Tuple tuple)
|
protected long |
writeNodeOrLeaf(AbstractNode<?> node)
Codes the node and writes the coded record on the store (non-recursive). |
protected void |
writeNodeRecursive(AbstractNode<?> node)
Write a dirty node and its children using a post-order traversal that first writes any dirty leaves and then (recursively) their parent nodes. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface com.bigdata.btree.IIndex |
|---|
getCounter |
| Field Detail |
|---|
protected static final String ERROR_CLOSED
protected static final String ERROR_LESS_THAN_ZERO
protected static final String ERROR_TOO_LARGE
protected static final String ERROR_READ_ONLY
protected static final String ERROR_TRANSIENT
protected static final org.apache.log4j.Logger log
protected static final boolean INFO
log level is INFO or less.
protected static final boolean DEBUG
log level is DEBUG or less.
public static final org.apache.log4j.Logger dumpLog
dump(PrintStream) and friends.
protected final boolean debug
AbstractNode.assertInvariants() and is
automatically enabled when the logger is set to
Level.DEBUG.
protected final IRawStore store
null iff the B+Tree is transient.
protected final IGlobalLRU.ILRUCache<Long,Object> storeCache
INodeData and ILeafData instances and
always null if the B+Tree is transient.
protected final int branchingFactor
protected volatile AbstractNode<?> root
This hard reference is cleared to null if an index is
closed. getRoot() automatically uses
reopen() to reload the root so that closed indices may be
transparently made ready for further use (indices are closed to reduce
their resource burden, not to make their references invalid). The
AbstractNode and derived classes assume that the root
is non-null. This assumption is valid if close() is invoked by
the application in a manner consistent with the single-threaded contract
for the AbstractBTree.
Note: This field MUST be marked as [volatile] in order to guarantee
correct semantics for double-checked locking in reopen()
and reopen().
http://en.wikipedia.org/wiki/Double-checked_lockingprotected final NodeSerializer nodeSer
protected final HardReferenceQueue<PO> writeRetentionQueue
IRawStore.
The nodes and leaves refer to their parent with a WeakReferences.
Likewise, nodes refer to their children with a WeakReference.
The hard reference queue in combination with touch(AbstractNode)
and with hard references held on the stack ensures that the parent and/or
children remain reachable during operations. Once the node is no longer
strongly reachable weak references to that node may be cleared by the VM -
in this manner the node will become unreachable by navigation from its
ancestors in the btree. The special role of the hard reference queue is
to further ensure that dirty nodes remain dirty by defering persistence
until the reference count for the node is zero during an eviction from
the queue.
Note that nodes are evicted as new nodes are added to the hard reference queue. This occurs in two situations: (1) when a new node is created during a split of an existing node; and (2) when a node is read in from the store. Inserts on this hard reference queue always drive evictions. Incremental writes basically make it impossible for the commit set to get "too large" - the maximum #of nodes to be written is bounded by the size of the hard reference queue. This helps to ensure fast commit operations on the store.
The minimum capacity for the hard reference queue is two (2) so that a split may occur without forcing eviction of either node participating in the split.
Note: The code in Node.postOrderNodeIterator(boolean) and
DirtyChildIterator MUST NOT touch the hard reference queue since
those iterators are used when persisting a node using a post-order
traversal. If a hard reference queue eviction drives the serialization of
a node and we touch the hard reference queue during the post-order
traversal then we break down the semantics of
HardReferenceQueue.add(Object) as the eviction does not
necessarily cause the queue to reduce in length. Another way to handle
this is to have HardReferenceQueue.add(Object) begin to evict
objects before is is actually at capacity, but that is also a bit
fragile.
Note: The writeRetentionQueue uses a HardReferenceQueue.
This is based on a RingBuffer and is very fast. It does not use a
HashMap because we can resolve the WeakReference to the
child Node or Leaf using top-down navigation as long as
the Node or Leaf remains strongly reachable (that is, as
long as it is on the writeRetentionQueue or otherwise strongly
held). This means that lookup in a map is not required for top-down
navigation.
The LRUNexus provides an INodeData / ILeafData
data record cache based on a hash map with lookup by the address of the
node or leaf. This is tested when the child WeakReference was
never set or has been cleared. This cache is also used by the
IndexSegment for the linked-leaf traversal pattern, which does
not use top-down navigation.
There is a discussion of some issues regarding such a policy in the
code inside of
Node.Node(BTree btree, AbstractNode oldRoot, int nentries).
protected int ndistinctOnWriteRetentionQueue
writeRetentionQueue.
protected IndexMetadata metadata
BTree object, but it CAN be changed.
| Constructor Detail |
|---|
protected AbstractBTree(IRawStore store,
INodeFactory nodeFactory,
boolean readOnly,
IndexMetadata metadata,
IRecordCompressorFactory<?> recordCompressorFactory)
store - The persistence store.nodeFactory - Object that provides a factory for node and leaf objects.addrSer - Object that knows how to (de-)serialize the child addresses in
an INodeData.readOnly - true IFF it is known that the
AbstractBTree is read-only.metadata - The IndexMetadata object for this B+Tree.| Method Detail |
|---|
public final BTreeCounters getBtreeCounters()
public final void setBTreeCounters(BTreeCounters btreeCounters)
BTreeCounters.
Note: This is used by the IndexManager to ensure that an index
loaded from its backing store uses the BTreeCounters associated
with that index since the DataService was last (re-)started.
btreeCounters - The counters to be used.
IllegalArgumentException - if the argument is null.public abstract BloomFilter getBloomFilter()
IBloomFilter, transparently
reopen()ing the index if necessary.
null if there is no bloom
filter (including the case where there is a bloom filter but it
has been disabled since the BTree has grown too large and
the expected error rate of the bloom filter would be too high).public ICounterSet getCounters()
CounterSet and the BTreeCounterss.
Note: Since this DOES NOT include the getDynamicCounterSet(),
holding a reference to the returned ICounterSet WILL NOT cause
the AbstractBTree to remain strongly reachable.
getCounters in interface IIndexgetStaticCounterSet(),
getDynamicCounterSet(),
BTreeCounters.getCounters()public CounterSet getDynamicCounterSet()
CounterSet containing dynamic counters - DO
NOT hold onto the returned CounterSet as it contains implicit
hard references to the AbstractBTree and will prevent the
AbstractBTree from being finalized!
public CounterSet getStaticCounterSet()
CounterSet containing static
counters. These counters are all OneShotInstruments and DO NOT
contain implicit references to the AbstractBTree. They may be
safely held and will not cause the AbstractBTree to remain
strongly reachable.
public void close()
close() is to reduce the resource burden of the
index (by discarding buffers) while not rendering the index inoperative.
Unless the AbstractBTree isTransient(), a B+Tree that
has been closed MAY be reopened at any
time (conditional on the continued availability of the backing store).
Such an index reference remains valid after a close(). A closed
index is transparently restored by either getRoot() or
reopen().
Note: A close() on a dirty index MUST discard writes rather than
flushing them to the store and MUST NOT update its Checkpoint
record - (close() is used to discard indices with partial
writes when an AbstractTask fails). If you are seeking to
close() a mutable BTree that it state can be recovered
by reopen() then you MUST write a new Checkpoint record
before closing the index.
Note: CLOSING A TRANSIENT INDEX WILL DISCARD ALL DATA!
This implementation clears the hard reference queue (releasing all node
references), releases the hard reference to the root node, and releases
the buffers on the NodeSerializer (they will be naturally
reallocated on reuse).
Note: AbstractBTree is NOT thread-safe and close() MUST
be invoked in a context in which there will not be concurrent threads --
the natural choice being the single-threaded commit service on the
journal.
IllegalStateException - if the root is null, indicating that the
index is already closed.
IllegalStateException - if the root is dirty (this implies that this is a mutable
btree and there are mutations that have not been written
through to the store)protected final void reopen()
close()/reopen() protocol that may
be used to reduce the resource burden of an AbstractBTree. The
method delegates to _reopen() if double-checked locking
demonstrates that the root is null (indicating
that the index has been closed). This method is automatically invoked by
a variety of methods that need to ensure that the index is available for
use.
close(),
isOpen(),
getRoot()protected abstract void _reopen()
reopen() once root has been show to be
null with double-checked locking. When invoked in this
context, the caller is guaranteed to hold a lock on this. This is
done to ensure that at most one thread gets to re-open the index from the
backing store.
public final boolean isOpen()
close(),
reopen(),
getRoot()public final boolean isTransient()
true iff this is a transient data structure (no
backing store).
protected final void assertNotTransient()
public abstract boolean isReadOnly()
true iff this B+Tree is read-only.
protected final void assertNotReadOnly()
UnsupportedOperationException - if the B+Tree is read-only.isReadOnly()public abstract long getLastCommitTime()
IAtomicStore.commit() in
which writes buffered by this index were made restart-safe on the backing
store. The lastCommitTime is set when the index is loaded from the
backing store and updated after each commit. It is ZERO (0L) when
BTree is first created and will remain ZERO (0L) until the
BTree is committed. If the backing store does not support atomic
commits, then this value will always be ZERO (0L).
Note: This is fixed for an IndexSegment.
public abstract IRawStore getStore()
public final IResourceMetadata[] getResourceMetadata()
IIndex
getResourceMetadata in interface IIndexpublic IndexMetadata getIndexMetadata()
Note: If the B+Tree is read-only then the metadata object will be cloned to avoid potential modification. However, only a single cloned copy of the metadata record will be shared between all callers for a given instance of this class.
getIndexMetadata in interface IIndexnull.IndexMetadata object, cloning each time,
and leaving it to the caller to clone as required. Either the
former or the latter might be a better choice.
FIXME if the metadata record is updated (and it can be) then we really
need to invalidate the cloned metadata record also.public String toString()
toString in class Object
protected boolean rangeCheck(byte[] key,
boolean allowUpperBound)
Note: An index partition is identified by
IndexMetadata.getPartitionMetadata() returning non-null.
Note: This method is used liberally in asserts to detect
errors that can arise is client code when an index partition is split,
joined, or moved.
key - The key.allowUpperBound - true iff the key represents an
inclusive upper bound and thus must be allowed to be LTE to
the right separator key for the index partition. For example,
this would be true for the toKey
parameter on rangeCount or rangeIterator methods.
true always.
IllegalArgumentException - if the key is null
KeyOutOfRangeException - if the key does not lie within the index partition.public final int getBranchingFactor()
public abstract int getHeight()
height := 0. A btree with a
root node and one level of leaves under it has height := 1.
Note that all leaves of a btree are at the same height (this is what is
means for the btree to be "balanced"). Also note that the height only
changes when we split or join the root node (a btree maintains balance by
growing and shrinking in levels from the top rather than the leaves).
public abstract int getNodeCount()
AbstractBTree. This is zero (0)
for a new btree.
public abstract int getLeafCount()
AbstractBTree. This is one (1) for a
new btree.
public abstract int getEntryCount()
AbstractBTree. This is zero
(0) for a new B+Tree. Note that this value is tracked explicitly so it
requires no IOs.
Checkpoint record.
Since getEntryCount() is also used to give the total #of
index enties (and we need that feature) so we need to either add
another method with the appropriate semantics, add a boolean flag,
or add a method returning the #of deleted entries, etc.
public final NodeSerializer getNodeSerializer()
IIndex.
protected final AbstractNode<?> getRoot()
The hard reference to the root node is cleared if the index is
closed. This method automatically reopen()s
the index if it is closed, making it available for use.
protected AbstractNode<?> getRootOrFinger(byte[] key)
#finger and will return it if the key lies within
the finger.
key - The key.
public final Tuple getWriteTuple()
IRangeQuery.KEYS and
IRangeQuery.VALS are requested so that indices which encode part
of the application object within the key can recover the application
object in insert(Object, Object) and remove(Object).
While the Tuple is not safe for use by concurrent threads, the
mutation API is not safe for concurrent threads either.
public final Tuple getLookupTuple()
Tuple that may be used to copy the value associated with
a key out of the AbstractBTree.
While the returned Tuple is not safe for use by concurrent
threads, each instance returned by this method is safe for use by the
thread in which it was obtained.
lookup(byte[], Tuple)public final Tuple getContainsTuple()
Tuple that may be used to test for the existence of
tuples having a given key within the AbstractBTree.
The tuple does not copy either the keys or the values. Contains is
implemented as a lookup operation that either return this tuple or
null. When isolation is supported, the version metadata is
examined to determine if the matching entry is flagged as deleted in
which case contains() will report "false".
While the returned Tuple is not safe for use by concurrent
threads, each instance returned by this method is safe for use by the
thread in which it was obtained.
contains(byte[])
public final Object insert(Object key,
Object value)
IAutoboxBTree
insert in interface IAutoboxBTreekey - The key is implicitly converted to an unsigned
byte[].value - The value is implicitly converted to a byte[].
null if there was
no value stored under that key.
public final byte[] insert(byte[] key,
byte[] value)
ISimpleBTree
insert in interface ISimpleBTreekey - The key.value - The value (may be null).
null if the
key was not found or if the previous entry for that key was
marked as deleted.
public final Tuple insert(byte[] key,
byte[] value,
boolean delete,
long timestamp,
Tuple tuple)
key - The variable length unsigned byte[] key.value - The variable length byte[] value (MAY be null).delete - true iff the index entry should be marked as
deleted (this behavior is supported iff the btree supports
delete markers).timestamp - The timestamp to be associated with the new or updated index
entry (required iff the btree supports transactional isolation
and otherwise 0L).tuple - Data and metadata for the old value under the key will be
copied onto this object (optional).
null if there was no entry under
that key. See ITuple.isDeletedVersion() to determine
whether or not the entry is marked as deleted.
UnsupportedOperationException - if the index is read-only.public final Object remove(Object key)
IAutoboxBTree
remove in interface IAutoboxBTreekey - The key is implicitly converted to an unsigned
byte[].
null if the key was not found.public final byte[] remove(byte[] key)
remove in interface ISimpleBTreekey - The key.
null if the key
was not found or if the previous entry under that key was marked
as deleted.
public final Tuple remove(byte[] key,
Tuple tuple)
remove(byte[])
uses insert(byte[], byte[], boolean, long, Tuple) instead of
this method to mark the index entry as deleted when delete markers are
being maintained.
Note: removing a key has no effect on the optional bloom filter. If a key
is removed from the index by this method then the bloom filter will
report a false positive for that key, which will be detected
when we test the index itself. This works out fine in the scale-out
design since the bloom filter is per AbstractBTree instance and
split/join/move operations all result in new mutable BTrees with
new bloom filters to absorb new writes. For a non-scale-out deployement,
this can cause the performance of the bloom filter to degrade if you are
removing a lot of keys. However, in the special case of a BTree
that does NOT use delete markers, BTree.removeAll() will create a
new root leaf and a new (empty) bloom filter as well.
key - The search key.tuple - An object that will be used to report data and metadata for
the pre-existing version (optional).
null if there was no version
under that key.
UnsupportedOperationException - if delete markers are being maintained.
UnsupportedOperationException - if the index is read-only.public abstract void removeAll()
Note: The IIndexManager defines methods for registering (adding)
and dropping indices vs removing the entries in an individual
AbstractBTree.
public Object lookup(Object key)
IAutoboxBTree
lookup in interface IAutoboxBTreekey - The key is implicitly converted to an unsigned
byte[].
null if there is no
entry for that key.public byte[] lookup(byte[] key)
ISimpleBTree
lookup in interface ISimpleBTreenull if there
is no entry for that key or if the entry under that key is marked
as deleted.
public Tuple lookup(byte[] key,
Tuple tuple)
null from a
missing index entry or (when delete markers are enabled) from a deleted
index entry. Applies the optional bloom filter if it exists. If the bloom
filter exists and reports true, then looks up the value
for the key in the index (note that the key might not exist in the index
since a bloom filter allows false positives, further the key might exist
for a deleted entry).
key - The search key.tuple - An object that will be used to report data and metadata for
the pre-existing version (required).
null if there is no entry in the
index under the key.public boolean contains(Object key)
IAutoboxBTree
contains in interface IAutoboxBTreekey - The key is implicitly converted to an unsigned
byte[].
public boolean contains(byte[] key)
true, then verifies that the key does in fact
exist in the index.
contains in interface ISimpleBTreekey - The key.
true iff the key does not exist. Or, if the btree
supports isolation, if the key exists but it is marked as
"deleted".public int indexOf(byte[] key)
ILinearList
Note that ILinearList.indexOf(byte[]) is the basis for implementing the
IRangeQuery interface.
indexOf in interface ILinearListkey - The key.
(-(insertion point) - 1). The insertion point is
defined as the point at which the key would be found it it were
inserted into the btree without intervening mutations. Note that
this guarantees that the return value will be >= 0 if and only if
the key is found. When found the index will be in [0:nentries).
Adding or removing entries in the tree may invalidate the index.
pos = -(pos+1) will convert an insertion point to
the index at which the key would be found if it were
inserted - this is also the index of the predecessor of key
in the index.
ILinearList.keyAt(int),
ILinearList.valueAt(int)public byte[] keyAt(int index)
ILinearListISimpleBTree.lookup(byte[]).
keyAt in interface ILinearListindex - The index position of the entry (origin zero).
#indexOf(Object),
ILinearList.valueAt(int)public byte[] valueAt(int index)
ILinearListISimpleBTree.lookup(byte[]).
valueAt in interface ILinearListindex - The index position of the entry (origin zero).
null if
there is a deleted entry at that index position then#indexOf(Object),
ILinearList.keyAt(int)
public final Tuple valueAt(int index,
Tuple tuple)
public final long rangeCountExact(byte[] fromKey,
byte[] toKey)
Note: When the index uses delete markers this requires a key-range scan.
If delete markers are not being used, then the cost is equal to the cost
of rangeCount(byte[], byte[]).
rangeCountExact in interface IRangeQueryfromKey - The lowest key that will be counted (inclusive). When
null there is no lower bound.toKey - The first key that will not be counted (exclusive). When
null there is no upper bound.
public final long rangeCount()
IRangeQueryNote: If the index supports deletion markers then the range count will be an upper bound and may double count tuples which have been overwritten, including the special case where the overwrite is a delete.
rangeCount in interface IRangeQuery
public final long rangeCount(Object fromKey,
Object toKey)
fromKey - toKey -
public final long rangeCount(byte[] fromKey,
byte[] toKey)
AbstractNode#indexOf(Object). Since it does not scan the tuples
it can not differentiate between deleted and undeleted tuples for an
index that supports delete markers, but the result will be exact if
delete markers are not being used. The cost is equal to the cost of
lookup of the both keys. If both keys are null, then the
cost is zero (no IOs).
rangeCount in interface IRangeQueryfromKey - The lowest key that will be counted (inclusive). When
null there is no lower bound.toKey - The first key that will not be counted (exclusive). When
null there is no upper bound.
public long rangeCountExactWithDeleted(byte[] fromKey,
byte[] toKey)
rangeCount(byte[], byte[]) already reports deleted tuples
for an AbstractBTree so this method is just delegated to that
one. However,
FusedView.rangeCountExactWithDeleted(byte[], byte[]) treats this
case differently since it must report the exact range count when
considering all sources at once. It uses a range iterator scan visiting
both deleted and undeleted tuples for that.
rangeCountExactWithDeleted in interface IRangeQueryfromKey - The lowest key that will be counted (inclusive). When
null there is no lower bound.toKey - The first key that will not be counted (exclusive). When
null there is no upper bound.
IRangeQuery.rangeCountExact(byte[], byte[])public final ITupleIterator rangeIterator()
IRangeQueryrangeIterator(null, null)
rangeIterator in interface IRangeQuery
public final ITupleIterator rangeIterator(Object fromKey,
Object toKey)
fromKey - toKey -
public final ITupleIterator rangeIterator(byte[] fromKey,
byte[] toKey)
IRangeQuery
rangeIterator in interface IRangeQueryfromKey - The first key that will be visited (inclusive lower bound).
When null there is no lower bound.toKey - The first key that will NOT be visited (exclusive upper
bound). When null there is no upper bound.SuccessorUtil, which may be used to compute the successor of a value
before encoding it as a component of a key.,
BytesUtil#successor(byte[]), which may be used to compute the
successor of an encoded key.,
EntryFilter, which may be used to filter the entries visited by the
iterator.
public final ITupleIterator rangeIterator(Object fromKey,
Object toKey,
int capacity,
int flags,
IFilterConstructor filter)
fromKey - toKey -
public ITupleIterator rangeIterator(byte[] fromKey,
byte[] toKey,
int capacityIsIgnored,
int flags,
IFilterConstructor filter)
Note: If IRangeQuery.CURSOR is specified the returned iterator
supports traversal with concurrent modification by a single-threaded
process (the BTree is NOT thread-safe for writers). Write are
permitted iff AbstractBTree allows writes.
Note: IRangeQuery.REVERSE is handled here by wrapping the
underlying ITupleCursor.
Note: IRangeQuery.REMOVEALL is handled here by wrapping the
iterator.
Note:
FusedView.rangeIterator(byte[], byte[], int, int, IFilterConstructor)
is also responsible for constructing an ITupleIterator in a
manner similar to this method. If you are updating the logic here, then
check the logic in that method as well!
rangeIterator in interface IRangeQueryfromKey - The first key that will be visited (inclusive lower bound).
When null there is no lower bound.toKey - The first key that will NOT be visited (exclusive upper
bound). When null there is no upper bound.capacityIsIgnored - The #of entries to buffer at a time. This is a hint and MAY be
zero (0) to use an implementation specific default
capacity. A non-zero value may be used if you know that you
want at most N results or if you want to override the default
#of results to be buffered before sending them across a
network interface. (Note that you can control the default
value using
IBigdataClient.Options#DEFAULT_CLIENT_RANGE_QUERY_CAPACITY).flags - A bitwise OR of IRangeQuery.KEYS, IRangeQuery.VALS, etc.filter - An optional object used to construct a stacked iterator. When
IRangeQuery.CURSOR is specified in flags, the base
iterator will implement ITupleCursor and the first
filter in the stack can safely cast the source iterator to an
ITupleCursor. If the outermost filter in the stack
does not implement ITupleIterator, then it will be
wrapped an ITupleIterator.SuccessorUtil, which may be used to compute the successor of a value
before encoding it as a component of a key.,
BytesUtil#successor(byte[]), which may be used to compute the
successor of an encoded key.,
IFilterConstructor, which may be used to construct an iterator stack
performing filtering or other operations.
public long rangeCopy(IIndex src,
byte[] fromKey,
byte[] toKey,
boolean overflow)
src - The source index.fromKey - The first key that will be copied (inclusive). When
null there is no lower bound.toKey - The first key that will NOT be copied (exclusive). When
null there is no upper bound.overflow - When true the IOverflowHandler defined
for the source index (if any) will be applied to copy raw
records onto the backing store for this index. This value
SHOULD be false if the two indices have the
same backing store and true otherwise.
UnsupportedOperationException - if the src and this do not have the same
setting for IndexMetadata.getDeleteMarkers()
UnsupportedOperationException - if the src and this do not have the same
setting for IndexMetadata.getVersionTimestamps()CompactTask,
OverflowManager
public Object submit(byte[] key,
ISimpleIndexProcedure proc)
IIndex
submit in interface IIndexkey - The key.proc - The procedure.
IIndexProcedure.apply(IIndex)
public void submit(byte[] fromKey,
byte[] toKey,
IKeyRangeIndexProcedure proc,
IResultHandler handler)
IIndex
Note: Since this variant of submit() does not split keys the
fromIndex and toIndex in the Splits reported to
the IResultHandler will be zero (0).
submit in interface IIndexfromKey - The lower bound (inclusive) -or- null if there
is no lower bound.toKey - The upper bound (exclusive) -or- null if there
is no upper bound.proc - The procedure. If the procedure implements the
IParallelizableIndexProcedure marker interface then it
MAY be executed in parallel against the relevant index
partition(s).
public void submit(int fromIndex,
int toIndex,
byte[][] keys,
byte[][] vals,
AbstractKeyArrayIndexProcedureConstructor ctor,
IResultHandler aggregator)
IIndex
Note: This may be used to send custom logic together with the data to a
remote index or index partition. When the index is remote both the
procedure and the return value MUST be Serializable.
Note: The scale-out indices add support for auto-split of the procedure such that it runs locally against each relevant index partition.
submit in interface IIndexfromIndex - The index of the first key to be used (inclusive).toIndex - The index of the last key to be used (exclusive).keys - The keys (required).vals - The values (optional depending on the procedure).ctor - An object that can create instances of the procedure.aggregator - When defined, results from each procedure application will be
reported to this object.public AbstractNode getRightMostNode(boolean nodesOnly)
nodesOnly - when true the search will halt at the
right-most non-leaf. Otherwise it will return the right-most
leaf.
null if
the root of the B+Tree is a leaf and
nodesOnly == true.public abstract ILeafCursor newLeafCursor(SeekEnum where)
public abstract ILeafCursor newLeafCursor(byte[] key)
key - The key (required).
IllegalArgumentException - if key is null.public int[] getUtilization()
public boolean dump(PrintStream out)
out - The dump is written on this stream.
public boolean dump(org.apache.log4j.Level level,
PrintStream out)
protected void touch(AbstractNode<?> node)
Touch the node or leaf on the writeRetentionQueue. If the node
is not found on a scan of the head of the queue, then it is appended to
the queue and its AbstractNode.referenceCount is incremented. If
a node is being appended to the queue and the queue is at capacity, then
this will cause a reference to be evicted from the queue. If the
reference counter for the evicted node or leaf is zero and the evicted
node or leaf is dirty, then the evicted node or leaf will be coded and
written onto the backing store. A subsequent attempt to modify the node
or leaf will force copy-on-write for that node or leaf. Regardless of
whether or not the node or leaf is dirty, it is touched on the
LRUNexus#getGlobalLRU() when it is evicted from the write
retention queue.
This method guarantees that the specified node will NOT be synchronously persisted as a side effect and thereby made immutable. (Of course, the node may be already immutable.)
In conjunction with DefaultEvictionListener, this method
guarentees that the reference counter for the node will reflect the #of
times that the node is actually present on the
writeRetentionQueue.
node - The node or leaf.
Note: I have since seen an exception where the evicted node
reference was [null]. The problem is that the
HardReferenceQueue is NOT thread-safe. Concurrent readers
driving HardReferenceQueue.add(Object) can cause the data
structure to become inconsistent. This is not much of a problem for
readers since the queue is being used to retain hard references -
effectively a cache and the null reference could be ignored. For
writers, we are always single threaded.
Still, it seems best to either make this method synchronized or to
replace the HardReferenceQueue with an
ArrayBlockingQueue since there could well be side-effects
of concurrent appends on the queue with concurrent writers. The
latency for synchronization will be short for readers since
eviction is a NOP while the latency of append can be high for a
single threaded writer since IO may result.
The use of an ArrayBlockingQueue opens up some interesting
possibilities since a concurrent thread could now handle
serialization and eviction of nodes while the caller was blocked
and the caller would not need to wait for serialization and IO
unless the queue was full. This really sounds like the same old
concept of a write retention queue (we do not want to evict nodes
too quickly or we will make them persistent while they are still
likely to be touched) and a read retention queue (keeping nodes
around for a while after they have been made persistent) with the
new twist being that serialization and IO could be asynchronous in
the zone of action available between those queues in order to keep
the write reference queue at a minimium capacity).
protected final void writeNodeRecursive(AbstractNode<?> node)
Nodes.
Note: This will throw an exception if the backing store is read-only.
node - The root of the hierarchy of nodes to be written. The node
MUST be dirty. The node this does NOT have to be the root of
the tree and it does NOT have to be a Node.protected long writeNodeOrLeaf(AbstractNode<?> node)
writeRetentionQueue, the B+Tree continuously converts nodes and
leaves to their more compact coded record forms which results in a
smaller in memory footprint.
Note: For a transient B+Tree, this merely codes the node but does not write the node on the store (there is none).
UnsupportedOperationException - if the B+Tree (or the backing store) is read-only.protected AbstractNode<?> readNodeOrLeaf(long addr)
Note: Callers SHOULD be synchronized in order to ensures that only one thread will read the desired node or leaf in from the store and attach the reference to the newly read node or leaf as appropriate into the existing data structures (e.g., as the root reference or as a child of a node or leaf already live in memory).
addr - The address in the store.
|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||