|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||
java.lang.Objectcom.bigdata.htree.AbstractHTree
public abstract class AbstractHTree
Abstract base class for a persistence capable extensible hash tree.
| Nested Class Summary | |
|---|---|
static class |
AbstractHTree.HTreePageStateException
Exception that can be thrown for asserts and testing, retaining the source page of the error |
| Field Summary | |
|---|---|
protected int |
addressBits
The #of bits in the address space for a directory page (from the constructor). |
protected int |
bucketSlots
The #of bucket slots in a bucket page. |
static org.apache.log4j.Logger |
dumpLog
Log for dump(PrintStream) and friends. |
protected static String |
ERROR_CLOSED
The index is already closed. |
protected static String |
ERROR_LESS_THAN_ZERO
A parameter was less than zero. |
protected static String |
ERROR_READ_ONLY
The index is read-only but a mutation operation was requested. |
protected static String |
ERROR_TOO_LARGE
A parameter was too large. |
protected static String |
ERROR_TRANSIENT
The index is transient (not backed by persistent storage) but an operation that requires persistence was requested. |
protected HTreeIndexMetadata |
metadata
The metadata record for the index. |
protected int |
ndistinctOnWriteRetentionQueue
The #of distinct nodes and leaves on the writeRetentionQueue. |
protected NodeSerializer |
nodeSer
Used to serialize and de-serialize the nodes and leaves of the tree. |
protected boolean |
readOnly
When true the AbstractHTree does not permit
mutation. |
protected com.bigdata.htree.DirectoryPage |
root
The root directory. |
protected IRawStore |
store
The backing store. |
protected IHardReferenceQueue<PO> |
writeRetentionQueue
Nodes (that is nodes or leaves) are added to a hard reference queue when they are created or read from the store. |
| Constructor Summary | |
|---|---|
protected |
AbstractHTree(IRawStore store,
INodeFactory nodeFactory,
boolean readOnly,
HTreeIndexMetadata metadata,
IRecordCompressorFactory<?> recordCompressorFactory)
|
| Method Summary | |
|---|---|
protected abstract void |
_reopen()
This method is responsible for setting up the root leaf (either new or read from the store), the bloom filter, etc. |
protected void |
assertNotReadOnly()
|
protected void |
assertNotTransient()
|
void |
checkConsistency(com.bigdata.htree.AbstractPage pge,
boolean full)
|
void |
checkConsistency(boolean full)
Checks globalDepth of each node and also whether any BucketPages are empty. |
void |
close()
The contract for ICheckpointProtocol.close() is to reduce the resource burden of the
index while not rendering the index inoperative. |
static long |
decodeRecordAddr(byte[] buf)
Decodes a signed long value as encoded by #appendSigned(long). |
boolean |
dump(org.apache.log4j.Level level,
PrintStream out,
boolean materialize)
|
boolean |
dump(PrintStream out)
Recursive dump of the tree. |
static byte[] |
encodeRecordAddr(ByteArrayBuffer recordAddrBuf,
long addr)
Encode a raw record address into a byte[] suitable for storing in the value associated with a tuple and decoding using decodeRecordAddr(byte[]) |
int |
getAddressBits()
The #of bits in the address space for a hash directory page. |
BTreeCounters |
getBtreeCounters()
Counters tracking various aspects of the btree. |
CounterSet |
getCounters()
Return performance counters. |
abstract long |
getEntryCount()
The #of tuples in the HTree. |
HTreeIndexMetadata |
getIndexMetadata()
Returns the metadata record for this index. |
abstract long |
getLastCommitTime()
The timestamp associated with the last IAtomicStore.commit() in
which writes buffered by this index were made restart-safe on the backing
store. |
abstract long |
getLeafCount()
The #of BucketPages in the HTree (not buddy hash buckets,
but the pages on which they appear). |
abstract long |
getNodeCount()
The #of DirectoryPages in the HTree (not buddy hash
tables, but the pages on which they appear). |
protected com.bigdata.htree.DirectoryPage |
getRoot()
The root of the HTree. |
IRawStore |
getStore()
The backing store. |
boolean |
isOpen()
An "open" index has may have some buffered data. |
boolean |
isReadOnly()
Return true iff this B+Tree is read-only. |
boolean |
isTransient()
Return true iff this is a transient data structure (no
backing store). |
abstract long |
rangeCount()
The #of index entries. |
ITupleIterator |
rangeIterator()
Simple iterator visits all tuples in the HTree in order by the
effective prefix of their keys. |
protected com.bigdata.htree.AbstractPage |
readNodeOrLeaf(long addr)
Read an AbstractPage from the store. |
void |
reopen()
(Re-) open the index. |
void |
setBTreeCounters(BTreeCounters btreeCounters)
Replace the BTreeCounters. |
String |
toString()
Fast summary information about the B+Tree. |
protected void |
touch(com.bigdata.htree.AbstractPage node)
This method is responsible for putting the node or leaf onto the ring buffer which controls (a) how long we retain a hard reference to the node or leaf; and (b) for writes, when the node or leaf is evicted with a zero reference count and made persistent (along with all dirty children). |
Iterator<byte[]> |
values()
Visits the values stored in the HTree in order by the effective
prefix of their keys. |
protected long |
writeNodeOrLeaf(com.bigdata.htree.AbstractPage node)
Codes the node and writes the coded record on the store (non-recursive). |
protected void |
writeNodeRecursive(com.bigdata.htree.AbstractPage node)
Write a dirty node and its children using a post-order traversal that first writes any dirty leaves and then (recursively) their parent nodes. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface com.bigdata.btree.ICheckpointProtocol |
|---|
getCheckpoint, getDirtyListener, getMetadataAddr, getRecordVersion, getRootAddr, removeAll, scan, setDirtyListener, setLastCommitTime, writeCheckpoint, writeCheckpoint2 |
| Methods inherited from interface com.bigdata.journal.ICommitter |
|---|
handleCommit |
| Field Detail |
|---|
protected static final String ERROR_CLOSED
protected static final String ERROR_LESS_THAN_ZERO
protected static final String ERROR_TOO_LARGE
protected static final String ERROR_READ_ONLY
protected static final String ERROR_TRANSIENT
public static final org.apache.log4j.Logger dumpLog
dump(PrintStream) and friends.
protected final IRawStore store
protected final boolean readOnly
true the AbstractHTree does not permit
mutation.
protected final int addressBits
2^addressBits entries. Those entries are
divided up among one or more buddy hash tables on that page.
protected final int bucketSlots
protected volatile com.bigdata.htree.DirectoryPage root
protected final IHardReferenceQueue<PO> writeRetentionQueue
IRawStore. The
nodes and leaves refer to their parent with a WeakReferences.
Likewise, nodes refer to their children with a WeakReference. The
hard reference queue in combination with touch(AbstractPage) and
with hard references held on the stack ensures that the parent and/or
children remain reachable during operations. Once the node is no longer
strongly reachable weak references to that node may be cleared by the VM
- in this manner the node will become unreachable by navigation from its
ancestors in the btree. The special role of the hard reference queue is
to further ensure that dirty nodes remain dirty by defering persistence
until the reference count for the node is zero during an eviction from
the queue.
Note that nodes are evicted as new nodes are added to the hard reference queue. This occurs in two situations: (1) when a new node is created during a split of an existing node; and (2) when a node is read in from the store. Inserts on this hard reference queue always drive evictions. Incremental writes basically make it impossible for the commit set to get "too large" - the maximum #of nodes to be written is bounded by the size of the hard reference queue. This helps to ensure fast commit operations on the store.
The minimum capacity for the hard reference queue is two (2) so that a split may occur without forcing eviction of either node participating in the split.
Note: The code in
AbstractPage.postOrderNodeIterator(boolean, boolean) and
DirtyChildIterator MUST NOT touch the hard reference queue since
those iterators are used when persisting a node using a post-order
traversal. If a hard reference queue eviction drives the serialization of
a node and we touch the hard reference queue during the post-order
traversal then we break down the semantics of
HardReferenceQueue.add(Object) as the eviction does not
necessarily cause the queue to reduce in length. Another way to handle
this is to have HardReferenceQueue.add(Object) begin to evict
objects before is is actually at capacity, but that is also a bit
fragile.
Note: The writeRetentionQueue uses a HardReferenceQueue.
This is based on a RingBuffer and is very fast. It does not use a
HashMap because we can resolve the WeakReference to the
child DirectoryPage or BucketPage using top-down
navigation as long as the DirectoryPage or BucketPage
remains strongly reachable (that is, as long as it is on the
writeRetentionQueue or otherwise strongly held). This means that
lookup in a map is not required for top-down navigation.
protected int ndistinctOnWriteRetentionQueue
writeRetentionQueue.
protected HTreeIndexMetadata metadata
HTree object, but it CAN be changed.
protected final NodeSerializer nodeSer
| Constructor Detail |
|---|
protected AbstractHTree(IRawStore store,
INodeFactory nodeFactory,
boolean readOnly,
HTreeIndexMetadata metadata,
IRecordCompressorFactory<?> recordCompressorFactory)
store - The persistence store.nodeFactory - Object that provides a factory for node and leaf objects.readOnly - true IFF it is known that the
AbstractHTree is read-only.metadata - The IndexMetadata object for this
AbstractHTree.recordCompressorFactory - Object that knows how to (de-)compress the serialized data
records.
IllegalArgumentException - if addressBits is LT ONE (1).
IllegalArgumentException - if addressBits is GT (16).| Method Detail |
|---|
public final BTreeCounters getBtreeCounters()
public final void setBTreeCounters(BTreeCounters btreeCounters)
BTreeCounters.
Note: This is used by the IndexManager to ensure that an index
loaded from its backing store uses the BTreeCounters associated
with that index since the DataService was last (re-)started.
btreeCounters - The counters to be used.
IllegalArgumentException - if the argument is null.public CounterSet getCounters()
ICounterSetAccess
getCounters in interface ICounterSetAccesspublic HTreeIndexMetadata getIndexMetadata()
Note: If the index is read-only then the metadata object will be cloned to avoid potential modification. However, only a single cloned copy of the metadata record will be shared between all callers for a given instance of this class.
getIndexMetadata in interface ICheckpointProtocolnull.IIndex.getIndexMetadata()IndexMetadata object, cloning each time,
and leaving it to the caller to clone as required. Either the
former or the latter might be a better choice.
FIXME if the metadata record is updated (and it can be) then we really
need to invalidate the cloned metadata record also.public void close()
ICheckpointProtocol.close() is to reduce the resource burden of the
index while not rendering the index inoperative. An index that has been
closed MAY be reopened at any time
(conditional on the continued availability of the backing store). Such an
index reference remains valid after a ICheckpointProtocol.close(). A closed index is
transparently reopened by any access to the index data
(scanning the index, probing the index, etc).
Note: A ICheckpointProtocol.close() on a dirty index MUST discard writes rather than
flushing them to the store and MUST NOT update its Checkpoint
record - (ICheckpointProtocol.close() is used to discard indices with partial writes
when an AbstractTask fails). If you are seeking to
ICheckpointProtocol.close() a mutable index view that state can be recovered by
ICheckpointProtocol.reopen() then you MUST write a new Checkpoint record
before closing the index.
Note: CLOSING A TRANSIENT INDEX WILL DISCARD ALL DATA!
This implementation clears the hard reference queue (releasing all node
references), releases the hard reference to the root node, and releases
the buffers on the NodeSerializer (they will be naturally
reallocated on reuse).
Note: AbstractHTree is NOT thread-safe and close() MUST
be invoked in a context in which there will not be concurrent threads --
the natural choice being the single-threaded commit service on the
journal.
close in interface ICheckpointProtocolIllegalStateException - if the root is null, indicating that the
index is already closed.
IllegalStateException - if the root is dirty (this implies that this is a mutable
HTree and there are mutations that have not been written
through to the store)public final void reopen()
ICheckpointProtocol.close() /
ICheckpointProtocol.reopen() protocol. That protocol may be used to reduce the
resource burden of an index. This method is automatically invoked by a
variety of methods that need to ensure that the index is available for
use..
This method delegates to _reopen() if double-checked locking
demonstrates that the root is null (indicating that
the index has been closed). This method is automatically invoked by a
variety of methods that need to ensure that the index is available for
use.
reopen in interface ICheckpointProtocolICheckpointProtocol.close(),
ICheckpointProtocol.isOpen(),
#getRoot()protected abstract void _reopen()
reopen() once root has been show to be
null with double-checked locking. When invoked in this
context, the caller is guaranteed to hold a lock on this. This is
done to ensure that at most one thread gets to re-open the index from the
backing store.
public final boolean isOpen()
ICheckpointProtocol
isOpen in interface ICheckpointProtocolICheckpointProtocol.close(),
ICheckpointProtocol.reopen()public final boolean isTransient()
true iff this is a transient data structure (no
backing store).
protected final void assertNotTransient()
public final boolean isReadOnly()
true iff this B+Tree is read-only.
protected final void assertNotReadOnly()
UnsupportedOperationException - if the B+Tree is read-only.isReadOnly()public abstract long getLastCommitTime()
IAtomicStore.commit() in
which writes buffered by this index were made restart-safe on the backing
store. The lastCommitTime is set when the index is loaded from the
backing store and updated after each commit. It is ZERO (0L) when
HTree is first created and will remain ZERO (0L) until the
HTree is committed. If the backing store does not support atomic
commits, then this value will always be ZERO (0L).
public IRawStore getStore()
public final int getAddressBits()
2^addressBits. When addressBits is 10 we
have 2^10 := 1024. If the size of a child address is 4
bytes, then a 10 bit address space implies a 4k page size.
public abstract long getNodeCount()
DirectoryPages in the HTree (not buddy hash
tables, but the pages on which they appear).
public abstract long getLeafCount()
BucketPages in the HTree (not buddy hash buckets,
but the pages on which they appear).
public abstract long getEntryCount()
HTree.
protected final com.bigdata.htree.DirectoryPage getRoot()
HTree. This is always a DirectoryPage.
The hard reference to the root node is cleared if the index is
closed. This method automatically reopen()s the
index if it is closed, making it available for use.
public String toString()
toString in class Objectpublic boolean dump(PrintStream out)
out - The dump is written on this stream.
public boolean dump(org.apache.log4j.Level level,
PrintStream out,
boolean materialize)
protected void touch(com.bigdata.htree.AbstractPage node)
This method is responsible for putting the node or leaf onto the ring buffer which controls (a) how long we retain a hard reference to the node or leaf; and (b) for writes, when the node or leaf is evicted with a zero reference count and made persistent (along with all dirty children). The concurrency requirements and the implementation behavior and guarentees differ depending on whether the B+Tree is read-only or mutable for two reasons: For writers, the B+Tree is single-threaded so there is no contention. For readers, every touch on the B+Tree goes through this point, so it is vital to make this method non-blocking.
This method guarantees that the specified node will NOT be synchronously persisted as a side effect and thereby made immutable. (Of course, the node may be already immutable.)
In conjunction with DefaultEvictionListener, this method
guarantees that the reference counter for the node will reflect the #of
times that the node is actually present on the
writeRetentionQueue.
If the node is not found on a scan of the head of the queue, then it is
appended to the queue and its AbstractPage.referenceCount is
incremented. If a node is being appended to the queue and the queue is at
capacity, then this will cause a reference to be evicted from the queue.
If the reference counter for the evicted node or leaf is zero and the
evicted node or leaf is dirty, then a data record will be coded for the
evicted node or leaf and written onto the backing store. A subsequent
attempt to modify the node or leaf will force copy-on-write for that node
or leaf. Regardless of whether or not the node or leaf is dirty, it is
touched on the LRUNexus#getGlobalLRU() when it is evicted from
the write retention queue.
For the mutable B+Tree we also track the #of references to the node/leaf on the ring buffer. When that reference count reaches zero we do an eviction and the node/leaf is written onto the backing store if it is dirty. Those reference counting games DO NOT matter for read-only views so we can take a code path which does not update the per-node/leaf reference count and we do not need to use either synchronization or atomic counters to track the reference counts.
In order to reduce contention for the lock required to update the backing
queue, the writeRetentionQueue is configured to collect
references for touched nodes or leaves in a thread-local queue and then
batch those references through to the backing hard reference queue while
holding the lock.
node - The node or leaf.ndistinctOnWriteRetentionQueue fields are not guaranteed
to be consistent for of concurrent readers since no locks are held
when those fields are updated. Since the reference counts only
effect when a node is made persistent (assuming its dirty) and
since we already require single threaded access for writes on the
btree, this does not cause a problem but can lead to unexpected
values for the reference counters and
ndistinctOnWriteRetentionQueue.protected final void writeNodeRecursive(com.bigdata.htree.AbstractPage node)
Nodes.
Note: This will throw an exception if the backing store is read-only.
node - The root of the hierarchy of nodes to be written. The node
MUST be dirty. The node this does NOT have to be the root of
the tree and it does NOT have to be a Node.protected long writeNodeOrLeaf(com.bigdata.htree.AbstractPage node)
writeRetentionQueue, the B+Tree continuously converts nodes and
leaves to their more compact coded record forms which results in a
smaller in memory footprint.
Note: For a transient B+Tree, this merely codes the node but does not write the node on the store (there is none).
UnsupportedOperationException - if the B+Tree (or the backing store) is read-only.protected com.bigdata.htree.AbstractPage readNodeOrLeaf(long addr)
AbstractPage from the store.
Note: Callers SHOULD be synchronized in order to ensure that only one thread will read the desired node or leaf in from the store and attach the reference to the newly read node or leaf as appropriate into the existing data structures (e.g., as the root reference or as a child of a node or leaf already live in memory).
Note: The caller MUST set the AbstractPage.globalDepth on the
returned value.
addr - The address in the store.
AbstractPage.
IllegalArgumentException - if the address is IAddressManager.NULL.
public static byte[] encodeRecordAddr(ByteArrayBuffer recordAddrBuf,
long addr)
decodeRecordAddr(byte[])
recordAddrBuf - The buffer that will be used to format the record address.addr - The raw record address.
public static long decodeRecordAddr(byte[] buf)
#appendSigned(long).
buf - The buffer containing the encoded record address.
public abstract long rangeCount()
rangeCount in interface ICheckpointProtocolIRangeQuery.rangeCount()public ITupleIterator rangeIterator()
HTree in order by the
effective prefix of their keys. Since the key is typically a hash of some
fields in the associated application data record, this will normally
visit application data records in what appears to be an arbitrary order.
Tuples within a buddy hash bucket will be delivered in a random order.
Note: The HTree does not currently maintain metadata about the
#of spanned tuples in a DirectoryPage. Without that we can not
provide fast range counts, linear list indexing, etc.
public Iterator<byte[]> values()
HTree in order by the effective
prefix of their keys. Since the key is typically a hash of some fields in
the associated application data record, this will normally visit
application data records in what appears to be an arbitrary order. Tuples
within a buddy hash bucket will be delivered in a random order.
Note: This allocates a new byte[] for each visited value. It is more
efficient to reuse a buffer for each visited Tuple. This can be
done using rangeIterator().
public void checkConsistency(boolean full)
full - indicates whether to check consistency on disk
public void checkConsistency(com.bigdata.htree.AbstractPage pge,
boolean full)
|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||