Package com.bigdata.btree

The BTree is a scalable B+-Tree with copy-on-write semantics mapping variable length unsigned byte[] keys to variable length byte[] values (null values are allowed).

See:
          Description

Interface Summary
AbstractBTree.IBTreeCounters Interface declaring namespaces for performance counters collected for a B+Tree.
HTreeIndexMetadata.Options HTree specific options.
IAbstractNode Interface for a node or a leaf of a B+-Tree.
IAutoboxBTree An interface defining non-batch methods for inserting, removing, lookup, and containment tests where keys and values are implicitly converted to and from byte[]s using the ITupleSerializer configured on the IndexMetadata object for the IIndex.
IBloomFilter Interface for bloom filter implementations using an unsigned byte[] key.
IBTreeStatistics Interface used to report out some statistics about a B+Tree.
IBTreeUtilizationReport B+Tree utilization report.
ICheckpointProtocol Interface in support of the Checkpoint record protocol.
ICounter An interface for a counter.
IDirty An interface that declares how we access the dirty state of an object.
IDirtyListener An interface that may be used to learn when a BTree becomes dirty.
IEvictionListener Interface to handle evictions of nodes or leaves from the hard reference queue.
IIdentityAccess An interface that declares how we access the persistent identity of an object.
IIndex Interface for mutable B+-Tree mapping arbitrary non-null keys to arbitrary values.
IIndexLocalCounter An interface for accessing an index local counter.
ILeafCursor<L extends Leaf> Leaf cursor interface.
ILinearList Interface for methods that return or accept an ordinal index into the entries in the B+-Tree.
ILocalBTreeView Interface indicates that the index is local rather than remote.
IndexMetadata.Options Options and their defaults for the com.bigdata.btree package and the BTree and IndexSegment classes.
INodeFactory Interface for creating nodes or leaves.
INodeIterator Interface for iterators that visit nodes and leaves rather than entries in leaves.
IOverflowHandler An interface that allows you to inspect index entries during an IndexSegmentBuilder operation.
IRangeQuery Interface for range count and range query operations.
IRawRecordAccess Interface providing access to raw records.
ISimpleBTree Interface for non-batch operations on a B+-Tree mapping non-null variable length unsigned byte[] keys to arbitrary values.
ISimpleSplitHandler Interface allows an application to constrain the choice of the separator key when an index partition is split.
ITuple<E> Interface exposes more direct access to keys and values visited by an ITupleIterator.
ITupleCursor<E> Interface for sequential and random-access cursor-based ITuple operations on an index or index partition.
ITupleCursor2<E> Extended interface.
ITupleIterator<E> Interface visits ITuples populated with the data and metadata for visited index entries.
ITupleSerializer<K,V> An interface that provides for the (de)-serialization of the value of a tuple stored in an index and, when possible, the key under which that value is stored.
Leaf.ILeafListener An interface that may be used to register for and receive events when the state of a Leaf is changed.
 

Class Summary
AbstractBTree Base class for mutable and immutable B+-Tree implementations.
AbstractBTreeTupleCursor<I extends AbstractBTree,L extends Leaf,E> Class supporting random access to tuples and sequential tuple-based cursor movement for an AbstractBTree.
AbstractBTreeTupleCursor.MutableBTreeTupleCursor<E> An ITuple that directly supports forward and reverse cursor operations on a local mutable BTree.
AbstractBTreeTupleCursor.ReadOnlyBTreeTupleCursor<E> An ITuple that directly supports forward and reverse cursor operations on a local BTree.
AbstractChunkedTupleIterator<E> A chunked iterator that proceeds a ResultSet at a time.
AbstractNode<T extends AbstractNode> Abstract node supporting incremental persistence and copy-on-write semantics.
AbstractTuple<E> Abstract base class with much of the functionality of ITuple.
AsynchronousIndexWriteConfiguration Configuration for the asynchronous index write API.
BigdataMap<K,V> A flyweight SortedMap wrapping an IIndex.
BigdataSet<E> A SortedSet backed by a B+Tree.
BloomFilter Encapsulates the actual implementation class and provides the protocol for (de-)serialization.
BloomFilter.BloomFilterCounters Counters for bloom filter access and notification of false positives.
BloomFilterFactory An interface that is used to generate a bloom filter for an AbstractBTree and which allows the caller to specify the expected number of index entries, the desired error rate for the filter at that #of index entries, and the maximum error rate before the bloom filter will be disabled.
BTree This class implements a variant of a B+Tree in which all values are stored in leaves, but the leaves are not connected with prior-next links.
BTree.Counter Mutable counter.
BTree.NodeFactory Factory for mutable nodes and leaves used by the NodeSerializer.
BTree.PartitionedCounter Places the counter values into a namespace formed by the partition identifier.
BTree.Stack A simple stack based on an array used to maintain hard references for the parent Nodes in the BTree.LeafCursor.
BTreeCounters A helper class that collects statistics on an AbstractBTree.
BTreeStatistics A snapshot of the B+Tree statistics.
BTreeUtilizationReport A btree utilization report.
BytesUtil Class supporting operations on variable length byte[] keys.
BytesUtil.UnsignedByteArrayComparator Compares two unsigned byte[]s.
Checkpoint A checkpoint record is written each time the btree is flushed to the store.
ChunkedLocalRangeIterator<E> Chunked range iterator running against a local index or index view.
DefaultEvictionListener Hard reference cache eviction listener writes a dirty node or leaf onto the persistence store.
DefaultTupleSerializer<K,V> Default implementation uses the KeyBuilder to format the object as a key and uses Java default serialization for the value.
DelegateIndex An object that delegates its IIndex interface.
DelegateTuple<E> An ITuple wrapping a delegate that may be used to override some of the methods on the delegate object.
DumpIndex Utility class to dump an index in a variety of ways.
DumpIndex.PageStats Class reports various summary statistics for nodes and leaves.
DumpIndexSegment Utility to examine the context of an IndexSegmentStore.
EntryScanIterator Iterator visits index entries (dereferencing visited tuples to the application objects stored within those tuples).
Errors Error messages for the B+Tree package.
FixedLengthPrefixSplits Imposes constraint that the key before the separatorKey must differ in the first N bytes from the key after the separator key.
HTreeIndexMetadata HTree specific implementation.
IndexMetadata The persistent and mostly immutable metadata for a AbstractBTree.
IndexSegment An index segment is read-only btree corresponding to some key range of a potentially distributed index.
IndexSegment.ImmutableNodeFactory Factory for immutable nodes and leaves used by the NodeSerializer.
IndexSegment.ImmutableNodeFactory.ImmutableLeaf Immutable leaf throws UnsupportedOperationException for the public mutator API but does not try to override all low-level mutation behaviors.
IndexSegment.ImmutableNodeFactory.ImmutableNode Immutable node throws UnsupportedOperationException for the public mutator API but does not try to override all low-level mutation behaviors.
IndexSegment.IndexSegmentTupleCursor<E> Implementation for an immutable IndexSegment.
IndexSegmentAddressManager Address manager supporting offsets that are encoded for one of several regions in an IndexSegmentStore.
IndexSegmentBuilder Builds an IndexSegment given a source btree and a target branching factor.
IndexSegmentBuilder.AbstractSimpleNodeData Abstract base class for classes used to construct and serialize nodes and leaves written onto the index segment.
IndexSegmentBuilder.NOPNodeFactory Factory does not support node or leaf creation.
IndexSegmentBuilder.SimpleLeafData A class that can be used to (de-)serialize the data for a leaf without any of the logic for operations on the leaf.
IndexSegmentBuilder.SimpleNodeData A class that can be used to (de-)serialize the data for a node without any of the logic for operations on the node.
IndexSegmentCheckpoint The checkpoint record for an IndexSegment.
IndexSegmentDumpUtil  
IndexSegmentMultiBlockIterator<E> A fast iterator based on multi-block IO for the IndexSegment.
IndexSegmentPlan A plan for building a B+-Tree based on an input branching factor and #of entries.
IndexSegmentStore A read-only store backed by a file containing a single IndexSegment.
Leaf A B+-Tree leaf.
LeafTupleIterator<E> Visits the values of a Leaf in the external key ordering.
MutableLeafData Implementation maintains Java objects corresponding to the persistent data and defines methods for a variety of mutations on the ILeafData record which operate by direct manipulation of the Java objects.
MutableNodeData Implementation maintains Java objects corresponding to the persistent data and defines methods for a variety of mutations on the INodeData record which operate by direct manipulation of the Java objects.
Node A non-leaf node.
NodeSerializer An instance of this class is used to serialize and de-serialize the INodeDatas and ILeafDatas of an AbstractBTree.
NOPBloomFilter A bloom filter that never reports false (this means that you must always check the index) and that does not permit anything to be added and, in fact, has no state.
NOPEvictionListener A listener that does nothing.
NOPTupleSerializer Default implementation uses the KeyBuilder to format the object as a key and requires that the values are byte[]s which it passes on without change.
PO A persistent object.
RangeCheckUtil Utility class to verify that a key lies within a key range.
ReadCommittedView A view of a named index that replaces its view for each high-level request if there has been an intervening commit on the backing store.
ReadOnlyCounter A read-only view of an ICounter.
ReadOnlyEntryIterator<E> Iterator disallows ReadOnlyEntryIterator.remove().
ReadOnlyIndex A fly-weight wrapper that does not permit write operations and reads through onto an underlying IIndex.
ResultSet An object used to stream key scan results back to the client.
ScatterSplitConfiguration Configuration object for scatter split behavior for a scale-out index.
Tuple<E> A key-value pair used to facilitate some iterator constructs.
UnisolatedReadWriteIndex A view onto an unisolated index partition which enforces the constraint that either concurrent writers -or- a single writer may have access to the unisolated index at any given time.
ViewStatistics Helper class collects some statistics about an ILocalBTreeView.
 

Enum Summary
IndexSegmentRegion Type-safe enumeration of the regions to which relative offsets may be constructed for an IndexSegmentStore.
IndexTypeEnum Type safe enumeration of index types.
SeekEnum Typesafe enum used to indicate that an ILeafCursor should seek to the first or last leaf in the B+Tree.
 

Exception Summary
KeyAfterPartitionException An exception thrown when a key lies after the half-open range of an index partition.
KeyBeforePartitionException Exception thrown when a key is before the start of the half-open range of an index partition.
KeyOutOfRangeException An exception thrown when the key is outside of the half-open range constraint for a ITupleCursor or an index partition.
 

Package com.bigdata.btree Description

The BTree is a scalable B+-Tree with copy-on-write semantics mapping variable length unsigned byte[] keys to variable length byte[] values (null values are allowed). This class is designed for read-write operations.

The B+-Tree uses a fixed branching factor (aka fan-out) but supports variable length keys and values and does not directly constrain the serialized size of a node or leaf. The branching factor is determined when the B+Tree is created. Bulk index builds are supported from a variety of sources, including a merge of mutable and immutable B+-Trees. Bulk index builds result in immutable IndexSegments which may have a branching distinct from that of their source B+Tree(s).

Nodes (and leaves) are either dirty or immutable and persistent. If the node is dirty, then it is always mutable. If a node is clean, then it is always immutable and persistent. An attempt to write on a clean node forces copy-on-write of the node and its ancestors up to the root of the tree. In any case where the node has already been copied and is dirty, the mutable node is always used. Therefore mutations never overwrite the historical state of the B+-Tree and always produce a new well-formed tree. The root of the new tree is accessible from root block of the store after a commit (this is handled by the AbstractJournal.

Coding (Compression)

The com.bigdata.btree.INodeData and com.bigdata.btree.ILeafData interfaces represent the persistent state of a node or leaf. The keys of a node or leaf are represented by an IRaba, which is an abstraction for a logical byte[][]. Likewise, the values of a leaf are represented by an IRaba. For keys, the rabas support search. For values, they allow nulls. The node and leaf data records maintain additional persistent state, for example, leaf data records track tuple delete markers and tuple revision timestamps while node data records track the persistent address of child nodes.

Coding of node and leaf data records is performed when they are evicted from a write retention queue. The purpose of the write retention queue is to maintain recently used nodes and leaves in their mutable form and to defer eviction onto the disk until their data is stable (has not been changed recently). When a dirty node or leaf is evicted from the write retention queue, it is first coded (compressed) using a com.bigdata.btree.data.INodeDataCoder or a com.bigdata.btree.data.ILeafDataCoder as appropriate. Those coders in turn will apply the configured com.bigdata.btree.raba.IRabaCoder to code the keys and the values. Once the node or leaf has been coded, it is represented as a slice on a byte[]. That slice can be wrapped by the same com.bigdata.btree.raba.IRabaCoder, returning an efficient view of the compressed data record supporting search (for keys) and random access to the keys and values.

Front-coding (prefix compression) generally works quite well for keys. A canonical huffman coder may be used for keys, but is significantly slower and is generally used to code values. Custom coders may be written for either the keys or values of a B+Tree leaf in order to take advantage of various properties for a specific application index. However, the coder for the B+Tree node keys MUST handle variable length keys since the keys stored in the node are separator keys, not full length application keys.

Coding, decoding, and promotion to a mutable data record are handled transparently by the B+Tree. The NodeSerializer provides a facility for coding and decoding nodes and leaves. Coding uses a shared (per B+Tree instance) extensible byte[] buffer to reduce heap churn. Decoding simply wraps the coded record. Promotion to a mutable data record is handled by converting to a MutableNodeData or MutableLeafData respectively.

Persistence

When nodes or leaves are evicted from the write retention queue their coded data record is written onto the backing IRawStore. The resulting int64 address is updated on the parent of the node or leaf. Weak references are used between a child and its parent. The existence of the node or leaf on the write retention queue prevents weak references from being cleared. Once the node or leaf has been evicted from the write retention queue, it can be reloaded from its persistent address if its weak reference is cleared.

When a node is evicted from the write retention queue, a depth-first traversal of its dirty children is performed and they are persisted as well. This ensures that we can recover the tree structure later. When a leaf is evicted, just that leaf is persisted.

A B+-Tree checkpoint record corresponds to a persistent state of the B+Tree on the disk. The B+Tree may be reloaded from that checkpoint. After a checkpoint operation, all nodes and leaves in the B+Tree will be clean. However, a checkpoint IS NOT a commit. The commit protocol is handled by the AbstractJournal.

Transient B+-Tree Support

A BTree is designated as transient by specifying null for the backing IRawStore. Nodes and leaves of a transient B+-Tree are linked by hard references to ensure that they remain reachable. However, mutable nodes and leaves are still converted to coded (compressed) nodes and leaves when they are evicted from the write retention queue.



Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.