|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||
java.lang.Objectcom.bigdata.btree.IndexSegmentBuilder
public class IndexSegmentBuilder
Builds an IndexSegment given a source btree and a target branching
factor. There are two main use cases:
BTree that is ideally backed by a fully
buffered IRawStore so that no random reads are required.TemporaryRawStore for this purpose.
This space demand was not present in West's algorithm because it did not
attempt to place the leaves contiguously onto the store.
Note: The use of up to three TemporaryRawStores per index segment
build can raise the demand on the DirectBufferPool, for example if a
number of concurrent index builds are occurring on a data service with a
large #of index partitions. One choice is to limit the parallelism of the
asynchronous overflow operation.
IndexSegment,
IndexSegmentFile,
IndexSegmentCheckpoint,
IndexSegmentMerger| Nested Class Summary | |
|---|---|
protected static class |
IndexSegmentBuilder.AbstractSimpleNodeData
Abstract base class for classes used to construct and serialize nodes and leaves written onto the index segment. |
protected static class |
IndexSegmentBuilder.NOPNodeFactory
Factory does not support node or leaf creation. |
protected static class |
IndexSegmentBuilder.SimpleLeafData
A class that can be used to (de-)serialize the data for a leaf without any of the logic for operations on the leaf. |
protected static class |
IndexSegmentBuilder.SimpleNodeData
A class that can be used to (de-)serialize the data for a node without any of the logic for operations on the node. |
| Field Summary | |
|---|---|
protected TemporaryRawStore |
blobBuffer
The optional buffer used to hold records referenced by index entries. |
long |
commitTime
The commit time associated with the view from which the IndexSegment is being generated (from the ctor). |
boolean |
compactingMerge
true iff the generated IndexSegment will
incorporate all state for the source index (partition) as of the
specified commitTime. |
long |
elapsed
The process runtime in milliseconds. |
long |
elapsed_build
The time to write the nodes and leaves into their respective buffers, not including the time to transfer those buffered onto the output file. |
long |
elapsed_setup
The time to setup the index build, including the generation of the index plan and the initialization of some helper objects. |
long |
elapsed_write
The time to write the nodes and leaves from their respective buffers onto the output file and synch and close that output file. |
int |
entryCount
The value specified to the ctor. |
protected static String |
ERR_NO_TUPLES
Warning message when the index segment will be empty. |
protected static String |
ERR_TOO_MANY_TUPLES
Error message when the #of tuples in the IndexSegment would
exceed Integer.MAX_VALUE. |
protected TemporaryRawStore |
leafBuffer
The buffer used to hold leaves so that they can be evicted en mass onto a region of the outFile. |
protected static org.apache.log4j.Logger |
log
Logger. |
float |
mbPerSec
The data throughput rate in megabytes per second. |
IndexMetadata |
metadata
A copy of the metadata object provided to the ctor. |
protected TemporaryRawStore |
nodeBuffer
The buffer used to hold nodes so that they can be evicted en mass onto a region of the outFile. |
protected RandomAccessFile |
out
The file on which the IndexSegment is written. |
File |
outFile
The file specified by the caller on which the IndexSegment is
written. |
IndexSegmentPlan |
plan
The plan for building the B+-Tree. |
UUID |
segmentUUID
The unique identifier for the generated IndexSegment resource. |
| Constructor Summary | |
|---|---|
IndexSegmentBuilder(File outFile,
File tmpDir,
int entryCount,
ITupleIterator entryIterator,
int m,
IndexMetadata metadata,
long commitTime,
boolean compactingMerge)
Designated constructor sets up a build of an IndexSegment for
some caller defined read-only view. |
|
| Method Summary | |
|---|---|
protected void |
addChild(IndexSegmentBuilder.SimpleNodeData parent,
long childAddr,
IndexSegmentBuilder.AbstractSimpleNodeData child)
Record the persistent address of a child on its parent and the #of entries spanned by that child. |
protected void |
addSeparatorKey(IndexSegmentBuilder.SimpleLeafData leaf)
Copies the first key of a new leaf as a separatorKey for the appropriate parent (if any) of that leaf. |
IndexSegmentCheckpoint |
call()
Build the IndexSegment given the parameters specified to
the ctor. |
protected void |
flush(IndexSegmentBuilder.AbstractSimpleNodeData node)
Flush a node or leaf that has been closed (no more data will be added). |
IndexSegmentCheckpoint |
getCheckpoint()
The IndexSegmentCheckpoint record written on the
IndexSegmentStore. |
protected IndexSegmentBuilder.SimpleNodeData |
getParent(IndexSegmentBuilder.AbstractSimpleNodeData node)
Return the parent of a node or leaf in the stack. |
IResourceMetadata |
getSegmentMetadata()
The description of the constructed IndexSegment resource. |
long |
getStartTime()
The timestamp in milliseconds when call() was invoked -or-
ZERO (0L) if call() has not been invoked. |
static IndexSegmentBuilder |
newInstance(String name,
ILocalBTreeView src,
File outFile,
File tmpDir,
boolean compactingMerge,
long createTime,
byte[] fromKey,
byte[] toKey)
Builder factory will build an IndexSegment from an index
(partition). |
protected IndexSegmentCheckpoint |
writeIndexSegment(FileChannel outChannel,
long commitTime)
Writes the complete file format for the index segment. |
protected long |
writeLeaf(IndexSegmentBuilder.SimpleLeafData leaf)
Serialize and write the leaf onto the leafBuffer. |
protected long |
writeNode(IndexSegmentBuilder.SimpleNodeData node)
Code and write the node onto the nodeBuffer. |
protected long |
writeNodeOrLeaf(IndexSegmentBuilder.AbstractSimpleNodeData node)
Serialize, compress, and write the node or leaf onto the appropriate output channel. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected static final org.apache.log4j.Logger log
protected static final String ERR_TOO_MANY_TUPLES
IndexSegment would
exceed Integer.MAX_VALUE.
Note: This is not an inherent limit in the IndexSegment but
rather a limit in the IndexSegmentPlan (and perhaps the
IndexSegmentBuilder) which presumes that the entry count is an
int rather than a long.
protected static final String ERR_NO_TUPLES
public final File outFile
IndexSegment is
written.
public final int entryCount
public final long commitTime
IndexSegment is being generated (from the ctor). This value is
written into IndexSegmentCheckpoint.commitTime.
public final boolean compactingMerge
true iff the generated IndexSegment will
incorporate all state for the source index (partition) as of the
specified commitTime.
Note: This flag is written into the IndexSegmentCheckpoint but it
has no other effect on the build process.
public final IndexMetadata metadata
IndexSegmentStore.
public final UUID segmentUUID
IndexSegment resource.
protected RandomAccessFile out
IndexSegment is written. The file is closed
regardless of the outcome of the operation.
protected TemporaryRawStore leafBuffer
outFile.
protected TemporaryRawStore nodeBuffer
outFile.
protected TemporaryRawStore blobBuffer
IndexMetadata MUST specify an
IOverflowHandler.
public final IndexSegmentPlan plan
public final long elapsed_setup
public long elapsed_build
public long elapsed_write
public long elapsed
public float mbPerSec
| Constructor Detail |
|---|
public IndexSegmentBuilder(File outFile,
File tmpDir,
int entryCount,
ITupleIterator entryIterator,
int m,
IndexMetadata metadata,
long commitTime,
boolean compactingMerge)
throws IOException
Designated constructor sets up a build of an IndexSegment for
some caller defined read-only view.
Note: The caller must determine whether or not deleted index entries are present in the view. The entryCount MUST be the exact #of index entries that are visited by the given iterator. In general, this is not difficult. However, if a compacting merge is desired (that is, if you are trying to generate a view containing only the non-deleted entries) then you MUST explicitly count the #of entries that will be visited by the iterator, e.g., it will require two passes over the iterator to setup the index build operation.
Note: With a branching factor of 4096 a tree of height 2 (three levels)
could address 68,719,476,736 entries - well beyond what we want in a
given index segment! Well before that the index segment should be split
into multiple files. The split point should be determined by the size of
the serialized leaves and nodes, e.g., the amount of data on disk
required by the index segment and the amount of memory required to fully
buffer the index nodes. While the size of a serialized node can be
estimated easily, the size of a serialized leaf depends on the kinds of
values stored in that index. The actual sizes are recorded in the
IndexSegmentCheckpoint record in the header of the
IndexSegment.
outFile - The file on which the index segment is written. The file MAY
exist but MUST have zero length if it does exist (this permits
you to use the temporary file facility to create the output
file).tmpDir - The temporary directory in data are buffered during the build
(optional - the default temporary directory is used if this is
null).entryCount - The #of entries that will be visited by the iterator.entryIterator - Visits the index entries in key order that will be written
onto the IndexSegment.m - The branching factor for the generated tree. This can be
chosen with an eye to minimizing the height of the generated
tree. (Small branching factors are permitted for testing, but
generally you want something relatively large.)metadata - The metadata record for the source index. A copy will be made
of this object. The branching factor in the generated tree
will be overridden to m.commitTime - The commit time associated with the view from which the
IndexSegment is being generated. This value is written
into IndexSegmentCheckpoint.commitTime.compactingMerge - true iff the generated IndexSegment
will incorporate all state for the source index (partition) as
of the specified commitTime. This flag is written into
the IndexSegmentCheckpoint but does not otherwise
effect the build process.
IOException| Method Detail |
|---|
public IndexSegmentCheckpoint getCheckpoint()
IndexSegmentCheckpoint record written on the
IndexSegmentStore.
public long getStartTime()
call() was invoked -or-
ZERO (0L) if call() has not been invoked.
public static IndexSegmentBuilder newInstance(String name,
ILocalBTreeView src,
File outFile,
File tmpDir,
boolean compactingMerge,
long createTime,
byte[] fromKey,
byte[] toKey)
throws IOException
IndexSegment from an index
(partition). Delete markers are propagated to the IndexSegment
unless compactingMerge is true.
name - The name of the index (for non-scale-out indices) or the name
of the index partition (for scale-out indices). DO NOT specify
the name of the scale-out index!src - A view of the index partition as of the createTime.
When compactingMerge is false then this
MUST be a single BTree since incremental builds are
only support for a BTree source while compacting
merges are defined for any IIndex.outFile - The file on which the IndexSegment will be written.
The file MAY exist, but if it exists then it MUST be empty.compactingMerge - When true the caller asserts that src
is a FusedView and deleted index entries WILL NOT be
included in the generated IndexSegment. Otherwise, it
is assumed that the only select component(s) of the index
partition view are being exported onto an IndexSegment
and deleted index entries will therefore be propagated to the
new IndexSegment (aka an incremental build).createTime - The commit time associated with the view from which the
IndexSegment is being generated. This value is written
into IndexSegmentCheckpoint.commitTime.fromKey - The lowest key that will be included (inclusive). When
null there is no lower bound.toKey - The first key that will be included (exclusive). When
null there is no upper bound.
IndexSegment.
IOException
public IndexSegmentCheckpoint call()
throws Exception
IndexSegment given the parameters specified to
the ctor.
call in interface Callable<IndexSegmentCheckpoint>Exception
protected void flush(IndexSegmentBuilder.AbstractSimpleNodeData node)
throws IOException
Flush a node or leaf that has been closed (no more data will be added).
Note: When a node or leaf is flushed we write it out to obtain its
address and set that address on its direct parent using
#addChild(SimpleNodeData, long). This also updates the per-child
counters of the #of entries spanned by a node.
IOException
protected void addChild(IndexSegmentBuilder.SimpleNodeData parent,
long childAddr,
IndexSegmentBuilder.AbstractSimpleNodeData child)
throws IOException
parent - The parent.childAddr - The address of the child (node or leaf).child - The child reference.
IOExceptionprotected void addSeparatorKey(IndexSegmentBuilder.SimpleLeafData leaf)
leaf - The current leaf. The first key on that leaf must be defined.protected IndexSegmentBuilder.SimpleNodeData getParent(IndexSegmentBuilder.AbstractSimpleNodeData node)
stack.
node - The node or leaf.
protected long writeNodeOrLeaf(IndexSegmentBuilder.AbstractSimpleNodeData node)
throws IOException
IOException
protected long writeLeaf(IndexSegmentBuilder.SimpleLeafData leaf)
throws IOException
leafBuffer.
Note: For leaf addresses we know the absolute offset into the
IndexSegmentStore where the leaf will wind up so we encode the
address of the leaf using the IndexSegmentRegion.BASE region.
Note: In order to write out the leaves using a double-linked list with
prior-/next-leaf addresses we have to use a "write behind" strategy.
Instead of writing out the leaf as soon as it is serialized, we save the
unencoded address and a copy of the serialized record on private member
fields. When we serialize the next leaf (or if we learn that we have no
more leaves to serialize because IndexSegmentPlan.nleaves EQ
nleavesWritten) then we patch the serialized representation of
the prior leaf and write it on the store at the previously obtained
address, thereby linking the leaves together in both directions. It is
definitely confusing.
IndexSegmentStore.
IOException
protected long writeNode(IndexSegmentBuilder.SimpleNodeData node)
throws IOException
nodeBuffer.
SimpleNodeData#addr.
IOExceptionIndexSegmentBuilder.AbstractSimpleNodeData,
IndexSegmentRegion,
IndexSegmentAddressManager
protected IndexSegmentCheckpoint writeIndexSegment(FileChannel outChannel,
long commitTime)
throws IOException
Writes the complete file format for the index segment. The file is divided up as follows:
The index segment metadata is divided into a base
IndexSegmentCheckpoint record with a fixed format containing only
essential data and additional metadata records written at the end of the
file including the optional bloom filter and the required
IndexMetadata record. The latter is where we write variable
length metadata including the _name_ of the index, or additional metadata
defined by a specific class of index.
Once all nodes and leaves have been buffered we are ready to start writing the data. We skip over a fixed size metadata record since otherwise we are unable to pre-compute the offset to the leaves and hence the addresses of the leaves. The node addresses are written in an encoding that requires active translation by the receiver who must be aware of the offset to the start of the node region. We can not write the metadata record until we know the size and length of each of these regions (leaves, nodes, and the bloom filter, or other metadata records) since that information is required in order to be able to form their addresses for insertion in the metadata record.
outChannel - commitTime -
IOExceptionpublic IResourceMetadata getSegmentMetadata()
IndexSegment resource.
IllegalStateException - if requested before the build operation is complete.
|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||