public static interface IndexMetadata.Options
Options and their defaults for the com.bigdata.btree package and
the BTree and IndexSegment classes. Options that apply
equally to views and AbstractBTrees are in the package namespace,
such as whether or not a bloom filter is enabled. Options that apply to
all AbstractBTrees are specified within that namespace while
those that are specific to BTree or IndexSegment are
located within their respective class namespaces. Some properties, such
as the branchingFactor, are defined for both the BTree and the
IndexSegment because their defaults tend to be different when an
IndexSegment is generated from a BTree.
Note: The basic pattern here is using the class name, having a default instance of the class (or a factory for that instance), and then being able to override properties for that instance. Beans stuff really, just simpler. It should be possible to specify the overflow handler and its properties via options (as you can with beans or jini configurations). It should also be possible to specify a different split handler and its properties via options (as you can with beans or jini configurations).
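The namespace scheme described above can be illustrated with a minimal sketch. It assumes that a class-scoped key is consulted before a package-scoped key and that a default applies when neither is set; that lookup order is an assumption made for illustration, not something this page states. The class NamespacedOptions, the key strings and the IndexSegment default of 512 are hypothetical placeholders; use the constants declared on this interface for the real property names and defaults.

```java
import java.util.Properties;

/** Hypothetical sketch: resolve an option by trying the most specific namespace first. */
final class NamespacedOptions {

    /**
     * Looks up "<namespace>.<name>" in each namespace in the order given
     * and returns the default when no namespace defines the option.
     */
    static String resolve(Properties p, String name, String defaultValue,
            String... namespaces) {
        for (String ns : namespaces) {
            String v = p.getProperty(ns + "." + name);
            if (v != null) {
                return v;
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        // Illustrative key only: override the branching factor for BTree.
        p.setProperty("com.bigdata.btree.BTree.branchingFactor", "64");

        String btree = resolve(p, "branchingFactor", "32",
                "com.bigdata.btree.BTree", "com.bigdata.btree");
        // No override for IndexSegment, so the (placeholder) default is used.
        String segment = resolve(p, "branchingFactor", "512",
                "com.bigdata.btree.IndexSegment", "com.bigdata.btree");

        System.out.println("BTree branching factor: " + btree);
        System.out.println("IndexSegment branching factor: " + segment);
    }
}
```

Keeping the class name in the key is what lets the same short option name carry different values for BTree and IndexSegment.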
| Field Summary | |
|---|---|
| static String | BLOOM_FILTER: Optional property controls whether or not a bloom filter is maintained (default "false"). |
| static String | BTREE_BRANCHING_FACTOR: The name of an optional property whose value specifies the branching factor for a mutable BTree. |
| static String | BTREE_CLASS_NAME: The name of a class derived from BTree that will be used to re-load the index. |
| static String | BTREE_RECORD_COMPRESSOR_FACTORY: An optional factory providing record-level compression for the nodes and leaves of an IndexSegment (default ). |
| static String | CHILD_LOCKS: Option determines whether or not per-child locks are used by Node for a read-only AbstractBTree (default "false"). |
| static String | DEFAULT_BLOOM_FILTER |
| static String | DEFAULT_BTREE_BRANCHING_FACTOR: The default branching factor for a mutable BTree. |
| static String | DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY |
| static String | DEFAULT_CHILD_LOCKS |
| static String | DEFAULT_INDEX_SEGMENT_BRANCHING_FACTOR: The default branching factor for an IndexSegment. |
| static String | DEFAULT_INDEX_SEGMENT_BUFFER_NODES |
| static String | DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY |
| static String | DEFAULT_MASTER_CHUNK_SIZE |
| static String | DEFAULT_MASTER_CHUNK_TIMEOUT_NANOS |
| static String | DEFAULT_MASTER_QUEUE_CAPACITY |
| static String | DEFAULT_SCATTER_SPLIT_DATA_SERVICE_COUNT |
| static String | DEFAULT_SCATTER_SPLIT_ENABLED |
| static String | DEFAULT_SCATTER_SPLIT_INDEX_PARTITION_COUNT |
| static String | DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD |
| static String | DEFAULT_SINK_CHUNK_SIZE |
| static String | DEFAULT_SINK_CHUNK_TIMEOUT_NANOS |
| static String | DEFAULT_SINK_IDLE_TIMEOUT_NANOS |
| static String | DEFAULT_SINK_POLL_TIMEOUT_NANOS |
| static String | DEFAULT_SINK_QUEUE_CAPACITY |
| static String | DEFAULT_SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT |
| static String | DEFAULT_SPLIT_HANDLER_MIN_ENTRY_COUNT |
| static String | DEFAULT_SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER |
| static String | DEFAULT_SPLIT_HANDLER_SAMPLE_RATE |
| static String | DEFAULT_SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER |
| static String | DEFAULT_WRITE_RETENTION_QUEUE_CAPACITY |
| static String | DEFAULT_WRITE_RETENTION_QUEUE_SCAN |
| static String | INDEX_SEGMENT_BRANCHING_FACTOR: The name of the property whose value specifies the branching factor for an immutable IndexSegment. |
| static String | INDEX_SEGMENT_BUFFER_NODES: When true an attempt will be made to fully buffer the nodes (but not the leaves) of the IndexSegment (default "false"). |
| static String | INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY: An optional factory providing record-level compression for the nodes and leaves of an IndexSegment (default ). |
| static String | INITIAL_DATA_SERVICE: The name of an optional property whose value identifies the data service on which the initial index partition of a scale-out index will be created. |
| static String | KEY_BUILDER_FACTORY: Override the IKeyBuilderFactory used by the DefaultTupleSerializer (the default is a DefaultKeyBuilderFactory initialized with an empty Properties object). |
| static String | LEAF_KEYS_CODER: Override the IRabaCoder used for the keys of leaves in B+Trees (the default is a FrontCodedRabaCoder instance). |
| static String | LEAF_VALUES_CODER: Override the IRabaCoder used for the values of leaves in B+Trees (default is a CanonicalHuffmanRabaCoder). |
| static String | MASTER_CHUNK_SIZE: The desired size of the chunks that the master will draw from its queue. |
| static String | MASTER_CHUNK_TIMEOUT_NANOS: The time in nanoseconds that the master will combine smaller chunks so that it can satisfy the desired masterChunkSize. |
| static String | MASTER_QUEUE_CAPACITY: The capacity of the queue on which the application writes. |
| static int | MAX_BTREE_BRANCHING_FACTOR: A reasonable maximum branching factor for a BTree. |
| static int | MAX_INDEX_SEGMENT_BRANCHING_FACTOR: A reasonable maximum branching factor for an IndexSegment. |
| static int | MAX_WRITE_RETENTION_QUEUE_CAPACITY: A reasonable maximum write retention queue capacity. |
| static int | MIN_BRANCHING_FACTOR: The minimum allowed branching factor (3). |
| static int | MIN_WRITE_RETENTION_QUEUE_CAPACITY: The minimum write retention queue capacity is two (2) in order to avoid cache evictions of the leaves participating in a split. |
| static String | NODE_KEYS_CODER: Override the IRabaCoder used for the keys in the nodes of a B+Tree (the default is a FrontCodedRabaCoder instance). |
| static String | SCATTER_SPLIT_DATA_SERVICE_COUNT: The #of data services on which the index will be scattered or ZERO (0) to use all discovered data services (default "0"). |
| static String | SCATTER_SPLIT_ENABLED: Boolean option indicates whether or not scatter splits are performed (default ). |
| static String | SCATTER_SPLIT_INDEX_PARTITION_COUNT: The #of index partitions to generate when an index is scatter split. |
| static String | SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD: The percentage of the nominal index partition size at which a scatter split is triggered when there is only a single index partition for a given scale-out index (default DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD). |
| static String | SINK_CHUNK_SIZE: The desired size of the chunks that will be written by the sink. |
| static String | SINK_CHUNK_TIMEOUT_NANOS: The maximum amount of time in nanoseconds that a sink will combine smaller chunks so that it can satisfy the desired sinkChunkSize (default "9223372036854775807"). |
| static String | SINK_IDLE_TIMEOUT_NANOS: The time in nanoseconds after which an idle sink will be closed (default "9223372036854775807"). |
| static String | SINK_POLL_TIMEOUT_NANOS: The time in nanoseconds that the sink will wait inside of the IAsynchronousIterator when it polls the iterator for a chunk. |
| static String | SINK_QUEUE_CAPACITY: The capacity of the internal queue for the per-sink output buffer. |
| static String | SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT: The target #of tuples for an index partition. |
| static String | SPLIT_HANDLER_MIN_ENTRY_COUNT: An index partition which has no more than this many tuples should be joined with its rightSibling (if any). |
| static String | SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER: The index partition will be split when its actual entry count is GTE overCapacityMultiplier * entryCountPerSplit. |
| static String | SPLIT_HANDLER_SAMPLE_RATE: The #of samples to take per estimated split (non-negative, and generally on the order of 10s of samples). |
| static String | SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER: When an index partition will be split, the #of new index partitions will be chosen such that each index partition is approximately underCapacityMultiplier full. |
| static String | WRITE_RETENTION_QUEUE_CAPACITY: The capacity of the hard reference queue used to retain recently touched nodes (nodes or leaves) and to defer the eviction of dirty nodes (nodes or leaves). |
| static String | WRITE_RETENTION_QUEUE_SCAN: The #of entries on the write retention queue that will be scanned for a match before a new reference is appended to the queue. |
| Field Detail |
|---|
static final int MIN_BRANCHING_FACTOR
static final int MAX_BTREE_BRANCHING_FACTOR
A reasonable maximum branching factor for a BTree.
static final int MAX_INDEX_SEGMENT_BRANCHING_FACTOR
A reasonable maximum branching factor for an IndexSegment.
static final int MIN_WRITE_RETENTION_QUEUE_CAPACITY
static final int MAX_WRITE_RETENTION_QUEUE_CAPACITY
static final String BLOOM_FILTER
AbstractBTrees. While the mutable
BTrees might occasionally grow too large to support a bloom
filter, data is periodically migrated onto immutable
IndexSegments which have perfect fit bloom filters. This
means that the bloom filter scales out, but not up.
BloomFilterFactory.DEFAULT,
DEFAULT_BLOOM_FILTER
static final String DEFAULT_BLOOM_FILTER
static final String INITIAL_DATA_SERVICE
UUID of that data
service (this is unambiguous) or the name associated with the data
service (it is up to the administrator to not assign the same name to
different data service instances and an arbitrary instance having the
desired name will be used if more than one instance is assigned the
same name). The default behavior is to select a data service using
the load balancer, which is done automatically by
IBigdataFederation.registerIndex(IndexMetadata, UUID) if
IndexMetadata.getInitialDataServiceUUID() returns
null.
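Since the property value may be either a UUID or a name, a caller has to tell the two apart. The following is a minimal sketch of one way to do that, assuming a value that fails to parse as a UUID is treated as a service name. The class and method names are hypothetical and are not part of this API; only java.util.UUID is a real library class.

```java
import java.util.UUID;

/** Hypothetical helper: classify an INITIAL_DATA_SERVICE value as a UUID or a name. */
final class InitialDataServiceValue {

    /**
     * Returns the parsed UUID, or null when the value is not a UUID and
     * should therefore be treated as a data service name.
     */
    static UUID parseUuidOrNull(String value) {
        try {
            return UUID.fromString(value.trim());
        } catch (IllegalArgumentException ex) {
            // Not a UUID: the caller would look up the service by name instead.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseUuidOrNull("123e4567-e89b-12d3-a456-426614174000"));
        System.out.println(parseUuidOrNull("dataService0"));
    }
}
```

Note that UUID.fromString is lenient about group lengths, so a service name made only of hex digits and four dashes could be mistaken for a UUID under this approach.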
static final String WRITE_RETENTION_QUEUE_CAPACITY
The purpose of this queue is to retain recently touched nodes and
leaves and to defer eviction of dirty nodes and leaves in case they
will be modified again soon. Once a node falls off the write
retention queue it is checked to see if it is dirty. If it is dirty,
then it is serialized and persisted on the backing store. If the
write retention queue capacity is set to a large value (say, GTE
1000), then that will increase the commit latency and have a
negative effect on the overall performance. Too small a value will
mean that nodes that are undergoing mutation will be serialized and
persisted prematurely leading to excessive writes on the backing
store. For append-only stores, this directly contributes to what are
effectively redundant and thereafter unreachable copies of the
intermediate state of nodes as only nodes that can be reached by
navigation from a Checkpoint will ever be read again. The
value 500 appears to be a good default. While it is
possible that some workloads could benefit from a larger value, this
leads to higher commit latency and can therefore have a broad impact
on performance.
Note: The write retention queue is used for both BTree and
IndexSegment. Any touched node or leaf is placed onto this
queue. As nodes and leaves are evicted from this queue, they are then
placed onto the optional read-retention queue.
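The interaction between the queue capacity, the scan option and eviction can be sketched as follows. This is a simplified model under stated assumptions, not the library's implementation: Node is a stand-in class, the "backing store" is a list of ids, and the reference count is incremented per queue entry so that a node is only persisted when its last reference leaves the queue and it is dirty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of a bounded hard reference queue with a tail scan. */
final class WriteRetentionQueueSketch {

    /** Stand-in for a node or leaf; not the library's class. */
    static final class Node {
        final String id;
        boolean dirty;
        int referenceCount; // #of entries for this node on the queue
        Node(String id, boolean dirty) { this.id = id; this.dirty = dirty; }
    }

    private final ArrayDeque<Node> queue = new ArrayDeque<>();
    private final int capacity;
    private final int nscan;
    /** Ids of nodes written to the (simulated) backing store, in eviction order. */
    final List<String> persisted = new ArrayList<>();

    WriteRetentionQueueSketch(int capacity, int nscan) {
        if (capacity < 2) throw new IllegalArgumentException("capacity must be >= 2");
        if (nscan < 0) throw new IllegalArgumentException("nscan must be >= 0");
        this.capacity = capacity;
        this.nscan = nscan;
    }

    /** Records that a node was touched, evicting the oldest entry if the queue is full. */
    void touch(Node n) {
        // Scan the most recent nscan entries; skip the append if the node is already there.
        Iterator<Node> it = queue.descendingIterator();
        for (int i = 0; i < nscan && it.hasNext(); i++) {
            if (it.next() == n) return;
        }
        if (queue.size() == capacity) {
            Node evicted = queue.removeFirst();
            evicted.referenceCount--;
            // Persist only when no other queue entry still references the node.
            if (evicted.referenceCount == 0 && evicted.dirty) {
                persisted.add(evicted.id);
                evicted.dirty = false;
            }
        }
        n.referenceCount++;
        queue.addLast(n);
    }
}
```

A larger capacity keeps dirty nodes in memory longer, which defers writes but leaves more work for the commit. A larger scan reduces duplicate entries for nodes that are touched repeatedly.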
static final String WRITE_RETENTION_QUEUE_SCAN
AbstractNode.referenceCount
is zero and the node or leaf is dirty.
static final String DEFAULT_WRITE_RETENTION_QUEUE_CAPACITY
static final String DEFAULT_WRITE_RETENTION_QUEUE_SCAN
static final String KEY_BUILDER_FACTORY
Override the IKeyBuilderFactory used by the
DefaultTupleSerializer (the default is a
DefaultKeyBuilderFactory initialized with an empty
Properties object).
FIXME KeyBuilder configuration support is not finished.
static final String NODE_KEYS_CODER
Override the IRabaCoder used for the keys in the nodes of a
B+Tree (the default is a FrontCodedRabaCoder instance).
static final String LEAF_KEYS_CODER
Override the IRabaCoder used for the keys of leaves in
B+Trees (the default is a FrontCodedRabaCoder instance).
DefaultTupleSerializer.setLeafKeysCoder(IRabaCoder)
static final String LEAF_VALUES_CODER
Override the IRabaCoder used for the values of leaves in
B+Trees (default is a CanonicalHuffmanRabaCoder).
DefaultTupleSerializer.setLeafValuesCoder(IRabaCoder)
static final String CHILD_LOCKS
Option determines whether or not per-child locks are used by
Node for a read-only AbstractBTree (default
"false"). This option affects synchronization
in Node.getChild(int). Synchronization is not required for
mutable BTrees as they already impose the constraint that the
caller is single threaded. Synchronization is required in this method
to ensure that the data structure remains coherent when concurrent
threads demand access to the same child of a given Node.
Per-child locks have higher potential concurrency since locking is
done on a distinct Object for each child rather than on a
shared Object for all children of a given Node.
However, per-child locks require more Object allocation (for
the locks) and thus contribute to heap demand.
Note: While this can improve read concurrency, this option imposes
additional RAM demands since there is one Object allocated for
each Node in the BTree. This is why it is turned off
by default.
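The trade-off between a shared lock and per-child locks can be sketched as below. This is a hypothetical stand-alone model, not the code of Node.getChild(int): the class name, the loader callback and the double-checked lookup are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntFunction;

/** Hypothetical model of lazy child materialization under a shared or per-child lock. */
final class ChildLockSketch<T> {

    private final AtomicReferenceArray<T> children;
    private final Object[] childLocks;               // null when per-child locks are disabled
    private final Object sharedLock = new Object();  // one lock for all children
    private final IntFunction<T> loader;             // reads a child from the backing store

    ChildLockSketch(int nchildren, boolean perChildLocks, IntFunction<T> loader) {
        this.children = new AtomicReferenceArray<>(nchildren);
        this.loader = loader;
        if (perChildLocks) {
            // One extra Object per child: more concurrency, more heap.
            childLocks = new Object[nchildren];
            for (int i = 0; i < nchildren; i++) {
                childLocks[i] = new Object();
            }
        } else {
            childLocks = null;
        }
    }

    /** Returns the child at the index, loading it at most once even under contention. */
    T getChild(int index) {
        T child = children.get(index);
        if (child != null) {
            return child;
        }
        Object lock = (childLocks != null) ? childLocks[index] : sharedLock;
        synchronized (lock) {
            child = children.get(index);
            if (child == null) {
                child = loader.apply(index);
                children.set(index, child);
            }
            return child;
        }
    }
}
```

With the shared lock, two threads loading different children of the same node serialize on one monitor. With per-child locks they only contend when they ask for the same child.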
static final String DEFAULT_CHILD_LOCKS
static final String BTREE_CLASS_NAME
The name of a class derived from BTree that will be used to
re-load the index. Note that index partitions are in general views
(of one or more resources). Therefore only unpartitioned indices can
be meaningfully specialized solely in terms of the BTree base
class.
IndexSegment and the FusedView. You might
also need to override the Checkpoint class - for
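example the MetadataIndex does this.
static final String BTREE_BRANCHING_FACTOR
The name of an optional property whose value specifies the branching factor for a mutable BTree.
DEFAULT_BTREE_BRANCHING_FACTOR,
INDEX_SEGMENT_BRANCHING_FACTOR
static final String DEFAULT_BTREE_BRANCHING_FACTOR
The default branching factor for a mutable BTree.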
Note: on 9/11/2009 I changed the default B+Tree branching factor and
write retention queue capacity to 64 (was 32) and 8000 (was 500)
respectively. This change in the B+Tree branching factor reduces the
height of B+Trees on the Journal, increases the size of the
individual records on the disk, and aids performance substantially.
The larger write retention queue capacity helps to prevent B+Tree
nodes and leaves from being coded and flushed to disk too soon, which
decreases disk IO and keeps things in their mutable form in memory
longer, which improves search performance and keeps down the costs of
mutation operations. Systems with less RAM may need to reduce the
size of the LRUNexus global LRU to avoid
OutOfMemoryErrors. [Dropped back to 32/500 on 9/15/09 since
this does not do so well at scale on machines with less RAM.]
static final String BTREE_RECORD_COMPRESSOR_FACTORY
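An optional factory providing record-level compression for the nodes and leaves of an IndexSegment (default
).
FIXME Record level compression support is not finished.
static final String DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY
BTREE_RECORD_COMPRESSOR_FACTORY
static final String INDEX_SEGMENT_BRANCHING_FACTOR
The name of the property whose value specifies the branching factor for an immutable IndexSegment.
static final String DEFAULT_INDEX_SEGMENT_BRANCHING_FACTOR
The default branching factor for an IndexSegment.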
static final String INDEX_SEGMENT_BUFFER_NODES
When true an attempt will be made to fully buffer the
nodes (but not the leaves) of the IndexSegment (default
"false"). The nodes in the
IndexSegment are serialized in a contiguous region by the
IndexSegmentBuilder. That region may be fully buffered when
the IndexSegment is opened, in which case queries against the
IndexSegment will incur NO disk hits for the nodes and only
one disk hit per visited leaf.
Note: The nodes are read into a buffer allocated from the
DirectBufferPool. If the size of the nodes region in the
IndexSegmentStore file exceeds the capacity of the buffers
managed by the DirectBufferPool, then the nodes WILL NOT be
buffered. The DirectBufferPool is used both for efficiency
and because a bug dealing with temporary direct buffers would
otherwise cause the C heap to be exhausted!
DEFAULT_INDEX_SEGMENT_BUFFER_NODES
static final String DEFAULT_INDEX_SEGMENT_BUFFER_NODES
INDEX_SEGMENT_BUFFER_NODES,
Constant Field Values
static final String INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY
An optional factory providing record-level compression for the nodes and leaves of an IndexSegment (default
).
FIXME Record level compression support is not finished.
static final String DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY
INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY
static final String SPLIT_HANDLER_MIN_ENTRY_COUNT
static final String SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT
static final String SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER
The index partition will be split when its actual entry count is GTE overCapacityMultiplier * entryCountPerSplit.
static final String SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER
static final String SPLIT_HANDLER_SAMPLE_RATE
static final String DEFAULT_SPLIT_HANDLER_MIN_ENTRY_COUNT
static final String DEFAULT_SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT
static final String DEFAULT_SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER
static final String DEFAULT_SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER
static final String DEFAULT_SPLIT_HANDLER_SAMPLE_RATE
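The split handler options combine into three simple decisions, sketched below from the one-line descriptions in the field summary. The class SplitMath is hypothetical, and so are the rounding rule (floor) and the minimum of two partitions per split, which this page does not specify. The numeric values used in main are examples only, not the defaults.

```java
/** Hypothetical arithmetic for the split handler options; not the library's code. */
final class SplitMath {

    /** Split when entryCount is GTE overCapacityMultiplier * entryCountPerSplit. */
    static boolean shouldSplit(long entryCount, long entryCountPerSplit,
            double overCapacityMultiplier) {
        return entryCount >= overCapacityMultiplier * entryCountPerSplit;
    }

    /** Join when the partition has no more than minEntryCount tuples. */
    static boolean shouldJoin(long entryCount, long minEntryCount) {
        return entryCount <= minEntryCount;
    }

    /**
     * Chooses the #of new partitions so that each is approximately
     * underCapacityMultiplier full. Floor rounding and the minimum of two
     * partitions are assumptions of this sketch.
     */
    static int splitCount(long entryCount, long entryCountPerSplit,
            double underCapacityMultiplier) {
        double targetPerPartition = underCapacityMultiplier * entryCountPerSplit;
        return (int) Math.max(2L, (long) Math.floor(entryCount / targetPerPartition));
    }

    public static void main(String[] args) {
        long entryCountPerSplit = 1_000_000L; // example value only
        System.out.println(shouldSplit(1_500_000L, entryCountPerSplit, 1.5));
        System.out.println(splitCount(3_000_000L, entryCountPerSplit, 0.75));
        System.out.println(shouldJoin(100L, 100L));
    }
}
```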
static final String MASTER_QUEUE_CAPACITY
AbstractTaskMaster, broken
into splits, and each split is written onto the
AbstractSubtask sink handling writes for the associated index
partition.
static final String DEFAULT_MASTER_QUEUE_CAPACITY
static final String MASTER_CHUNK_SIZE
static final String DEFAULT_MASTER_CHUNK_SIZE
static final String MASTER_CHUNK_TIMEOUT_NANOS
static final String DEFAULT_MASTER_CHUNK_TIMEOUT_NANOS
static final String SINK_POLL_TIMEOUT_NANOS
The time in nanoseconds that the sink will
wait inside of the IAsynchronousIterator when it polls the
iterator for a chunk. This value should be relatively small so that
the sink remains responsive rather than blocking inside of the
IAsynchronousIterator for long periods of time.
static final String DEFAULT_SINK_POLL_TIMEOUT_NANOS
static final String SINK_QUEUE_CAPACITY
static final String DEFAULT_SINK_QUEUE_CAPACITY
static final String SINK_CHUNK_SIZE
The desired size of the chunks that will be written by the sink.
static final String DEFAULT_SINK_CHUNK_SIZE
static final String SINK_CHUNK_TIMEOUT_NANOS
SINK_CHUNK_SIZE elements have accumulated before writing on
the index partition. This makes it much easier to adjust the
performance since you simply adjust the SINK_CHUNK_SIZE.
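How the chunk size, chunk timeout and poll timeout interact can be sketched with a plain BlockingQueue. This is a simplified model, not the sink's actual implementation: ChunkCombiner and its parameters are hypothetical names, and the queue of lists stands in for the IAsynchronousIterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical model: combine small chunks until a size or a timeout is reached. */
final class ChunkCombiner {

    /**
     * Drains chunks from the queue until at least chunkSize elements have
     * accumulated or chunkTimeoutNanos has elapsed, whichever comes first.
     */
    static <E> List<E> nextChunk(BlockingQueue<List<E>> queue, int chunkSize,
            long chunkTimeoutNanos, long pollTimeoutNanos) {
        final List<E> combined = new ArrayList<>();
        final long begin = System.nanoTime();
        while (combined.size() < chunkSize) {
            final long remaining = chunkTimeoutNanos - (System.nanoTime() - begin);
            if (remaining <= 0) {
                break; // chunk timeout expired: emit whatever has accumulated
            }
            try {
                // Never block longer than the poll timeout so the caller stays responsive.
                final List<E> part = queue.poll(
                        Math.min(remaining, pollTimeoutNanos), TimeUnit.NANOSECONDS);
                if (part != null) {
                    combined.addAll(part);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                break;
            }
        }
        return combined;
    }
}
```

With a chunk timeout of Long.MAX_VALUE the loop only ends once chunkSize elements have accumulated, so the chunk size becomes the single tuning knob. Because whole chunks are appended, the result may exceed chunkSize.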
static final String DEFAULT_SINK_CHUNK_TIMEOUT_NANOS
static final String SINK_IDLE_TIMEOUT_NANOS
If the idle timeout is LT the SINK_CHUNK_TIMEOUT_NANOS then
a sink will remain open as long as new chunks appear and are combined
within idle timeout, otherwise the sink will decide that it is idle
and will flush its last chunk and close itself. If this is
Long.MAX_VALUE then the sink will never identify itself as idle and
will only be closed if the master is closed or the sink has received
a StaleLocatorException for the index partition on which the
sink is writing.
static final String DEFAULT_SINK_IDLE_TIMEOUT_NANOS
static final String SCATTER_SPLIT_ENABLED
IDataServices in the federation. This
is normally very useful.
Sometimes a scatter split is not the "right" thing for an index. An example would be an index where you have to do a LOT of synchronous RPC rather than using asynchronous index writes. In this case, the synchronous RPC can be a bottleneck unless the "chunk" size of the writes is large. This is especially true when writes on other indices must wait for the outcome of the synchronous RPC. E.g., foreign keys.
OverflowManager.Options#SCATTER_SPLIT_ENABLED
static final String DEFAULT_SCATTER_SPLIT_ENABLED
static final String SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD
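The percentage of the nominal index partition size at which a scatter split is triggered when there is only a single index partition for a given scale-out index (default DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD). The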
scatter split will break the index into multiple partitions and
distribute those index partitions across the federation in order to
allow more resources to be brought to bear on the scale-out index.
The value must be LT the nominal index partition split point or normal
index splits will take precedence and a scatter split will never be
performed. The allowable range is therefore constrained to
(0.1 : 1.0).
static final String DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD
static final String SCATTER_SPLIT_DATA_SERVICE_COUNT
static final String DEFAULT_SCATTER_SPLIT_DATA_SERVICE_COUNT
static final String SCATTER_SPLIT_INDEX_PARTITION_COUNT
SCATTER_SPLIT_DATA_SERVICE_COUNT discovered data services.
When ZERO(0), the scatter split will generate
(NDATA_SERVICES x 2) index partitions, where
NDATA_SERVICES is either SCATTER_SPLIT_DATA_SERVICE_COUNT or
the #of discovered data services when that option is ZERO (0).
The "ideal" number of index partitions is generally between (NCORES x NDATA_SERVICES / NINDICES) and (NCORES x NDATA_SERVICES). When there are NCORES x NDATA_SERVICES index partitions, each core is capable of servicing a distinct index partition assuming that the application and the "schema" are capable of driving the data service writes with that concurrency. However, if you have NINDICES, and the application drives writes on all index partitions of all indices at the same rate, then a 1:1 allocation of index partitions to cores would be "ideal".
The "right" answer also depends on the data scale. If you have far less data than can fill that many index partitions to 200M each, then you should adjust the scatter split to use fewer index partitions or fewer data services.
Finally, the higher the scatter the more you will need to use asynchronous index writes in order to obtain high throughput with sustained index writes.
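The partition-count arithmetic above can be written out as a short sketch. The class ScatterSplitMath and its method names are hypothetical. The sketch does not cap the configured data service count at the number of discovered services, which is a simplification.

```java
/** Hypothetical arithmetic for the scatter split options; not the library's code. */
final class ScatterSplitMath {

    /** ZERO (0) means "use all discovered data services". */
    static int dataServiceCount(int configuredCount, int discoveredCount) {
        return configuredCount == 0 ? discoveredCount : configuredCount;
    }

    /** ZERO (0) means "generate NDATA_SERVICES x 2 index partitions". */
    static int indexPartitionCount(int configuredPartitions,
            int configuredDataServices, int discoveredDataServices) {
        if (configuredPartitions != 0) {
            return configuredPartitions;
        }
        return dataServiceCount(configuredDataServices, discoveredDataServices) * 2;
    }

    /**
     * Returns {low, high} for the "ideal" #of index partitions:
     * (NCORES x NDATA_SERVICES / NINDICES) to (NCORES x NDATA_SERVICES).
     */
    static int[] idealRange(int ncores, int ndataServices, int nindices) {
        int high = ncores * ndataServices;
        return new int[] { high / nindices, high };
    }

    public static void main(String[] args) {
        // Example: 5 discovered data services, both options left at ZERO.
        System.out.println(indexPartitionCount(0, 0, 5));
        int[] range = idealRange(8, 4, 2);
        System.out.println(range[0] + " to " + range[1]);
    }
}
```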
static final String DEFAULT_SCATTER_SPLIT_INDEX_PARTITION_COUNT