com.bigdata.btree
Interface IndexMetadata.Options

Enclosing class:
IndexMetadata

public static interface IndexMetadata.Options

Options and their defaults for the com.bigdata.btree package and the BTree and IndexSegment classes. Options that apply equally to views and AbstractBTrees are in the package namespace, such as whether or not a bloom filter is enabled. Options that apply to all AbstractBTrees are specified within that namespace while those that are specific to BTree or IndexSegment are located within their respective class namespaces. Some properties, such as the branchingFactor, are defined for both the BTree and the IndexSegment because their defaults tend to be different when an IndexSegment is generated from a BTree.

Version:
$Id: IndexMetadata.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
TODO:
It should be possible to specify the key, value, and node/leaf coders via this interface. This is easy enough if there is a standard factory interface, since we can specify the class name, and more difficult if we need to create an instance.

Note: The basic pattern here is using the class name, having a default instance of the class (or a factory for that instance), and then being able to override properties for that instance. Beans stuff really, just simpler.

It should be possible to specify the overflow handler and its properties via options (as you can with beans or jini configurations).

It should be possible to specify a different split handler and its properties via options (as you can with beans or jini configurations).


Field Summary
static String BLOOM_FILTER
          Optional property controls whether or not a bloom filter is maintained (default "false").
static String BTREE_BRANCHING_FACTOR
          The name of an optional property whose value specifies the branching factor for a mutable BTree.
static String BTREE_CLASS_NAME
          The name of a class derived from BTree that will be used to re-load the index.
static String BTREE_RECORD_COMPRESSOR_FACTORY
          An optional factory providing record-level compression for the nodes and leaves of a BTree (default DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY).
static String CHILD_LOCKS
          Option determines whether or not per-child locks are used by Node for a read-only AbstractBTree (default "false").
static String DEFAULT_BLOOM_FILTER
           
static String DEFAULT_BTREE_BRANCHING_FACTOR
          The default branching factor for a mutable BTree.
static String DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY
           
static String DEFAULT_CHILD_LOCKS
           
static String DEFAULT_INDEX_SEGMENT_BRANCHING_FACTOR
          The default branching factor for an IndexSegment.
static String DEFAULT_INDEX_SEGMENT_BUFFER_NODES
           
static String DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY
           
static String DEFAULT_MASTER_CHUNK_SIZE
           
static String DEFAULT_MASTER_CHUNK_TIMEOUT_NANOS
           
static String DEFAULT_MASTER_QUEUE_CAPACITY
           
static String DEFAULT_SCATTER_SPLIT_DATA_SERVICE_COUNT
           
static String DEFAULT_SCATTER_SPLIT_ENABLED
           
static String DEFAULT_SCATTER_SPLIT_INDEX_PARTITION_COUNT
           
static String DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD
           
static String DEFAULT_SINK_CHUNK_SIZE
           
static String DEFAULT_SINK_CHUNK_TIMEOUT_NANOS
           
static String DEFAULT_SINK_IDLE_TIMEOUT_NANOS
           
static String DEFAULT_SINK_POLL_TIMEOUT_NANOS
           
static String DEFAULT_SINK_QUEUE_CAPACITY
           
static String DEFAULT_SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT
           
static String DEFAULT_SPLIT_HANDLER_MIN_ENTRY_COUNT
           
static String DEFAULT_SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER
           
static String DEFAULT_SPLIT_HANDLER_SAMPLE_RATE
           
static String DEFAULT_SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER
           
static String DEFAULT_WRITE_RETENTION_QUEUE_CAPACITY
           
static String DEFAULT_WRITE_RETENTION_QUEUE_SCAN
           
static String INDEX_SEGMENT_BRANCHING_FACTOR
          The name of the property whose value specifies the branching factor for an immutable IndexSegment.
static String INDEX_SEGMENT_BUFFER_NODES
          When true an attempt will be made to fully buffer the nodes (but not the leaves) of the IndexSegment (default "false").
static String INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY
          An optional factory providing record-level compression for the nodes and leaves of an IndexSegment (default DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY).
static String INITIAL_DATA_SERVICE
          The name of an optional property whose value identifies the data service on which the initial index partition of a scale-out index will be created.
static String KEY_BUILDER_FACTORY
          Override the IKeyBuilderFactory used by the DefaultTupleSerializer (the default is a DefaultKeyBuilderFactory initialized with an empty Properties object).
static String LEAF_KEYS_CODER
          Override the IRabaCoder used for the keys of leaves in B+Trees (the default is a FrontCodedRabaCoder instance).
static String LEAF_VALUES_CODER
          Override the IRabaCoder used for the values of leaves in B+Trees (default is a CanonicalHuffmanRabaCoder).
static String MASTER_CHUNK_SIZE
          The desired size of the chunks that the master will draw from its queue.
static String MASTER_CHUNK_TIMEOUT_NANOS
          The time in nanoseconds that the master will combine smaller chunks so that it can satisfy the desired masterChunkSize.
static String MASTER_QUEUE_CAPACITY
          The capacity of the queue on which the application writes.
static int MAX_BTREE_BRANCHING_FACTOR
          A reasonable maximum branching factor for a BTree.
static int MAX_INDEX_SEGMENT_BRANCHING_FACTOR
          A reasonable maximum branching factor for an IndexSegment.
static int MAX_WRITE_RETENTION_QUEUE_CAPACITY
          A reasonable maximum write retention queue capacity.
static int MIN_BRANCHING_FACTOR
          The minimum allowed branching factor (3).
static int MIN_WRITE_RETENTION_QUEUE_CAPACITY
          The minimum write retention queue capacity is two (2) in order to avoid cache evictions of the leaves participating in a split.
static String NODE_KEYS_CODER
          Override the IRabaCoder used for the keys in the nodes of a B+Tree (the default is a FrontCodedRabaCoder instance).
static String SCATTER_SPLIT_DATA_SERVICE_COUNT
          The #of data services on which the index will be scattered or ZERO(0) to use all discovered data services (default "0").
static String SCATTER_SPLIT_ENABLED
          Boolean option indicates whether or not scatter splits are performed (default DEFAULT_SCATTER_SPLIT_ENABLED).
static String SCATTER_SPLIT_INDEX_PARTITION_COUNT
          The #of index partitions to generate when an index is scatter split.
static String SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD
          The percentage of the nominal index partition size at which a scatter split is triggered when there is only a single index partition for a given scale-out index (default DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD).
static String SINK_CHUNK_SIZE
          The desired size of the chunks that will be written by the sink.
static String SINK_CHUNK_TIMEOUT_NANOS
          The maximum amount of time in nanoseconds that a sink will combine smaller chunks so that it can satisfy the desired sinkChunkSize (default "9223372036854775807").
static String SINK_IDLE_TIMEOUT_NANOS
          The time in nanoseconds after which an idle sink will be closed (default "9223372036854775807").
static String SINK_POLL_TIMEOUT_NANOS
          The time in nanoseconds that the sink will wait inside of the IAsynchronousIterator when it polls the iterator for a chunk.
static String SINK_QUEUE_CAPACITY
          The capacity of the internal queue for the per-sink output buffer.
static String SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT
          The target #of tuples for an index partition.
static String SPLIT_HANDLER_MIN_ENTRY_COUNT
          An index partition which has no more than this many tuples should be joined with its rightSibling (if any).
static String SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER
          The index partition will be split when its actual entry count is GTE overCapacityMultiplier * entryCountPerSplit.
static String SPLIT_HANDLER_SAMPLE_RATE
          The #of samples to take per estimated split (non-negative, and generally on the order of 10s of samples).
static String SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER
          When an index partition is split, the #of new index partitions will be chosen such that each index partition is approximately underCapacityMultiplier full.
static String WRITE_RETENTION_QUEUE_CAPACITY
          The capacity of the hard reference queue used to retain recently touched nodes (nodes or leaves) and to defer the eviction of dirty nodes (nodes or leaves).
static String WRITE_RETENTION_QUEUE_SCAN
          The #of entries on the write retention queue that will be scanned for a match before a new reference is appended to the queue.
 

Field Detail

MIN_BRANCHING_FACTOR

static final int MIN_BRANCHING_FACTOR
The minimum allowed branching factor (3). The branching factor may be odd or even.

See Also:
Constant Field Values
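
The options in this interface are String-valued property names paired with String-valued defaults, so a caller typically parses the value and range-checks it. The following is a minimal sketch of that pattern, not the library's own code. The property name, the default "32", and the max argument are illustrative assumptions; only the minimum of 3 comes from the description above.

```java
import java.util.Properties;

/** Sketch: read a String-valued option with a String default, then range-check it. */
final class BranchingFactorOption {

    // Hypothetical property name; the actual value of BTREE_BRANCHING_FACTOR is not shown on this page.
    static final String KEY = "example.btree.branchingFactor";

    // Illustrative default, kept as a String like the DEFAULT_* constants.
    static final String DEFAULT = "32";

    // From the documentation above: the minimum allowed branching factor is 3.
    static final int MIN_BRANCHING_FACTOR = 3;

    /** Parses the option and rejects values outside [MIN_BRANCHING_FACTOR : max]. */
    static int branchingFactor(Properties properties, int max) {
        String text = properties.getProperty(KEY, DEFAULT);
        int value = Integer.parseInt(text.trim());
        if (value < MIN_BRANCHING_FACTOR || value > max) {
            throw new IllegalArgumentException(
                    KEY + " must be in [" + MIN_BRANCHING_FACTOR + ":" + max + "], was " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        // No explicit setting, so the String default is parsed.
        System.out.println(branchingFactor(properties, 1024));
        properties.setProperty(KEY, "64");
        System.out.println(branchingFactor(properties, 1024));
    }
}
```

The max of 1024 in main is a placeholder; the actual values of MAX_BTREE_BRANCHING_FACTOR and MAX_INDEX_SEGMENT_BRANCHING_FACTOR are listed under Constant Field Values.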

MAX_BTREE_BRANCHING_FACTOR

static final int MAX_BTREE_BRANCHING_FACTOR
A reasonable maximum branching factor for a BTree.

See Also:
Constant Field Values

MAX_INDEX_SEGMENT_BRANCHING_FACTOR

static final int MAX_INDEX_SEGMENT_BRANCHING_FACTOR
A reasonable maximum branching factor for an IndexSegment.

See Also:
Constant Field Values

MIN_WRITE_RETENTION_QUEUE_CAPACITY

static final int MIN_WRITE_RETENTION_QUEUE_CAPACITY
The minimum write retention queue capacity is two (2) in order to avoid cache evictions of the leaves participating in a split.

See Also:
Constant Field Values

MAX_WRITE_RETENTION_QUEUE_CAPACITY

static final int MAX_WRITE_RETENTION_QUEUE_CAPACITY
A reasonable maximum write retention queue capacity.

See Also:
Constant Field Values

BLOOM_FILTER

static final String BLOOM_FILTER
Optional property controls whether or not a bloom filter is maintained (default "false"). When enabled, the bloom filter is effective up to ~ 2M entries per index (partition). For scale-up, the bloom filter is automatically disabled once its error rate would be too large given the #of index entries. For scale-out, as the index grows we keep splitting it into more and more index partitions, and those index partitions are views of one or more AbstractBTrees. While the mutable BTrees might occasionally grow too large to support a bloom filter, data is periodically migrated onto immutable IndexSegments which have perfect fit bloom filters. This means that the bloom filter scales out, but not up.

See Also:
BloomFilterFactory.DEFAULT, DEFAULT_BLOOM_FILTER
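
To see why a fixed-size filter has to be disabled as a mutable index grows, it helps to look at the standard approximation for the false positive rate of a bloom filter with m bits, k hash functions, and n inserted keys: p ≈ (1 - e^(-k·n/m))^k. The sketch below evaluates that textbook formula. The bit count and hash count are illustrative assumptions and are not the values used by BloomFilterFactory.DEFAULT.

```java
/** Sketch: textbook false positive estimate for a fixed-size bloom filter. */
final class BloomErrorRate {

    /**
     * @param n number of keys inserted
     * @param m number of bits in the filter
     * @param k number of hash functions
     * @return the approximate false positive probability
     */
    static double falsePositiveRate(long n, long m, int k) {
        double fractionOfBitsSet = 1.0 - Math.exp(-((double) k * n) / m);
        return Math.pow(fractionOfBitsSet, k);
    }

    public static void main(String[] args) {
        long m = 16_000_000L; // illustrative filter size in bits
        int k = 7;            // illustrative number of hash functions
        for (long n = 500_000L; n <= 8_000_000L; n *= 2) {
            // The estimate rises steadily as more keys share the same bits.
            System.out.printf("n=%d p=%.6f%n", n, falsePositiveRate(n, m, k));
        }
    }
}
```

An IndexSegment is built from a known number of entries, so its filter can be sized for exactly that n, which is the sense in which the filters on immutable segments are "perfect fit".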

DEFAULT_BLOOM_FILTER

static final String DEFAULT_BLOOM_FILTER
See Also:
Constant Field Values

INITIAL_DATA_SERVICE

static final String INITIAL_DATA_SERVICE
The name of an optional property whose value identifies the data service on which the initial index partition of a scale-out index will be created. The value may be the UUID of that data service (this is unambiguous) or the name associated with the data service (it is up to the administrator to not assign the same name to different data service instances and an arbitrary instance having the desired name will be used if more than one instance is assigned the same name). The default behavior is to select a data service using the load balancer, which is done automatically by IBigdataFederation.registerIndex(IndexMetadata, UUID) if IndexMetadata.getInitialDataServiceUUID() returns null.
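
Since the value may be either a UUID or a service name, one plausible way to interpret it is to attempt a UUID parse first and fall back to a name lookup. This is a sketch of that idea only; the class and method names are hypothetical, and how the library actually resolves the value is not described here.

```java
import java.util.UUID;

/** Sketch: decide whether an option value is a service UUID or a service name. */
final class InitialDataServiceValue {

    /**
     * Returns the parsed UUID, or null when the value is not a UUID and
     * should therefore be treated as a data service name.
     */
    static UUID asUuidOrNull(String value) {
        try {
            return UUID.fromString(value.trim());
        } catch (IllegalArgumentException notAUuid) {
            return null;
        }
    }

    public static void main(String[] args) {
        String byUuid = UUID.randomUUID().toString();
        String byName = "dataService0"; // hypothetical service name
        System.out.println(asUuidOrNull(byUuid) != null ? "lookup by UUID" : "lookup by name");
        System.out.println(asUuidOrNull(byName) != null ? "lookup by UUID" : "lookup by name");
    }
}
```

A service name that happens to be formatted like a UUID would be misread by this approach, which is one more reason to prefer the UUID form when ambiguity matters.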


WRITE_RETENTION_QUEUE_CAPACITY

static final String WRITE_RETENTION_QUEUE_CAPACITY
The capacity of the hard reference queue used to retain recently touched nodes (nodes or leaves) and to defer the eviction of dirty nodes (nodes or leaves).

The purpose of this queue is to retain recently touched nodes and leaves and to defer eviction of dirty nodes and leaves in case they will be modified again soon. Once a node falls off the write retention queue it is checked to see if it is dirty. If it is dirty, then it is serialized and persisted on the backing store. If the write retention queue capacity is set to a large value (say, GTE 1000), then that will increase the commit latency and have a negative effect on the overall performance. Too small a value will mean that nodes that are undergoing mutation will be serialized and persisted prematurely leading to excessive writes on the backing store. For append-only stores, this directly contributes to what are effectively redundant and thereafter unreachable copies of the intermediate state of nodes as only nodes that can be reached by navigation from a Checkpoint will ever be read again. The value 500 appears to be a good default. While it is possible that some workloads could benefit from a larger value, this leads to higher commit latency and can therefore have a broad impact on performance.

Note: The write retention queue is used for both BTree and IndexSegment. Any touched node or leaf is placed onto this queue. As nodes and leaves are evicted from this queue, they are then placed onto the optional read-retention queue.


WRITE_RETENTION_QUEUE_SCAN

static final String WRITE_RETENTION_QUEUE_SCAN
The #of entries on the write retention queue that will be scanned for a match before a new reference is appended to the queue. This trades off the cost of scanning entries on the queue, which is handled by the queue itself, against the cost of queue churn. Note that queue eviction drives IOs required to write the leaves on the store, but incremental writes occur iff the AbstractNode.referenceCount is zero and the node or leaf is dirty.
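
The interaction between the queue capacity and the scan depth can be sketched with a small bounded queue. This is an illustration of the trade-off under simplifying assumptions, not the library's hard reference queue: the class name is hypothetical, and reference counting and the dirty check are left to the eviction callback.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Consumer;

/** Sketch: a bounded retention queue that scans its tail before appending. */
final class RetentionQueueSketch<T> {

    private final int capacity;
    private final int nscan;
    private final Consumer<T> onEvict;
    private final Deque<T> queue = new ArrayDeque<>();

    RetentionQueueSketch(int capacity, int nscan, Consumer<T> onEvict) {
        if (capacity < 2) throw new IllegalArgumentException("capacity must be at least 2");
        if (nscan < 0 || nscan > capacity) throw new IllegalArgumentException("nscan must be in [0:capacity]");
        this.capacity = capacity;
        this.nscan = nscan;
        this.onEvict = onEvict;
    }

    /**
     * Records a touch. Returns false when the reference was found among the
     * last nscan entries (no append, so no churn), true when it was appended.
     */
    boolean touch(T ref) {
        Iterator<T> newestFirst = queue.descendingIterator();
        for (int i = 0; i < nscan && newestFirst.hasNext(); i++) {
            if (newestFirst.next() == ref) {
                return false;
            }
        }
        if (queue.size() == capacity) {
            // The oldest reference falls off; a real eviction handler would
            // write the node only if it is dirty and no longer referenced.
            onEvict.accept(queue.removeFirst());
        }
        queue.addLast(ref);
        return true;
    }

    int size() {
        return queue.size();
    }
}
```

A deeper scan costs more per touch but keeps repeated touches of a hot node from pushing other nodes off the queue, which in turn avoids premature writes.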


DEFAULT_WRITE_RETENTION_QUEUE_CAPACITY

static final String DEFAULT_WRITE_RETENTION_QUEUE_CAPACITY
See Also:
Constant Field Values

DEFAULT_WRITE_RETENTION_QUEUE_SCAN

static final String DEFAULT_WRITE_RETENTION_QUEUE_SCAN
See Also:
Constant Field Values

KEY_BUILDER_FACTORY

static final String KEY_BUILDER_FACTORY
Override the IKeyBuilderFactory used by the DefaultTupleSerializer (the default is a DefaultKeyBuilderFactory initialized with an empty Properties object). FIXME KeyBuilder configuration support is not finished.


NODE_KEYS_CODER

static final String NODE_KEYS_CODER
Override the IRabaCoder used for the keys in the nodes of a B+Tree (the default is a FrontCodedRabaCoder instance).


LEAF_KEYS_CODER

static final String LEAF_KEYS_CODER
Override the IRabaCoder used for the keys of leaves in B+Trees (the default is a FrontCodedRabaCoder instance).

See Also:
DefaultTupleSerializer.setLeafKeysCoder(IRabaCoder)

LEAF_VALUES_CODER

static final String LEAF_VALUES_CODER
Override the IRabaCoder used for the values of leaves in B+Trees (default is a CanonicalHuffmanRabaCoder).

See Also:
DefaultTupleSerializer.setLeafValuesCoder(IRabaCoder)

CHILD_LOCKS

static final String CHILD_LOCKS
Option determines whether or not per-child locks are used by Node for a read-only AbstractBTree (default "false"). This option affects synchronization in Node.getChild(int). Synchronization is not required for mutable BTrees as they already impose the constraint that the caller is single threaded. Synchronization is required in this method to ensure that the data structure remains coherent when concurrent threads demand access to the same child of a given Node. Per-child locks have higher potential concurrency since locking is done on a distinct Object for each child rather than on a shared Object for all children of a given Node. However, per-child locks require more Object allocation (for the locks) and thus contribute to heap demand.

Note: While this can improve read concurrency, this option imposes additional RAM demands since there is an Object allocated for each Node in the BTree. This is why it is turned off by default.
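
The difference between the two locking modes can be sketched as follows. This is a simplified stand-in for Node.getChild(int), not the actual implementation: the class name, the loader callback, and the use of plain Object monitors are assumptions made for illustration.

```java
import java.util.function.IntFunction;

/** Sketch: lazily materializing children under a shared lock or per-child locks. */
final class ChildLockSketch {

    private final Object[] children;
    private final Object[] childLocks; // null when per-child locks are disabled
    private final IntFunction<Object> loader;

    ChildLockSketch(int nchildren, boolean perChildLocks, IntFunction<Object> loader) {
        this.children = new Object[nchildren];
        this.loader = loader;
        if (perChildLocks) {
            // One extra Object per child: more concurrency, more heap.
            this.childLocks = new Object[nchildren];
            for (int i = 0; i < nchildren; i++) {
                this.childLocks[i] = new Object();
            }
        } else {
            this.childLocks = null;
        }
    }

    /** Returns the child at the index, loading it at most once. */
    Object getChild(int index) {
        // Shared mode serializes all loads on this node; per-child mode only
        // serializes threads that want the same child.
        Object lock = (childLocks != null) ? childLocks[index] : this;
        synchronized (lock) {
            Object child = children[index];
            if (child == null) {
                child = loader.apply(index);
                children[index] = child;
            }
            return child;
        }
    }
}
```

In both modes a given slot is always read and written under the same monitor, so concurrent readers asking for the same child observe a single materialized instance.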


DEFAULT_CHILD_LOCKS

static final String DEFAULT_CHILD_LOCKS
See Also:
Constant Field Values

BTREE_CLASS_NAME

static final String BTREE_CLASS_NAME
The name of a class derived from BTree that will be used to re-load the index. Note that index partitions are in general views (of one or more resources). Therefore only unpartitioned indices can be meaningfully specialized solely in terms of the BTree base class.

TODO:
in order to provide a similar specialization mechanism for scale-out indices you would need to specify the class name for the IndexSegment and the FusedView. You might also need to override the Checkpoint class - for example the MetadataIndex does this.

BTREE_BRANCHING_FACTOR

static final String BTREE_BRANCHING_FACTOR
The name of an optional property whose value specifies the branching factor for a mutable BTree.

See Also:
DEFAULT_BTREE_BRANCHING_FACTOR, INDEX_SEGMENT_BRANCHING_FACTOR

DEFAULT_BTREE_BRANCHING_FACTOR

static final String DEFAULT_BTREE_BRANCHING_FACTOR
The default branching factor for a mutable BTree.

Note: on 9/11/2009 I changed the default B+Tree branching factor and write retention queue capacity to 64 (was 32) and 8000 (was 500) respectively. This change in the B+Tree branching factor reduces the height of B+Trees on the Journal, increases the size of the individual records on the disk, and aids performance substantially. The larger write retention queue capacity helps to prevent B+Tree nodes and leaves from being coded and flushed to disk too soon, which decreases disk IO and keeps things in their mutable form in memory longer, which improves search performance and keeps down the costs of mutation operations. Systems with less RAM may need to reduce the size of the LRUNexus global LRU to avoid OutOfMemoryErrors. [Dropped back to 32/500 on 9/15/09 since this does not do so well at scale on machines with less RAM.]

See Also:
Constant Field Values

BTREE_RECORD_COMPRESSOR_FACTORY

static final String BTREE_RECORD_COMPRESSOR_FACTORY
An optional factory providing record-level compression for the nodes and leaves of a BTree (default DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY).

See Also:
FIXME Record level compression support is not finished.

DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY

static final String DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY
See Also:
BTREE_RECORD_COMPRESSOR_FACTORY

INDEX_SEGMENT_BRANCHING_FACTOR

static final String INDEX_SEGMENT_BRANCHING_FACTOR
The name of the property whose value specifies the branching factor for an immutable IndexSegment.


DEFAULT_INDEX_SEGMENT_BRANCHING_FACTOR

static final String DEFAULT_INDEX_SEGMENT_BRANCHING_FACTOR
The default branching factor for an IndexSegment.

See Also:
Constant Field Values

INDEX_SEGMENT_BUFFER_NODES

static final String INDEX_SEGMENT_BUFFER_NODES
When true an attempt will be made to fully buffer the nodes (but not the leaves) of the IndexSegment (default "false"). The nodes in the IndexSegment are serialized in a contiguous region by the IndexSegmentBuilder. That region may be fully buffered when the IndexSegment is opened, in which case queries against the IndexSegment will incur NO disk hits for the nodes and only one disk hit per visited leaf.

Note: The nodes are read into a buffer allocated from the DirectBufferPool. If the size of the nodes region in the IndexSegmentStore file exceeds the capacity of the buffers managed by the DirectBufferPool, then the nodes WILL NOT be buffered. The DirectBufferPool is used both for efficiency and because a bug dealing with temporary direct buffers would otherwise cause the C heap to be exhausted!

See Also:
DEFAULT_INDEX_SEGMENT_BUFFER_NODES
TODO:
should be on by default? (but verify that the unit tests do not run out of memory when it is enabled by default).

DEFAULT_INDEX_SEGMENT_BUFFER_NODES

static final String DEFAULT_INDEX_SEGMENT_BUFFER_NODES
See Also:
INDEX_SEGMENT_BUFFER_NODES, Constant Field Values

INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY

static final String INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY
An optional factory providing record-level compression for the nodes and leaves of an IndexSegment (default DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY).

See Also:
FIXME Record level compression support is not finished.

DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY

static final String DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY
See Also:
INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY

SPLIT_HANDLER_MIN_ENTRY_COUNT

static final String SPLIT_HANDLER_MIN_ENTRY_COUNT
An index partition which has no more than this many tuples should be joined with its rightSibling (if any).


SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT

static final String SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT
The target #of tuples for an index partition.


SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER

static final String SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER
The index partition will be split when its actual entry count is GTE overCapacityMultiplier * entryCountPerSplit.


SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER

static final String SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER
When an index partition is split, the #of new index partitions will be chosen such that each index partition is approximately underCapacityMultiplier full.


SPLIT_HANDLER_SAMPLE_RATE

static final String SPLIT_HANDLER_SAMPLE_RATE
The #of samples to take per estimated split (non-negative, and generally on the order of 10s of samples). The purpose of the samples is to accommodate the actual distribution of the keys in the index.
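
The split handler options above combine into three simple decisions: when to join, when to split, and how many partitions a split should produce. The sketch below encodes those rules as stated. The class name and the numeric settings are hypothetical, and the rounding used for the split count is an assumption about how "approximately underCapacityMultiplier full" is realized.

```java
/** Sketch: join/split decisions driven by the SPLIT_HANDLER_* options. */
final class SplitHandlerSketch {

    final long minEntryCount;
    final long entryCountPerSplit;
    final double overCapacityMultiplier;
    final double underCapacityMultiplier;

    SplitHandlerSketch(long minEntryCount, long entryCountPerSplit,
            double overCapacityMultiplier, double underCapacityMultiplier) {
        this.minEntryCount = minEntryCount;
        this.entryCountPerSplit = entryCountPerSplit;
        this.overCapacityMultiplier = overCapacityMultiplier;
        this.underCapacityMultiplier = underCapacityMultiplier;
    }

    /** A partition with no more than minEntryCount tuples should be joined. */
    boolean shouldJoin(long entryCount) {
        return entryCount <= minEntryCount;
    }

    /** Split when entryCount is GTE overCapacityMultiplier * entryCountPerSplit. */
    boolean shouldSplit(long entryCount) {
        return entryCount >= overCapacityMultiplier * entryCountPerSplit;
    }

    /**
     * Number of new partitions such that each holds roughly
     * underCapacityMultiplier * entryCountPerSplit tuples (rounded down, at least 1).
     */
    int splitCount(long entryCount) {
        double targetPerPartition = underCapacityMultiplier * entryCountPerSplit;
        return Math.max(1, (int) Math.floor(entryCount / targetPerPartition));
    }

    public static void main(String[] args) {
        // Illustrative settings only; see the DEFAULT_SPLIT_HANDLER_* constants for the real defaults.
        SplitHandlerSketch handler = new SplitHandlerSketch(1_000L, 10_000L, 1.5, 0.75);
        long entryCount = 15_000L;
        System.out.println("join=" + handler.shouldJoin(entryCount)
                + " split=" + handler.shouldSplit(entryCount)
                + " nsplits=" + handler.splitCount(entryCount));
    }
}
```

The sample rate does not appear in this sketch: it governs how many keys are sampled to choose the separator keys, not whether or how many times to split.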


DEFAULT_SPLIT_HANDLER_MIN_ENTRY_COUNT

static final String DEFAULT_SPLIT_HANDLER_MIN_ENTRY_COUNT
See Also:
Constant Field Values

DEFAULT_SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT

static final String DEFAULT_SPLIT_HANDLER_ENTRY_COUNT_PER_SPLIT
See Also:
Constant Field Values

DEFAULT_SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER

static final String DEFAULT_SPLIT_HANDLER_OVER_CAPACITY_MULTIPLIER
See Also:
Constant Field Values

DEFAULT_SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER

static final String DEFAULT_SPLIT_HANDLER_UNDER_CAPACITY_MULTIPLIER
See Also:
Constant Field Values

DEFAULT_SPLIT_HANDLER_SAMPLE_RATE

static final String DEFAULT_SPLIT_HANDLER_SAMPLE_RATE
See Also:
Constant Field Values

MASTER_QUEUE_CAPACITY

static final String MASTER_QUEUE_CAPACITY
The capacity of the queue on which the application writes. Chunks are drained from this queue by the AbstractTaskMaster, broken into splits, and each split is written onto the AbstractSubtask sink handling writes for the associated index partition.
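
The flow from the master queue to the per-partition sinks can be sketched with standard java.util.concurrent types. This is an illustration only, not AbstractTaskMaster: the class name is hypothetical, keys are plain integers, and the mapping from key to index partition is a made-up fixed-width key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch: drain a chunk from the master queue and break it into per-partition splits. */
final class MasterDrainSketch {

    /**
     * Drains up to masterChunkSize keys and groups them by partition.
     * The partition of a key is key / keysPerPartition (a hypothetical scheme).
     */
    static Map<Integer, List<Integer>> drainAndSplit(
            BlockingQueue<Integer> masterQueue, int masterChunkSize, int keysPerPartition) {
        List<Integer> chunk = new ArrayList<>(masterChunkSize);
        masterQueue.drainTo(chunk, masterChunkSize);
        Map<Integer, List<Integer>> splits = new TreeMap<>();
        for (Integer key : chunk) {
            int partition = key / keysPerPartition;
            // Each list stands in for what would be handed to that partition's sink.
            splits.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return splits;
    }

    public static void main(String[] args) {
        // The queue capacity plays the role of MASTER_QUEUE_CAPACITY.
        BlockingQueue<Integer> masterQueue = new ArrayBlockingQueue<>(8);
        for (int key : new int[] {1, 12, 3, 25, 14}) {
            masterQueue.add(key);
        }
        System.out.println(drainAndSplit(masterQueue, 4, 10));
        System.out.println("left on queue: " + masterQueue.size());
    }
}
```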


DEFAULT_MASTER_QUEUE_CAPACITY

static final String DEFAULT_MASTER_QUEUE_CAPACITY
See Also:
Constant Field Values

MASTER_CHUNK_SIZE

static final String MASTER_CHUNK_SIZE
The desired size of the chunks that the master will draw from its queue.


DEFAULT_MASTER_CHUNK_SIZE

static final String DEFAULT_MASTER_CHUNK_SIZE
See Also:
Constant Field Values

MASTER_CHUNK_TIMEOUT_NANOS

static final String MASTER_CHUNK_TIMEOUT_NANOS
The time in nanoseconds that the master will combine smaller chunks so that it can satisfy the desired masterChunkSize.


DEFAULT_MASTER_CHUNK_TIMEOUT_NANOS

static final String DEFAULT_MASTER_CHUNK_TIMEOUT_NANOS

SINK_POLL_TIMEOUT_NANOS

static final String SINK_POLL_TIMEOUT_NANOS
The time in nanoseconds that the sink will wait inside of the IAsynchronousIterator when it polls the iterator for a chunk. This value should be relatively small so that the sink remains responsive rather than blocking inside of the IAsynchronousIterator for long periods of time.


DEFAULT_SINK_POLL_TIMEOUT_NANOS

static final String DEFAULT_SINK_POLL_TIMEOUT_NANOS

SINK_QUEUE_CAPACITY

static final String SINK_QUEUE_CAPACITY
The capacity of the internal queue for the per-sink output buffer.


DEFAULT_SINK_QUEUE_CAPACITY

static final String DEFAULT_SINK_QUEUE_CAPACITY
See Also:
Constant Field Values

SINK_CHUNK_SIZE

static final String SINK_CHUNK_SIZE
The desired size of the chunks that will be written by the sink.


DEFAULT_SINK_CHUNK_SIZE

static final String DEFAULT_SINK_CHUNK_SIZE
See Also:
Constant Field Values

SINK_CHUNK_TIMEOUT_NANOS

static final String SINK_CHUNK_TIMEOUT_NANOS
The maximum amount of time in nanoseconds that a sink will combine smaller chunks so that it can satisfy the desired sinkChunkSize (default "9223372036854775807"). The default is an infinite timeout. This means that the sink will simply wait until SINK_CHUNK_SIZE elements have accumulated before writing on the index partition. This makes it much easier to adjust the performance since you simply adjust the SINK_CHUNK_SIZE.
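
The effect of an infinite chunk timeout can be seen in a small decision function. This is a sketch of the rule described above rather than the sink's real control loop; the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Sketch: when a sink stops combining chunks and writes on the index partition. */
final class SinkChunkSketch {

    /**
     * Write when the target chunk size is reached, or when something is
     * buffered and the chunk timeout has expired.
     */
    static boolean shouldWrite(int buffered, int sinkChunkSize,
            long elapsedNanos, long chunkTimeoutNanos) {
        if (buffered >= sinkChunkSize) {
            return true;
        }
        return buffered > 0 && elapsedNanos >= chunkTimeoutNanos;
    }

    public static void main(String[] args) {
        long oneHour = TimeUnit.HOURS.toNanos(1);
        // With the default of Long.MAX_VALUE the timeout is effectively never reached,
        // so only the chunk size decides when the sink writes.
        System.out.println(shouldWrite(10, 100, oneHour, Long.MAX_VALUE));
        System.out.println(shouldWrite(100, 100, 0L, Long.MAX_VALUE));
        // With a finite timeout a partial chunk is written once the timeout expires.
        System.out.println(shouldWrite(10, 100, oneHour, TimeUnit.SECONDS.toNanos(5)));
    }
}
```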


DEFAULT_SINK_CHUNK_TIMEOUT_NANOS

static final String DEFAULT_SINK_CHUNK_TIMEOUT_NANOS
See Also:
Constant Field Values

SINK_IDLE_TIMEOUT_NANOS

static final String SINK_IDLE_TIMEOUT_NANOS
The time in nanoseconds after which an idle sink will be closed (default "9223372036854775807"). Any buffered writes are flushed when the sink is closed. The idle timeout is reset (a) if a chunk is available to be drained by the sink; or (b) if a chunk is drained from the sink. If no chunks become available then the sink will eventually decide that it is idle, will flush any buffered writes, and will close itself.

If the idle timeout is LT the SINK_CHUNK_TIMEOUT_NANOS then a sink will remain open as long as new chunks appear and are combined within the idle timeout; otherwise the sink will decide that it is idle and will flush its last chunk and close itself. If this is Long.MAX_VALUE then the sink will never identify itself as idle and will only be closed if the master is closed or the sink has received a StaleLocatorException for the index partition on which the sink is writing.


DEFAULT_SINK_IDLE_TIMEOUT_NANOS

static final String DEFAULT_SINK_IDLE_TIMEOUT_NANOS
See Also:
Constant Field Values

SCATTER_SPLIT_ENABLED

static final String SCATTER_SPLIT_ENABLED
Boolean option indicates whether or not scatter splits are performed (default DEFAULT_SCATTER_SPLIT_ENABLED). Scatter splits only apply for scale-out indices where they "scatter" the initial index partition across the IDataServices in the federation. This is normally very useful.

Sometimes a scatter split is not the "right" thing for an index. An example would be an index where you have to do a LOT of synchronous RPC rather than using asynchronous index writes. In this case, the synchronous RPC can be a bottleneck unless the "chunk" size of the writes is large. This is especially true when writes on other indices must wait for the outcome of the synchronous RPC. E.g., foreign keys.

See Also:
OverflowManager.Options#SCATTER_SPLIT_ENABLED

DEFAULT_SCATTER_SPLIT_ENABLED

static final String DEFAULT_SCATTER_SPLIT_ENABLED
See Also:
Constant Field Values

SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD

static final String SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD
The percentage of the nominal index partition size at which a scatter split is triggered when there is only a single index partition for a given scale-out index (default DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD). The scatter split will break the index into multiple partitions and distribute those index partitions across the federation in order to allow more resources to be brought to bear on the scale-out index. The value must be LT the nominal index partition split point or normal index splits will take precedence and a scatter split will never be performed. The allowable range is therefore constrained to (0.1 : 1.0).


DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD

static final String DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD
See Also:
Constant Field Values

SCATTER_SPLIT_DATA_SERVICE_COUNT

static final String SCATTER_SPLIT_DATA_SERVICE_COUNT
The #of data services on which the index will be scattered or ZERO(0) to use all discovered data services (default "0").


DEFAULT_SCATTER_SPLIT_DATA_SERVICE_COUNT

static final String DEFAULT_SCATTER_SPLIT_DATA_SERVICE_COUNT
See Also:
Constant Field Values

SCATTER_SPLIT_INDEX_PARTITION_COUNT

static final String SCATTER_SPLIT_INDEX_PARTITION_COUNT
The #of index partitions to generate when an index is scatter split. The index partitions will be evenly distributed across up to SCATTER_SPLIT_DATA_SERVICE_COUNT discovered data services. When ZERO(0), the scatter split will generate (NDATA_SERVICES x 2) index partitions, where NDATA_SERVICES is either SCATTER_SPLIT_DATA_SERVICE_COUNT or the #of discovered data services when that option is ZERO (0).

The "ideal" number of index partitions is generally between (NCORES x NDATA_SERVICES / NINDICES) and (NCORES x NDATA_SERVICES). When there are NCORES x NDATA_SERVICES index partitions, each core is capable of servicing a distinct index partition assuming that the application and the "schema" are capable of driving the data service writes with that concurrency. However, if you have NINDICES, and the application drives writes on all index partitions of all indices at the same rate, then a 1:1 allocation of index partitions to cores would be "ideal".

The "right" answer also depends on the data scale. If you have far less data than can fill that many index partitions to 200M each, then you should adjust the scatter split to use fewer index partitions or fewer data services.

Finally, the higher the scatter the more you will need to use asynchronous index writes in order to obtain high throughput with sustained index writes.
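
The arithmetic described above for the scatter split can be written out directly. This is a sketch of the stated rules with hypothetical class and method names; the core, service, and index counts in main are made-up inputs.

```java
/** Sketch: resolving the scatter split counts and the "ideal" partition range. */
final class ScatterSplitSketch {

    /** ZERO means "use all discovered data services". */
    static int dataServices(int scatterSplitDataServiceCount, int discoveredDataServices) {
        return scatterSplitDataServiceCount == 0
                ? discoveredDataServices
                : scatterSplitDataServiceCount;
    }

    /** ZERO means "generate NDATA_SERVICES x 2 index partitions". */
    static int partitionCount(int scatterSplitIndexPartitionCount, int nDataServices) {
        return scatterSplitIndexPartitionCount == 0
                ? 2 * nDataServices
                : scatterSplitIndexPartitionCount;
    }

    /** Returns {lower, upper} for the generally "ideal" number of index partitions. */
    static int[] idealRange(int ncores, int nDataServices, int nindices) {
        int upper = ncores * nDataServices;
        int lower = upper / nindices;
        return new int[] {lower, upper};
    }

    public static void main(String[] args) {
        int nDataServices = dataServices(0, 5);            // 5 discovered services, option left at zero
        int npartitions = partitionCount(0, nDataServices); // option left at zero
        int[] ideal = idealRange(4, nDataServices, 2);      // 4 cores per service, 2 indices
        System.out.println("services=" + nDataServices + " partitions=" + npartitions
                + " ideal=[" + ideal[0] + ":" + ideal[1] + "]");
    }
}
```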


DEFAULT_SCATTER_SPLIT_INDEX_PARTITION_COUNT

static final String DEFAULT_SCATTER_SPLIT_INDEX_PARTITION_COUNT
See Also:
Constant Field Values


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.