com.bigdata.rdf.store
Interface AbstractTripleStore.Options

All Superinterfaces:
AbstractResource.Options, DataLoader.Options, FullTextIndex.Options, InferenceEngine.Options, KeyBuilder.Options, Options
All Known Subinterfaces:
BigdataSail.Options, LocalTripleStore.Options, TempTripleStore.Options
Enclosing class:
AbstractTripleStore

public static interface AbstractTripleStore.Options
extends AbstractResource.Options, InferenceEngine.Options, Options, KeyBuilder.Options, DataLoader.Options, FullTextIndex.Options

Configuration options.

Version:
$Id: AbstractTripleStore.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
TODO:
refactor options to/from SPORelation and LexiconRelation?

Field Summary
static String AXIOMS_CLASS
          The Axioms model that will be used (default: see DEFAULT_AXIOMS_CLASS).
static String BLOOM_FILTER
          Optional property controls whether or not a bloom filter is maintained for the SPO statement index.
static String CLOSURE_CLASS
          The name of the BaseClosure class that will be used (default: see DEFAULT_CLOSURE_CLASS).
static String DEFAULT_AXIOMS_CLASS
           
static String DEFAULT_BLOOM_FILTER
           
static String DEFAULT_CLOSURE_CLASS
           
static String DEFAULT_JUSTIFY
           
static String DEFAULT_LEXICON
           
static String DEFAULT_ONE_ACCESS_PATH
           
static String DEFAULT_QUADS
           
static String DEFAULT_STATEMENT_IDENTIFIERS
           
static String DEFAULT_STORE_BLANK_NODES
           
static String DEFAULT_TERMID_BITS_TO_REVERSE
           
static String DEFAULT_TEXT_INDEX
           
static String DEFAULT_TEXT_INDEX_DATATYPE_LITERALS
           
static String DEFAULT_VOCABULARY_CLASS
           
static String JUSTIFY
          When true (see DEFAULT_JUSTIFY for the default), proof chains for entailments generated by forward chaining are stored in the database.
static String LEXICON
          Boolean option (default true) enables support for the lexicon (the forward and backward term indices).
static String ONE_ACCESS_PATH
          Boolean option (default false) disables all but a single statement index (aka access path).
static String QUADS
          Boolean option determines whether the KB instance will be a quad store or a triple store.
static String STATEMENT_IDENTIFIERS
          Boolean option (default "false") enables support for statement identifiers.
static String STORE_BLANK_NODES
          Boolean option (default "false") controls whether or not we store blank nodes in the forward mapping of the lexicon.
static String TERMID_BITS_TO_REVERSE
          Option affects how evenly the assigned term identifiers are distributed, which has a pronounced effect on the ID2TERM and statement indices for scale-out deployments.
static String TEXT_INDEX
          Boolean option (default true) enables support for a full text index that may be used to lookup literals by tokens found in the text of those literals.
static String TEXT_INDEX_DATATYPE_LITERALS
          Boolean option (default false) enables support for a full text index that may be used to lookup datatype literals by tokens found in the text of those literals.
static String VOCABULARY_CLASS
          The name of the class that will establish the pre-defined Vocabulary for the database (default: see DEFAULT_VOCABULARY_CLASS).
 
Fields inherited from interface com.bigdata.relation.AbstractResource.Options
CHUNK_CAPACITY, CHUNK_OF_CHUNKS_CAPACITY, CHUNK_TIMEOUT, DEFAULT_CHUNK_CAPACITY, DEFAULT_CHUNK_OF_CHUNKS_CAPACITY, DEFAULT_CHUNK_TIMEOUT, DEFAULT_FORCE_SERIAL_EXECUTION, DEFAULT_FULLY_BUFFERED_READ_THRESHOLD, DEFAULT_MAX_PARALLEL_SUBQUERIES, FORCE_SERIAL_EXECUTION, FULLY_BUFFERED_READ_THRESHOLD, MAX_PARALLEL_SUBQUERIES, NESTED_SUBQUERY
 
Fields inherited from interface com.bigdata.rdf.rules.InferenceEngine.Options
DEFAULT_FORWARD_CHAIN_OWL_EQUIVALENT_CLASS, DEFAULT_FORWARD_CHAIN_OWL_EQUIVALENT_PROPERTY, DEFAULT_FORWARD_CHAIN_OWL_HAS_VALUE, DEFAULT_FORWARD_CHAIN_OWL_INVERSE_OF, DEFAULT_FORWARD_CHAIN_OWL_SAMEAS_CLOSURE, DEFAULT_FORWARD_CHAIN_OWL_SAMEAS_PROPERTIES, DEFAULT_FORWARD_CHAIN_OWL_TRANSITIVE_PROPERTY, DEFAULT_FORWARD_RDF_TYPE_RDFS_RESOURCE, FORWARD_CHAIN_OWL_EQUIVALENT_CLASS, FORWARD_CHAIN_OWL_EQUIVALENT_PROPERTY, FORWARD_CHAIN_OWL_HAS_VALUE, FORWARD_CHAIN_OWL_INVERSE_OF, FORWARD_CHAIN_OWL_SAMEAS_CLOSURE, FORWARD_CHAIN_OWL_SAMEAS_PROPERTIES, FORWARD_CHAIN_OWL_TRANSITIVE_PROPERTY, FORWARD_CHAIN_RDF_TYPE_RDFS_RESOURCE
 
Fields inherited from interface com.bigdata.journal.Options
ALTERNATE_ROOT_BLOCK, BUFFER_MODE, CREATE, CREATE_TEMP_FILE, CREATE_TIME, DEFAULT_BUFFER_MODE, DEFAULT_CREATE, DEFAULT_CREATE_TEMP_FILE, DEFAULT_DELETE_ON_CLOSE, DEFAULT_DELETE_ON_EXIT, DEFAULT_DOUBLE_SYNC, DEFAULT_FILE_LOCK_ENABLED, DEFAULT_FORCE_ON_COMMIT, DEFAULT_FORCE_WRITES, DEFAULT_HISTORICAL_INDEX_CACHE_CAPACITY, DEFAULT_HISTORICAL_INDEX_CACHE_TIMEOUT, DEFAULT_INITIAL_EXTENT, DEFAULT_LIVE_INDEX_CACHE_CAPACITY, DEFAULT_LIVE_INDEX_CACHE_TIMEOUT, DEFAULT_MAXIMUM_EXTENT, DEFAULT_READ_CACHE_CAPACITY, DEFAULT_READ_CACHE_MAX_RECORD_SIZE, DEFAULT_READ_ONLY, DEFAULT_USE_DIRECT_BUFFERS, DEFAULT_VALIDATE_CHECKSUM, DEFAULT_WRITE_CACHE_CAPACITY, DELETE_ON_CLOSE, DELETE_ON_EXIT, DOUBLE_SYNC, FILE, FILE_LOCK_ENABLED, FORCE_ON_COMMIT, FORCE_WRITES, HISTORICAL_INDEX_CACHE_CAPACITY, HISTORICAL_INDEX_CACHE_TIMEOUT, INITIAL_EXTENT, JNL, LIVE_INDEX_CACHE_CAPACITY, LIVE_INDEX_CACHE_TIMEOUT, MAXIMUM_EXTENT, minimumInitialExtent, minimumWriteCacheCapacity, OFFSET_BITS, READ_CACHE_CAPACITY, READ_CACHE_MAX_RECORD_SIZE, READ_ONLY, SEG, TMP_DIR, USE_DIRECT_BUFFERS, VALIDATE_CHECKSUM, WRITE_CACHE_CAPACITY
 
Fields inherited from interface com.bigdata.btree.keys.KeyBuilder.Options
COLLATOR, DECOMPOSITION, STRENGTH, USER_COUNTRY, USER_LANGUAGE, USER_VARIANT
 
Fields inherited from interface com.bigdata.rdf.store.DataLoader.Options
BUFFER_CAPACITY, CLOSURE, COMMIT, DEFAULT_BUFFER_CAPACITY, DEFAULT_CLOSURE, DEFAULT_COMMIT, DEFAULT_FLUSH, DEFAULT_VERIFY_DATA, FLUSH, VERIFY_DATA
 
Fields inherited from interface com.bigdata.search.FullTextIndex.Options
DEFAULT_INDEXER_COLLATOR_STRENGTH, DEFAULT_INDEXER_TIMEOUT, DEFAULT_OVERWRITE, INDEXER_COLLATOR_STRENGTH, INDEXER_TIMEOUT, OVERWRITE
 

Field Detail

LEXICON

static final String LEXICON
Boolean option (default true) enables support for the lexicon (the forward and backward term indices). When false, the lexicon indices are not registered. This can be safely turned off for the TempTripleStore when only the statement indices are to be used.

See Also:
LexiconRelation

DEFAULT_LEXICON

static final String DEFAULT_LEXICON
See Also:
Constant Field Values

STORE_BLANK_NODES

static final String STORE_BLANK_NODES
Boolean option (default "false") controls whether or not we store blank nodes in the forward mapping of the lexicon.

When false, blank node semantics are enforced: you CAN NOT unify blank nodes based on their IDs in the lexicon, and AbstractTripleStore.getBNodeCount() is disabled.

When true, you are able to violate blank node semantics and force unification of blank nodes by assigning the ID from the RDF interchange syntax to the blank node. RIO has an option that will allow you to do this. When this option is also true, then you will in fact be able to resolve pre-existing blank nodes using their identifiers. The tradeoff is time and space: if you have a LOT of documents using blank nodes, then you might want to disable this option in order to spend less time writing the forward lexicon index (which will also take up less space).


DEFAULT_STORE_BLANK_NODES

static final String DEFAULT_STORE_BLANK_NODES
See Also:
Constant Field Values

TERMID_BITS_TO_REVERSE

static final String TERMID_BITS_TO_REVERSE
Option affects how evenly the assigned term identifiers are distributed, which has a pronounced effect on the ID2TERM and statement indices for scale-out deployments. The default for a scale-out deployment is "6", but the default for a scale-up deployment is ZERO (0).

For the scale-out triple store, the term identifiers are formed by placing the index partition identifier in the high word and the local counter for the index partition into the low word. In addition, the sign bit is "stolen" from each value such that the low two bits are left open for bit flags which encode the type (URI, Literal, BNode or SID) of the term. The effect of this option is to cause the low N bits of the local counter value to be reversed and written into the high N bits of the term identifier (the other bits are shifted down to make room for this). Regardless of the configured value for this option, all bits (except the sign bit) of both the partition identifier and the local counter are preserved.

Normally, the low bits of a sequential counter will vary the most rapidly. By reversing the local counter and placing some of the reversed bits into the high bits of the term identifier, we cause the term identifiers to be uniformly (but not randomly) distributed. This is much like using a hash function without collisions or a random number generator that does not produce duplicates. When ZERO (0), no bits are reversed, so the high bits of the term identifiers directly reflect the partition identifier and the low bits are assigned sequentially by the local counter within each TERM2ID index partition.
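
The bit manipulation described above can be illustrated with a small sketch. This is a simplified 32-bit toy, not the bigdata encoding: it ignores the partition identifier, the stolen sign bit, and the two type-flag bits, and the class and method names (TermIdScatterSketch, scatter) are hypothetical. It only shows how reversing the low N bits of a sequential counter and moving them to the high end spreads consecutive values across the key space without losing any counter bits.

```java
final class TermIdScatterSketch {

    /**
     * Reverses the low n bits of a 32-bit counter and places them in the
     * high n bits of the result. The remaining counter bits are shifted
     * down by n to make room, so no counter bits are lost.
     */
    static int scatter(int counter, int n) {
        if (n < 0 || n > 31) {
            throw new IllegalArgumentException("n must be in [0, 31]: " + n);
        }
        int low = counter & ((1 << n) - 1);
        // Integer.reverse maps bit i to bit (31 - i), so the low n bits
        // arrive, already reversed, in the high n bits of the word.
        return Integer.reverse(low) | (counter >>> n);
    }

    public static void main(String[] args) {
        int n = 3;
        for (int counter = 0; counter < 8; counter++) {
            int id = scatter(counter, n);
            // Show only the top n bits, which decide where the key sorts.
            System.out.println(counter + " -> high bits " + (id >>> (32 - n)));
        }
    }
}
```

With n = 3, the sequential counters 0 through 7 land on the high-bit values 0, 4, 2, 6, 1, 5, 3, 7, so consecutive assignments fall into different regions of the key space. With n = 0 the counter is returned unchanged.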

The use of a non-zero value for this option can easily cause the write load on the index partitions for the ID2TERM and statement indices to be perfectly balanced. However, using too many bits has some negative consequences on locality of operations within an index partition (since the distribution of the keys will be approximately uniform, leading to poor cache performance, more copy-on-write for the B+Tree, and both more IO and faster growth in the journal for writes, since more leaves will be made dirty on average by each bulk write).

The use of a non-zero value for this option also directly affects the degree of scatter for bulk read or write operations. As more bits are used, it becomes increasingly likely that each bulk read or write operation will on average touch all index partitions. This is because the #of low order local counter bits reversed and rotated into the high bits of the term identifier places an approximate bound on the #of index partitions of the ID2TERM or a statement index that will be touched by a scattered read or write. However, that number will continue to grow slowly over time as new partition identifiers are introduced (the partition identifiers appear next in the encoded term identifier and therefore determine the degree of locality or scatter once the quickly varying high bits have had their say).

The "right" value really depends on the expected scale of the knowledge base. If you estimate that you will have 50 x 200M index partitions for the statement indices, then SQRT(50) =~ 7 would be a good choice.
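
The rule of thumb above can be written down directly. The helper below is only a sketch that takes the SQRT heuristic at face value; the class and method names are hypothetical and nothing like it is documented as part of the bigdata API.

```java
final class BitsToReverseEstimate {

    /**
     * Rounds the square root of the expected number of index partitions,
     * following the SQRT(50) =~ 7 example in the text.
     */
    static int suggestedBits(int expectedPartitions) {
        if (expectedPartitions < 1) {
            throw new IllegalArgumentException(
                    "expectedPartitions must be positive: " + expectedPartitions);
        }
        return (int) Math.round(Math.sqrt(expectedPartitions));
    }

    public static void main(String[] args) {
        System.out.println(suggestedBits(50)); // prints 7
    }
}
```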


DEFAULT_TERMID_BITS_TO_REVERSE

static final String DEFAULT_TERMID_BITS_TO_REVERSE
See Also:
Constant Field Values

TEXT_INDEX

static final String TEXT_INDEX
Boolean option (default true) enables support for a full text index that may be used to lookup literals by tokens found in the text of those literals.


DEFAULT_TEXT_INDEX

static final String DEFAULT_TEXT_INDEX
See Also:
Constant Field Values

TEXT_INDEX_DATATYPE_LITERALS

static final String TEXT_INDEX_DATATYPE_LITERALS
Boolean option (default false) enables support for a full text index that may be used to lookup datatype literals by tokens found in the text of those literals.


DEFAULT_TEXT_INDEX_DATATYPE_LITERALS

static final String DEFAULT_TEXT_INDEX_DATATYPE_LITERALS
See Also:
Constant Field Values

VOCABULARY_CLASS

static final String VOCABULARY_CLASS
The name of the class that will establish the pre-defined Vocabulary for the database (default: see DEFAULT_VOCABULARY_CLASS). The class MUST extend BaseVocabulary. This option is ignored if the lexicon is disabled.

The Vocabulary is initialized by AbstractTripleStore.create() and its serialized state is stored in the global row store under the TripleStoreSchema.VOCABULARY property.

See Also:
NoVocabulary, RDFSVocabulary

DEFAULT_VOCABULARY_CLASS

static final String DEFAULT_VOCABULARY_CLASS

AXIOMS_CLASS

static final String AXIOMS_CLASS
The Axioms model that will be used (default: see DEFAULT_AXIOMS_CLASS). The value is the name of the class that will be instantiated by AbstractTripleStore.create(). The class must extend BaseAxioms. This option is ignored if the lexicon is disabled. Use NoAxioms to disable inference.


DEFAULT_AXIOMS_CLASS

static final String DEFAULT_AXIOMS_CLASS

CLOSURE_CLASS

static final String CLOSURE_CLASS
The name of the BaseClosure class that will be used (default: see DEFAULT_CLOSURE_CLASS). The value is the name of the class that will be used to generate the Program that computes the closure of the database. The class must extend BaseClosure. This option is ignored if inference is disabled.

There are two pre-defined "programs" used to compute and maintain closure. The FullClosure program is a simple fix point of the RDFS+ entailments, except for the foo rdf:type rdfs:Resource entailments which are normally generated at query time. The FastClosure program breaks nearly all cycles in the RDFS rules and runs nearly entirely as a sequence of IRules, including several custom rules.

It is far easier to modify the FullClosure program since any new rules can just be dropped into place. Modifying the FastClosure program requires careful consideration of the entailments computed at each stage in order to determine where a new rule would fit in.

Note: When support for owl:sameAs, etc. processing is enabled, some of the entailments are computed by rules run during forward closure and some of the entailments are computed by rules run at query time. Both FastClosure and FullClosure are aware of this and handle it correctly (e.g., as configured).
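
To make the phrase "simple fix point" concrete, here is a generic, naive fix-point loop over a single rule (transitivity of rdfs:subClassOf). It is an illustration only: triples are plain string lists, the class and method names are hypothetical, and it does not use or reproduce the FullClosure program, the IRule API, or any other bigdata class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FixPointClosureSketch {

    static final String SUB_CLASS_OF = "rdfs:subClassOf";

    /** A triple is modelled as an immutable-size list [s, p, o]. */
    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** One rule: (a subClassOf b), (b subClassOf c) -> (a subClassOf c). */
    static Set<List<String>> applySubClassTransitivity(Set<List<String>> db) {
        Set<List<String>> inferred = new HashSet<>();
        for (List<String> x : db) {
            if (!SUB_CLASS_OF.equals(x.get(1))) {
                continue;
            }
            for (List<String> y : db) {
                if (SUB_CLASS_OF.equals(y.get(1)) && x.get(2).equals(y.get(0))) {
                    inferred.add(triple(x.get(0), SUB_CLASS_OF, y.get(2)));
                }
            }
        }
        return inferred;
    }

    /** Re-runs the rule until a round adds nothing new; returns the #of rounds. */
    static int closeToFixPoint(Set<List<String>> db) {
        int rounds = 0;
        boolean changed = true;
        while (changed) {
            // addAll returns true only if at least one new triple was added.
            changed = db.addAll(applySubClassTransitivity(db));
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        Set<List<String>> db = new HashSet<>();
        db.add(triple("A", SUB_CLASS_OF, "B"));
        db.add(triple("B", SUB_CLASS_OF, "C"));
        db.add(triple("C", SUB_CLASS_OF, "D"));
        int rounds = closeToFixPoint(db);
        System.out.println(db.size() + " triples after " + rounds + " rounds");
    }
}
```

Adding a rule to a loop of this shape only means running one more rule per round, which is the property that makes a fix-point program easy to extend. A staged program, by contrast, has to decide where each rule runs.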


DEFAULT_CLOSURE_CLASS

static final String DEFAULT_CLOSURE_CLASS

ONE_ACCESS_PATH

static final String ONE_ACCESS_PATH
Boolean option (default false) disables all but a single statement index (aka access path).

Note: The main purpose of the option is to make it possible to turn off the other access paths for special bulk load purposes. The use of this option is NOT compatible with either the application of the InferenceEngine or high-level query.

Note: You may want to explicitly enable or disable the bloom filter for this. Normally a single access path (SPO) is used for a temporary store. Temporary stores tend to be smaller, so if you will also be doing point tests on the temporary store then you probably want to use the BLOOM_FILTER. Otherwise it may be turned off to realize some (minimal) performance gain.
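
A configuration for a small temporary store along the lines of the two notes above might be assembled as follows. This is a sketch under stated assumptions: the property key strings are placeholders invented for the example (this page does not list the constant values), and the class name is hypothetical. In real code the keys would be the LEXICON, ONE_ACCESS_PATH and BLOOM_FILTER constants of AbstractTripleStore.Options.

```java
import java.util.Properties;

final class TempStoreConfigSketch {

    // Placeholder keys, NOT the real constant values. In real code use
    // AbstractTripleStore.Options.LEXICON and friends instead.
    static final String LEXICON = "example.lexicon";
    static final String ONE_ACCESS_PATH = "example.oneAccessPath";
    static final String BLOOM_FILTER = "example.bloomFilter";

    /**
     * Builds properties for a temporary store that keeps only the
     * statement indices and a single access path.
     *
     * @param pointTests whether point tests will be run against the store,
     *                   in which case the bloom filter is requested
     */
    static Properties tempStoreProperties(boolean pointTests) {
        Properties p = new Properties();
        p.setProperty(LEXICON, "false");
        p.setProperty(ONE_ACCESS_PATH, "true");
        p.setProperty(BLOOM_FILTER, Boolean.toString(pointTests));
        return p;
    }

    public static void main(String[] args) {
        tempStoreProperties(true).list(System.out);
    }
}
```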


DEFAULT_ONE_ACCESS_PATH

static final String DEFAULT_ONE_ACCESS_PATH
See Also:
Constant Field Values

BLOOM_FILTER

static final String BLOOM_FILTER
Optional property controls whether or not a bloom filter is maintained for the SPO statement index. The bloom filter is effective up to ~2M entries per index (partition). For scale-up, the bloom filter is automatically disabled once its error rate would be too large given the #of index entries. For scale-out, as the index grows we keep splitting it into more and more index partitions, and those index partitions are composed of views of one or more AbstractBTrees. While the mutable BTrees might occasionally grow too large to support a bloom filter, data is periodically migrated onto immutable IndexSegments which have perfect fit bloom filters. This means that the bloom filter scales out, but not up.

Note: The SPO access path is used any time we have an access path that corresponds to a point test. Therefore this is the only index for which it makes sense to maintain a bloom filter.

If you are going to do a lot of small commits, then please DO NOT enable the bloom filter for the AbstractTripleStore. The bloom filter takes 1 MB each time you commit on the SPO/SPOC index. The bloom filter is of limited value in any case for scale-up since its nominal error rate will be exceeded at ~2M triples. This concern does not apply for scale-out, where the bloom filter is always a good idea.
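
Why a fixed-size bloom filter stops being useful as an index grows can be seen from the standard textbook approximation for the false positive rate, p ≈ (1 - e^(-kn/m))^k, for m bits, k hash functions and n entries. The sketch below evaluates it for a few entry counts. The parameters are assumptions chosen for illustration (a 1 MB bit array and k = 3); this page does not state how bigdata sizes its filter or what its nominal error rate is, so the printed numbers should not be read as bigdata's actual error rates.

```java
final class BloomFilterErrorSketch {

    /** Textbook approximation: p = (1 - e^(-k * n / m))^k. */
    static double falsePositiveRate(long bits, int hashFunctions, long entries) {
        double exponent = -((double) hashFunctions * entries) / bits;
        return Math.pow(1.0 - Math.exp(exponent), hashFunctions);
    }

    public static void main(String[] args) {
        long bits = 8L * 1024 * 1024; // assumed: a 1 MB bit array
        int k = 3;                    // assumed: three hash functions
        long[] entryCounts = {500_000L, 1_000_000L, 2_000_000L, 4_000_000L};
        for (long n : entryCounts) {
            System.out.printf("n=%d -> p=%.4f%n", n, falsePositiveRate(bits, k, n));
        }
    }
}
```

Whatever the actual parameters, the rate rises monotonically with the number of entries for a fixed filter size, which is why a filter that is rebuilt to fit each immutable index segment stays effective while one attached to an ever-growing mutable index does not.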

See Also:
IndexMetadata.getBloomFilterFactory()
TODO:
Review the various temp triple stores that are created and see which of them would benefit from the SPO bloom filter (TM, backchainers, SIDs fixed point, etc).

DEFAULT_BLOOM_FILTER

static final String DEFAULT_BLOOM_FILTER
See Also:
Constant Field Values

JUSTIFY

static final String JUSTIFY
When true (see DEFAULT_JUSTIFY for the default), proof chains for entailments generated by forward chaining are stored in the database. This option is required for truth maintenance when retracting assertions.

If you will not be retracting statements from the database then you can specify false for a significant performance boost during writes and a smaller profile on the disk.

This option does not affect query performance since the justifications are maintained in a distinct index and are only used when retracting assertions.


DEFAULT_JUSTIFY

static final String DEFAULT_JUSTIFY
See Also:
Constant Field Values

STATEMENT_IDENTIFIERS

static final String STATEMENT_IDENTIFIERS
Boolean option (default "false") enables support for statement identifiers. A statement identifier is a unique identifier for a triple in the database. Statement identifiers may be used to make statements about statements without using RDF style reification.

Statement identifiers are assigned consistently when Statements are mapped into the database. This is done using an extension of the term:id index to map the statement as if it were a term onto a unique statement identifier. While the statement identifier is assigned canonically by the term:id index, it is stored redundantly in the value position for each of the statement indices. While the statement identifier is, in fact, a term identifier, the reverse mapping is NOT stored in the id:term index and you CAN NOT translate from a statement identifier back to the original statement.

bigdata supports an RDF/XML interchange extension for the interchange of triples with statement identifiers that may be used as blank nodes to make statements about statements. See BNS and RDFXMLParser.

Statement identifiers add some latency when loading data since they increase the size of the writes on the terms index (and also its space requirements since all statements are also replicated in the terms index). However, if you are doing concurrent data load then the added latency is nicely offset by the parallelism.

The main benefit for statement identifiers is that they provide a mechanism for statement level provenance. This is critical for some applications.

An alternative approach to provenance within RDF is to use the concatenation of the subject, predicate, and object (or a hash of their concatenation) as the value in the context position. While this approach can be used with any quad store, it is less transparent and requires twice the amount of data on the disk since you need an additional three statement indices to cover the quad access paths.

The provenance mode (SIDs) IS NOT compatible with the QUADS mode. You may use either one, but not both in the same KB instance.
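
A caller assembling a configuration could guard against this combination before creating the KB instance. The check below is a sketch, not part of the bigdata API: the class name and the property key strings are placeholders (in real code the keys would be the QUADS and STATEMENT_IDENTIFIERS constants), and treating a missing QUADS property as "false" is an assumption, since this page does not state that default.

```java
import java.util.Properties;

final class KbModeCheckSketch {

    // Placeholder keys, NOT the real constant values.
    static final String QUADS = "example.quads";
    static final String STATEMENT_IDENTIFIERS = "example.statementIdentifiers";

    /** Rejects a configuration that enables both quads and SIDs. */
    static void validate(Properties p) {
        // Assumption: a missing QUADS property is treated as "false".
        boolean quads = Boolean.parseBoolean(p.getProperty(QUADS, "false"));
        boolean sids = Boolean.parseBoolean(
                p.getProperty(STATEMENT_IDENTIFIERS, "false"));
        if (quads && sids) {
            throw new IllegalArgumentException(
                    "QUADS and STATEMENT_IDENTIFIERS cannot both be enabled");
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(QUADS, "true");
        validate(p); // passes: only one of the two modes is enabled
        System.out.println("configuration accepted");
    }
}
```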

There are examples for using the provenance mode online.


DEFAULT_STATEMENT_IDENTIFIERS

static final String DEFAULT_STATEMENT_IDENTIFIERS
See Also:
Constant Field Values

QUADS

static final String QUADS
Boolean option determines whether the KB instance will be a quad store or a triple store. For a triple store only, the STATEMENT_IDENTIFIERS option determines whether or not the provenance mode is enabled.


DEFAULT_QUADS

static final String DEFAULT_QUADS
See Also:
Constant Field Values


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.