com.bigdata.rdf.store
Interface AbstractTripleStore.Options

All Superinterfaces:
AbstractResource.Options, DataLoader.Options, FullTextIndex.Options, InferenceEngine.Options, KeyBuilder.Options, Options, RDFParserOptions.Options
All Known Subinterfaces:
BigdataSail.Options, LocalTripleStore.Options, TempTripleStore.Options
Enclosing class:
AbstractTripleStore

public static interface AbstractTripleStore.Options
extends AbstractResource.Options, InferenceEngine.Options, Options, KeyBuilder.Options, DataLoader.Options, FullTextIndex.Options

Configuration options.

Version:
$Id: AbstractTripleStore.java 6312 2012-05-03 09:06:37Z mrpersonick $
Author:
Bryan Thompson

Field Summary
static String AXIOMS_CLASS
          The Axioms model that will be used (default DEFAULT_AXIOMS_CLASS).
static String BLOBS_THRESHOLD
          The threshold (in character length) at which an RDF Value will be inserted into the LexiconKeyOrder.BLOBS index rather than the LexiconKeyOrder.TERM2ID and LexiconKeyOrder.ID2TERM indices (default "256").
static String BLOOM_FILTER
          Optional property controls whether or not a bloom filter is maintained for the SPO statement index.
static String CLOSURE_CLASS
          The name of the BaseClosure class that will be used (default DEFAULT_CLOSURE_CLASS).
static String CONSTRAIN_XXXC_SHARDS
          Boolean option determines whether or not an XXXCShardSplitHandler is applied (scale-out only, default "true").
static String DEFAULT_AXIOMS_CLASS
           
static String DEFAULT_BLOBS_THRESHOLD
           
static String DEFAULT_BLOOM_FILTER
           
static String DEFAULT_CLOSURE_CLASS
           
static String DEFAULT_CONSTRAIN_XXXC_SHARDS
           
static String DEFAULT_EXTENSION_FACTORY_CLASS
           
static String DEFAULT_INLINE_BNODES
           
static String DEFAULT_INLINE_DATE_TIMES
           
static String DEFAULT_INLINE_DATE_TIMES_TIMEZONE
           
static String DEFAULT_INLINE_TEXT_LITERALS
           
static String DEFAULT_INLINE_XSD_DATATYPE_LITERALS
           
static String DEFAULT_JUSTIFY
           
static String DEFAULT_LEXICON
           
static String DEFAULT_MAX_INLINE_STRING_LENGTH
          Note that there is an interaction with the full text indexer when this is enabled.
static String DEFAULT_ONE_ACCESS_PATH
           
static String DEFAULT_QUADS
           
static String DEFAULT_QUADS_MODE
           
static String DEFAULT_REJECT_INVALID_XSD_VALUES
           
static String DEFAULT_STATEMENT_IDENTIFIERS
           
static String DEFAULT_STORE_BLANK_NODES
           
static String DEFAULT_SUBJECT_CENTRIC_TEXT_INDEX
           
static String DEFAULT_SUBJECT_CENTRIC_TEXT_INDEXER_CLASS
           
static String DEFAULT_TERM_CACHE_CAPACITY
           
static String DEFAULT_TERMID_BITS_TO_REVERSE
           
static String DEFAULT_TEXT_INDEX
           
static String DEFAULT_TEXT_INDEX_DATATYPE_LITERALS
           
static String DEFAULT_TEXT_INDEXER_CLASS
           
static String DEFAULT_TRIPLES_MODE
           
static String DEFAULT_TRIPLES_MODE_WITH_PROVENANCE
           
static String DEFAULT_VALUE_FACTORY_CLASS
           
static String DEFAULT_VOCABULARY_CLASS
          Note: The default Vocabulary class may be changed from time to time as additional VocabularyDecl are created and bundled into a new default Vocabulary.
static String EXTENSION_FACTORY_CLASS
          The name of the IExtensionFactory class.
static String INLINE_BNODES
          Set up database to inline bnodes directly into the statement indices rather than using the lexicon to map them to term identifiers and back.
static String INLINE_DATE_TIMES
          Set up database to inline date/times directly into the statement indices rather than using the lexicon to map them to term identifiers and back (default "true").
static String INLINE_DATE_TIMES_TIMEZONE
          The default time zone to be used to a) encode inline xsd:datetime literals that do not have a time zone specified and b) decode xsd:datetime literals from the statement indices where they are stored as UTC milliseconds since the epoch (default "GMT").
static String INLINE_TEXT_LITERALS
          Inline ANY literal having fewer than MAX_INLINE_TEXT_LENGTH characters (default "false").
static String INLINE_XSD_DATATYPE_LITERALS
          Set up database to inline XSD datatype literals corresponding to primitives (boolean) and numerics (byte, short, int, etc.) directly into the statement indices (default "true").
static String JUSTIFY
          When true (default DEFAULT_JUSTIFY), proof chains for entailments generated by forward chaining are stored in the database.
static String LEXICON
          Boolean option (default true) enables support for the lexicon (the forward and backward term indices).
static String MAX_INLINE_TEXT_LENGTH
          The maximum length of a String value which may be inlined into the statement indices (default "0").
static String ONE_ACCESS_PATH
          Boolean option (default false) disables all but a single statement index (aka access path).
static String QUADS
          Boolean option determines whether the KB instance will be a quad store or a triple store.
static String QUADS_MODE
          Set up database in quads mode.
static String REJECT_INVALID_XSD_VALUES
          When true AND is true, literals having an xsd datatype URI which can not be validated against that datatype will be rejected (default DEFAULT_REJECT_INVALID_XSD_VALUES).
static String STATEMENT_IDENTIFIERS
          Boolean option (default "false") enables support for statement identifiers.
static String STORE_BLANK_NODES
          Boolean option (default "false") controls whether or not we store blank nodes in the forward mapping of the lexicon (this is also known as the "told bnodes" mode).
static String SUBJECT_CENTRIC_TEXT_INDEX
          Boolean option (default true) enables support for a full text index that may be used to look up literals by tokens found in the text of those literals.
static String SUBJECT_CENTRIC_TEXT_INDEXER_CLASS
          The name of the ITextIndexer class.
static String TERM_CACHE_CAPACITY
          Integer option whose value is the capacity of the term cache.
static String TERMID_BITS_TO_REVERSE
          This option affects how evenly the assigned term identifiers are distributed, which has a pronounced effect on the ID2TERM and statement indices for scale-out deployments.
static String TEXT_INDEX
          Boolean option (default "true") enables support for a full text index that may be used to look up literals by tokens found in the text of those literals.
static String TEXT_INDEX_DATATYPE_LITERALS
          Boolean option enables support for a full text index that may be used to look up datatype literals by tokens found in the text of those literals (default "true").
static String TEXT_INDEXER_CLASS
          The name of the IValueCentricTextIndexer class.
static String TRIPLES_MODE
          Set up database in triples mode, no provenance.
static String TRIPLES_MODE_WITH_PROVENANCE
          Set up database in triples mode with provenance.
static String VALUE_FACTORY_CLASS
          The name of the BigdataValueFactory class.
static String VOCABULARY_CLASS
          The name of the class that will establish the pre-defined Vocabulary for the database (default DEFAULT_VOCABULARY_CLASS).
 
Fields inherited from interface com.bigdata.relation.AbstractResource.Options
CHUNK_CAPACITY, CHUNK_OF_CHUNKS_CAPACITY, CHUNK_TIMEOUT, DEFAULT_CHUNK_CAPACITY, DEFAULT_CHUNK_OF_CHUNKS_CAPACITY, DEFAULT_CHUNK_TIMEOUT, DEFAULT_FORCE_SERIAL_EXECUTION, DEFAULT_FULLY_BUFFERED_READ_THRESHOLD, DEFAULT_MAX_PARALLEL_SUBQUERIES, FORCE_SERIAL_EXECUTION, FULLY_BUFFERED_READ_THRESHOLD, MAX_PARALLEL_SUBQUERIES
 
Fields inherited from interface com.bigdata.rdf.rules.InferenceEngine.Options
DEFAULT_FORWARD_CHAIN_OWL_EQUIVALENT_CLASS, DEFAULT_FORWARD_CHAIN_OWL_EQUIVALENT_PROPERTY, DEFAULT_FORWARD_CHAIN_OWL_HAS_VALUE, DEFAULT_FORWARD_CHAIN_OWL_INVERSE_OF, DEFAULT_FORWARD_CHAIN_OWL_SAMEAS_CLOSURE, DEFAULT_FORWARD_CHAIN_OWL_SAMEAS_PROPERTIES, DEFAULT_FORWARD_CHAIN_OWL_SYMMETRIC_PROPERTY, DEFAULT_FORWARD_CHAIN_OWL_TRANSITIVE_PROPERTY, DEFAULT_FORWARD_RDF_TYPE_RDFS_RESOURCE, FORWARD_CHAIN_OWL_EQUIVALENT_CLASS, FORWARD_CHAIN_OWL_EQUIVALENT_PROPERTY, FORWARD_CHAIN_OWL_HAS_VALUE, FORWARD_CHAIN_OWL_INVERSE_OF, FORWARD_CHAIN_OWL_SAMEAS_CLOSURE, FORWARD_CHAIN_OWL_SAMEAS_PROPERTIES, FORWARD_CHAIN_OWL_SYMMETRIC_PROPERTY, FORWARD_CHAIN_OWL_TRANSITIVE_PROPERTY, FORWARD_CHAIN_RDF_TYPE_RDFS_RESOURCE
 
Fields inherited from interface com.bigdata.journal.Options
ALTERNATE_ROOT_BLOCK, BUFFER_MODE, CREATE, CREATE_TEMP_FILE, CREATE_TIME, DEFAULT_BUFFER_MODE, DEFAULT_CREATE, DEFAULT_CREATE_TEMP_FILE, DEFAULT_DELETE_ON_CLOSE, DEFAULT_DELETE_ON_EXIT, DEFAULT_DOUBLE_SYNC, DEFAULT_FILE_LOCK_ENABLED, DEFAULT_FORCE_ON_COMMIT, DEFAULT_FORCE_WRITES, DEFAULT_HISTORICAL_INDEX_CACHE_CAPACITY, DEFAULT_HISTORICAL_INDEX_CACHE_TIMEOUT, DEFAULT_INITIAL_EXTENT, DEFAULT_LIVE_INDEX_CACHE_CAPACITY, DEFAULT_LIVE_INDEX_CACHE_TIMEOUT, DEFAULT_MAXIMUM_EXTENT, DEFAULT_MINIMUM_EXTENSION, DEFAULT_READ_CACHE_CAPACITY, DEFAULT_READ_CACHE_MAX_RECORD_SIZE, DEFAULT_READ_ONLY, DEFAULT_USE_DIRECT_BUFFERS, DEFAULT_VALIDATE_CHECKSUM, DEFAULT_WRITE_CACHE_BUFFER_COUNT, DEFAULT_WRITE_CACHE_ENABLED, DELETE_ON_CLOSE, DELETE_ON_EXIT, DOUBLE_SYNC, FILE, FILE_LOCK_ENABLED, FORCE_ON_COMMIT, FORCE_WRITES, HISTORICAL_INDEX_CACHE_CAPACITY, HISTORICAL_INDEX_CACHE_TIMEOUT, IGNORE_BAD_ROOT_BLOCK, INITIAL_EXTENT, JNL, LIVE_INDEX_CACHE_CAPACITY, LIVE_INDEX_CACHE_TIMEOUT, MAXIMUM_EXTENT, MEM_MAX_EXTENT, MINIMUM_EXTENSION, minimumInitialExtent, minimumMinimumExtension, OFFSET_BITS, OTHER_MAX_EXTENT, READ_ONLY, RW_MAX_EXTENT, SEG, TMP_DIR, UPDATE_ICU_VERSION, USE_DIRECT_BUFFERS, VALIDATE_CHECKSUM, WRITE_CACHE_BUFFER_COUNT, WRITE_CACHE_ENABLED
 
Fields inherited from interface com.bigdata.btree.keys.KeyBuilder.Options
COLLATOR, DECOMPOSITION, STRENGTH, USER_COUNTRY, USER_LANGUAGE, USER_VARIANT
 
Fields inherited from interface com.bigdata.rdf.store.DataLoader.Options
BUFFER_CAPACITY, CLOSURE, COMMIT, DEFAULT_BUFFER_CAPACITY, DEFAULT_CLOSURE, DEFAULT_COMMIT, DEFAULT_FLUSH, FLUSH
 
Fields inherited from interface com.bigdata.rdf.rio.RDFParserOptions.Options
DATATYPE_HANDLING, DEFAULT_DATATYPE_HANDLING, DEFAULT_PRESERVE_BNODE_IDS, DEFAULT_STOP_AT_FIRST_ERROR, DEFAULT_VERIFY_DATA, PRESERVE_BNODE_IDS, STOP_AT_FIRST_ERROR, VERIFY_DATA
 
Fields inherited from interface com.bigdata.search.FullTextIndex.Options
ANALYZER_FACTORY_CLASS, DEFAULT_ANALYZER_FACTORY_CLASS, DEFAULT_FIELDS_ENABLED, DEFAULT_HIT_CACHE_SIZE, DEFAULT_HIT_CACHE_TIMEOUT_MILLIS, DEFAULT_INDEXER_COLLATOR_STRENGTH, DEFAULT_INDEXER_TIMEOUT, DEFAULT_OVERWRITE, FIELDS_ENABLED, HIT_CACHE_SIZE, HIT_CACHE_TIMEOUT_MILLIS, INDEXER_COLLATOR_STRENGTH, INDEXER_TIMEOUT, OVERWRITE
 

Field Detail

LEXICON

static final String LEXICON
Boolean option (default true) enables support for the lexicon (the forward and backward term indices). When false, the lexicon indices are not registered. This can be safely turned off for the TempTripleStore when only the statement indices are to be used.

You can control how the triple store interprets RDF URIs and literals using the KeyBuilder.Options. For example:

 // Force ASCII key comparisons.
 properties.setProperty(Options.COLLATOR, CollatorEnum.ASCII.toString());
 
or
 // Force identical Unicode comparisons (assuming the default COLLATOR setting).
 properties.setProperty(Options.STRENGTH, StrengthEnum.IDENTICAL.toString());
 

See Also:
LexiconRelation, KeyBuilder.Options

DEFAULT_LEXICON

static final String DEFAULT_LEXICON
See Also:
Constant Field Values

STORE_BLANK_NODES

static final String STORE_BLANK_NODES
Boolean option (default "false") controls whether or not we store blank nodes in the forward mapping of the lexicon (this is also known as the "told bnodes" mode).

When false, blank node semantics are enforced: you CANNOT unify blank nodes based on their IDs in the lexicon, and AbstractTripleStore.getBNodeCount() is disabled.

When true, you are able to violate blank node semantics and force unification of blank nodes by assigning the ID from the RDF interchange syntax to the blank node. RIO has an option that allows you to do this. When that option is also enabled, you will in fact be able to resolve pre-existing blank nodes using their identifiers. The tradeoff is time and space: if you have a LOT of documents using blank nodes, then you might want to disable this option in order to spend less time writing the forward lexicon index (it will also take up less space).
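A minimal configuration sketch for the "told bnodes" mode described above. It assumes the bigdata jar is on the classpath, and it assumes that PRESERVE_BNODE_IDS (inherited from RDFParserOptions.Options) is the RIO option this paragraph refers to; check that against your version before relying on it.

```java
import java.util.Properties;

import com.bigdata.rdf.store.AbstractTripleStore;

class ToldBNodesConfig {

    /** Builds properties for a KB in which told blank node IDs remain resolvable. */
    static Properties toldBNodes() {
        final Properties properties = new Properties();
        // Store blank nodes in the forward mapping of the lexicon ("told bnodes" mode).
        properties.setProperty(AbstractTripleStore.Options.STORE_BLANK_NODES, "true");
        // Assumption: this inherited parser option keeps the bnode IDs from the interchange syntax.
        properties.setProperty(AbstractTripleStore.Options.PRESERVE_BNODE_IDS, "true");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(toldBNodes());
    }
}
```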


DEFAULT_STORE_BLANK_NODES

static final String DEFAULT_STORE_BLANK_NODES
See Also:
Constant Field Values

TERMID_BITS_TO_REVERSE

static final String TERMID_BITS_TO_REVERSE
This option affects how evenly the assigned term identifiers are distributed, which has a pronounced effect on the ID2TERM and statement indices for scale-out deployments. The default for a scale-out deployment is "6". This option is ignored for a standalone deployment.

For the scale-out triple store, the term identifiers are formed by placing the index partition identifier in the high word and the local counter for the index partition into the low word. The effect of this option is to cause the low N bits of the local counter value to be reversed and written into the high N bits of the term identifier (the other bits are shifted down to make room for this). Regardless of the configured value for this option, all bits of both the partition identifier and the local counter are preserved.

Normally, the low bits of a sequential counter vary the most rapidly. By reversing the local counter and placing some of the reversed bits into the high bits of the term identifier, we cause the term identifiers to be uniformly (but not randomly) distributed. This is much like using a hash function without collisions or a random number generator that does not produce duplicates. When the value of this option is ZERO (0), no bits are reversed, so the high bits of the term identifiers directly reflect the partition identifier and the low bits are assigned sequentially by the local counter within each TERM2ID index partition.

A non-zero value for this option can easily cause the write load on the index partitions for the ID2TERM and statement indices to be perfectly balanced. However, using too many bits has negative consequences for the locality of operations within an index partition: the keys will be approximately uniformly distributed, which leads to poor cache performance, more copy-on-write for the B+Tree, and both more IO and faster growth of the journal for writes (since each bulk write will, on average, make more leaves dirty).

A non-zero value for this option also directly affects the degree of scatter for bulk read or write operations. As more bits are used, it becomes increasingly likely that each bulk read or write operation will, on average, touch all index partitions. This is because the number of low-order local counter bits that are reversed and rotated into the high bits of the term identifier places an approximate bound on the number of index partitions of the ID2TERM or a statement index that will be touched by a scattered read or write. However, that number will continue to grow slowly over time as new partition identifiers are introduced (the partition identifiers appear next in the encoded term identifier and therefore determine the degree of locality or scatter once the quickly varying high bits have had their say).

The "right" value really depends on the expected scale of the knowledge base. If you estimate that you will have 50 x 200M index partitions for the statement indices, then SQRT(50) =~ 7 would be a good choice.

See Also:
TermIdEncoder
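To make the bit shuffling concrete, here is a hedged sketch of the encoding described above. It is a hypothetical illustration, not the TermIdEncoder implementation: it assumes a 64-bit term identifier with the partition identifier in the high 32 bits and the local counter in the low 32 bits, and the class and method names are invented for the example.

```java
/**
 * Hypothetical illustration of the TERMID_BITS_TO_REVERSE scheme.
 * This is NOT TermIdEncoder; it only mirrors the prose description.
 */
final class TermIdBitReversal {

    /** Packs (partitionId, localCounter), then rotates the reversed low nbits of the counter to the top. */
    static long encode(final int partitionId, final int localCounter, final int nbits) {
        if (nbits < 0 || nbits > 32) throw new IllegalArgumentException("nbits=" + nbits);
        // High word: partition identifier. Low word: local counter (treated as unsigned).
        final long packed = ((long) partitionId << 32) | (localCounter & 0xFFFFFFFFL);
        if (nbits == 0) return packed; // ZERO: plain partition/counter layout
        final long lowBits = localCounter & ((1L << nbits) - 1);     // low N counter bits
        final long reversed = Long.reverse(lowBits) >>> (64 - nbits); // same N bits, reversed
        // Shift everything down by N (the dropped low bits survive in 'reversed')
        // and place the reversed bits in the top N positions. No bit is lost.
        return (reversed << (64 - nbits)) | (packed >>> nbits);
    }

    /** Inverse of encode: recovers (partitionId << 32 | localCounter). */
    static long decode(final long termId, final int nbits) {
        if (nbits == 0) return termId;
        final long reversed = termId >>> (64 - nbits);
        final long lowBits = Long.reverse(reversed) >>> (64 - nbits);
        return (termId << nbits) | lowBits;
    }

    public static void main(String[] args) {
        // Consecutive counters in one partition land in different top-bit "buckets".
        for (int counter = 0; counter < 4; counter++) {
            final long id = encode(5, counter, 2);
            System.out.println("counter=" + counter + " topBits=" + (id >>> 62));
        }
    }
}
```

With nbits = 2, counters 0, 1, 2, 3 in the same partition get top bits 0, 2, 1, 3, so consecutive identifiers land in different regions of the key space while decode() still recovers every original bit.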

DEFAULT_TERMID_BITS_TO_REVERSE

static final String DEFAULT_TERMID_BITS_TO_REVERSE
See Also:
Constant Field Values

TERM_CACHE_CAPACITY

static final String TERM_CACHE_CAPACITY
Integer option whose value is the capacity of the term cache. This cache provides fast lookup of frequently used RDF Values by their term identifier.
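The following sketch shows what a capacity bound on such a cache means in practice: once the cache holds TERM_CACHE_CAPACITY entries, adding another evicts one. It assumes least-recently-used eviction via java.util.LinkedHashMap purely for illustration; the class name is hypothetical and bigdata's own term cache may use a different policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache from term identifier to RDF Value (LRU eviction assumed). */
final class TermCacheSketch<V> {

    private final Map<Long, V> map;

    TermCacheSketch(final int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity=" + capacity);
        // accessOrder=true: iteration order runs from least to most recently used.
        this.map = new LinkedHashMap<Long, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<Long, V> eldest) {
                return size() > capacity; // evict the LRU entry once over capacity
            }
        };
    }

    V get(final long termId) { return map.get(termId); }

    void put(final long termId, final V value) { map.put(termId, value); }

    int size() { return map.size(); }

    public static void main(String[] args) {
        final TermCacheSketch<String> cache = new TermCacheSketch<>(2);
        cache.put(1L, "<http://example.org/a>");
        cache.put(2L, "<http://example.org/b>");
        cache.get(1L);                           // touch id 1, so id 2 becomes the eldest
        cache.put(3L, "<http://example.org/c>"); // exceeds capacity: id 2 is evicted
        System.out.println(cache.get(2L));       // evicted, so this prints null
    }
}
```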


DEFAULT_TERM_CACHE_CAPACITY

static final String DEFAULT_TERM_CACHE_CAPACITY
See Also:
Constant Field Values

VOCABULARY_CLASS

static final String VOCABULARY_CLASS
The name of the class that will establish the pre-defined Vocabulary for the database (default DEFAULT_VOCABULARY_CLASS). The class MUST extend BaseVocabulary. This option is ignored if the lexicon is disabled.

The Vocabulary is initialized by AbstractTripleStore.create(). Its state is stored in the global row store under the TripleStoreSchema.VOCABULARY property. The named Vocabulary class will be used to instantiate a consistent vocabulary mapping each time a view of the AbstractTripleStore is materialized. This depends on the named Vocabulary class having a stable behavior. Thus the BaseVocabulary class builds in protection against version changes and will refuse to materialize a view of the AbstractTripleStore if the Vocabulary would not be consistent.

The BaseVocabulary class is designed for easy and modular extension. You can trivially define a concrete instance of this class which provides any (reasonable) number of VocabularyDecl instances. Each VocabularyDecl declares the namespace(s) and the URIs for some ontology. A number of such classes have been created and are combined by the DEFAULT_VOCABULARY_CLASS. You can create your own VocabularyDecl classes and combine them within your own Vocabulary, but it must extend BaseVocabulary.

Note: There is an interaction between the Vocabulary and IExtensions. The IDatatypeURIResolver requires that URIs used by an IExtension are pre-declared by the Vocabulary.
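A hypothetical model of why the Vocabulary class must behave stably: if codes are assigned in declaration order, inserting a new declaration shifts the codes of everything after it, so a view opened with the changed class would decode existing statements incorrectly. The sketch refuses to open in that case, in the spirit of the protection described for BaseVocabulary; it is not the BaseVocabulary API, and every name in it is invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a declaration-order vocabulary with a consistency check. */
final class VocabularySketch {

    private final Map<String, Integer> codes = new LinkedHashMap<>();

    VocabularySketch(final List<String> declaredUris) {
        for (String uri : declaredUris) {
            codes.putIfAbsent(uri, codes.size()); // code = position of first declaration
        }
    }

    Integer code(final String uri) { return codes.get(uri); }

    /** What would be persisted when the KB is created. */
    List<String> snapshot() { return List.copyOf(codes.keySet()); }

    /** Refuses to materialize a view if the class no longer matches what was persisted. */
    static VocabularySketch open(final List<String> declaredUris, final List<String> persisted) {
        final VocabularySketch v = new VocabularySketch(declaredUris);
        if (!v.snapshot().equals(persisted)) {
            throw new IllegalStateException("Vocabulary changed since the KB was created");
        }
        return v;
    }

    public static void main(String[] args) {
        final List<String> v1 = List.of("http://example.org/a", "http://example.org/b");
        final List<String> persisted = new VocabularySketch(v1).snapshot();
        System.out.println(open(v1, persisted).code("http://example.org/b"));
        try {
            // A new declaration in the middle would shift the code of ".../b".
            open(List.of("http://example.org/a", "http://example.org/new", "http://example.org/b"), persisted);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```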


DEFAULT_VOCABULARY_CLASS

static final String DEFAULT_VOCABULARY_CLASS
Note: The default Vocabulary class may be changed from time to time as additional VocabularyDecl are created and bundled into a new default Vocabulary. However, a deployed concrete instance of the default Vocabulary class MUST NOT be modified since that could introduce inconsistencies into the URI to IV mapping which it provides for AbstractTripleStores created using that class.


AXIOMS_CLASS

static final String AXIOMS_CLASS
The Axioms model that will be used (default DEFAULT_AXIOMS_CLASS). The value is the name of the class that will be instantiated by AbstractTripleStore.create(). The class must extend BaseAxioms. This option is ignored if the lexicon is disabled. Use NoAxioms to disable inference.
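A minimal configuration sketch for disabling inference with NoAxioms, assuming the bigdata jar is on the classpath and that NoAxioms lives in com.bigdata.rdf.axioms. The option value is a class name, so the sketch passes NoAxioms.class.getName() rather than a hand-typed string.

```java
import java.util.Properties;

import com.bigdata.rdf.axioms.NoAxioms;
import com.bigdata.rdf.store.AbstractTripleStore;

class NoInferenceConfig {

    /** Properties for a KB created without axioms, i.e. with inference disabled. */
    static Properties noInference() {
        final Properties properties = new Properties();
        // The value is a class name; AbstractTripleStore.create() instantiates it.
        properties.setProperty(AbstractTripleStore.Options.AXIOMS_CLASS, NoAxioms.class.getName());
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(noInference());
    }
}
```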


DEFAULT_AXIOMS_CLASS

static final String DEFAULT_AXIOMS_CLASS

CLOSURE_CLASS

static final String CLOSURE_CLASS
The name of the BaseClosure class that will be used (default DEFAULT_CLOSURE_CLASS). The value is the name of the class that will be used to generate the Program that computes the closure of the database. The class must extend BaseClosure. This option is ignored if inference is disabled.

There are two pre-defined "programs" used to compute and maintain closure. The FullClosure program is a simple fix point of the RDFS+ entailments, except for the foo rdf:type rdfs:Resource entailments which are normally generated at query time. The FastClosure program breaks nearly all cycles in the RDFS rules and runs nearly entirely as a sequence of IRules, including several custom rules.

It is far easier to modify the FullClosure program since any new rules can just be dropped into place. Modifying the FastClosure program requires careful consideration of the entailments computed at each stage in order to determine where a new rule would fit in.

Note: When support for owl:sameAs, etc. processing is enabled, some of the entailments are computed by rules run during forward closure and some of the entailments are computed by rules run at query time. Both FastClosure and FullClosure are aware of this and handle it correctly (e.g., as configured).
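The "simple fix point" that FullClosure is described as computing can be illustrated with a toy: apply every rule to the current data, add whatever is new, and stop when a round derives nothing. This sketch uses only two RDFS rules (rdfs9 and rdfs11) on string triples; it is not bigdata's FullClosure or IRule machinery, and the Triple record is a hypothetical stand-in.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy naive fix point over rdfs9 and rdfs11; not bigdata's FullClosure. */
final class NaiveClosure {

    record Triple(String s, String p, String o) {}

    static final String TYPE = "rdf:type";
    static final String SUB_CLASS_OF = "rdfs:subClassOf";

    static Set<Triple> closure(final Set<Triple> input) {
        final Set<Triple> db = new HashSet<>(input);
        boolean changed = true;
        while (changed) { // repeat until a round adds nothing new
            final Set<Triple> derived = new HashSet<>();
            for (Triple t1 : db) {
                if (!t1.p().equals(SUB_CLASS_OF) && !t1.p().equals(TYPE)) continue;
                for (Triple t2 : db) {
                    if (!t2.p().equals(SUB_CLASS_OF) || !t1.o().equals(t2.s())) continue;
                    // rdfs11: (A sc B), (B sc C) => (A sc C)
                    // rdfs9:  (x type A), (A sc B) => (x type B)
                    derived.add(new Triple(t1.s(), t1.p(), t2.o()));
                }
            }
            changed = db.addAll(derived); // false once no new statement appears
        }
        return db;
    }

    public static void main(String[] args) {
        final Set<Triple> kb = Set.of(
                new Triple("ex:Dog", SUB_CLASS_OF, "ex:Mammal"),
                new Triple("ex:Mammal", SUB_CLASS_OF, "ex:Animal"),
                new Triple("ex:rex", TYPE, "ex:Dog"));
        closure(kb).forEach(System.out::println);
    }
}
```

For the three input triples in main(), the first two rounds derive three new statements (ex:Dog rdfs:subClassOf ex:Animal, ex:rex rdf:type ex:Mammal, and ex:rex rdf:type ex:Animal), and a third round derives nothing, which ends the loop. FastClosure's approach of ordering rules to break cycles avoids most of these repeated passes.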


DEFAULT_CLOSURE_CLASS

static final String DEFAULT_CLOSURE_CLASS

ONE_ACCESS_PATH

static final String ONE_ACCESS_PATH
Boolean option (default false) disables all but a single statement index (aka access path).

Note: The main purpose of the option is to make it possible to turn off the other access paths for special bulk load purposes. The use of this option is NOT compatible with either the application of the InferenceEngine or high-level query.

Note: You may want to explicitly enable or disable the bloom filter for this. Normally a single access path (SPO) is used for a temporary store. Temporary stores tend to be smaller, so if you will also be doing point tests on the temporary store then you probably want to use the BLOOM_FILTER. Otherwise it may be turned off to realize some (minimal) performance gain.
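A hedged configuration sketch for the bulk-load case described above, assuming the bigdata jar is on the classpath. Whether to keep the bloom filter depends on whether point tests will be issued against the store, so the sketch takes that as a parameter.

```java
import java.util.Properties;

import com.bigdata.rdf.store.AbstractTripleStore;

class SingleAccessPathConfig {

    /** Single access path (SPO) for special bulk-load use; no inference or high-level query. */
    static Properties singleAccessPath(final boolean pointTests) {
        final Properties properties = new Properties();
        properties.setProperty(AbstractTripleStore.Options.ONE_ACCESS_PATH, "true");
        // Keep the SPO bloom filter only if point tests will be run against this store.
        properties.setProperty(AbstractTripleStore.Options.BLOOM_FILTER, Boolean.toString(pointTests));
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(singleAccessPath(true));
    }
}
```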


DEFAULT_ONE_ACCESS_PATH

static final String DEFAULT_ONE_ACCESS_PATH
See Also:
Constant Field Values

BLOOM_FILTER

static final String BLOOM_FILTER
Optional property that controls whether or not a bloom filter is maintained for the SPO statement index. The bloom filter is effective up to ~2M entries per index (partition). For scale-up, the bloom filter is automatically disabled once its error rate would be too large given the number of index entries. For scale-out, as the index grows it is split into more and more index partitions, and each index partition is a view composed of one or more AbstractBTrees. While the mutable BTrees might occasionally grow too large to support a bloom filter, data is periodically migrated onto immutable IndexSegments, which have perfect-fit bloom filters. This means that the bloom filter scales out, but not up.

Note: The SPO access path is used any time we have an access path that corresponds to a point test. Therefore this is the only index for which it makes sense to maintain a bloom filter.

If you are going to do a lot of small commits, then please DO NOT enable the bloom filter for the AbstractTripleStore. The bloom filter takes 1 MB each time you commit on the SPO/SPOC index. In any case, the bloom filter has limited value for scale-up, since its nominal error rate will be exceeded at ~2M triples. This concern does not apply to scale-out, where the bloom filter is always a good idea.

See Also:
IndexMetadata.getBloomFilterFactory()
TODO:
Review the various temp triple stores that are created and see which of them would benefit from the SPO bloom filter (TM, backchainers, SIDs fixed point, etc).
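The ~2M entry limit follows from how a Bloom filter's false positive rate grows with the number of entries for a fixed-size bit array. The sketch below evaluates the textbook estimate p = (1 - e^(-k*n/m))^k, where n is the number of entries, m the number of bits and k the number of hash functions. The 1 MB bit array and k = 3 used in main() are illustrative assumptions, not bigdata's actual bloom filter parameters.

```java
/** Textbook Bloom filter false positive estimate; parameters in main() are assumptions. */
final class BloomFilterErrorRate {

    /** p = (1 - e^(-k*n/m))^k for n entries, m bits and k hash functions. */
    static double falsePositiveRate(final long entries, final long bits, final int hashes) {
        final double fractionOfBitsSet = 1.0 - Math.exp(-(double) hashes * entries / bits);
        return Math.pow(fractionOfBitsSet, hashes);
    }

    public static void main(String[] args) {
        final long bits = 8L * 1024 * 1024; // assumption: a fixed 1 MB bit array
        final int hashes = 3;               // assumption
        for (long n : new long[] {100_000, 1_000_000, 2_000_000, 4_000_000}) {
            System.out.printf("n=%,d p=%.4f%n", n, falsePositiveRate(n, bits, hashes));
        }
    }
}
```

Because m is fixed while n grows, p rises monotonically; any nominal error rate is therefore eventually exceeded, which is why the filter is disabled for large scale-up indices but remains useful on right-sized scale-out IndexSegments.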

DEFAULT_BLOOM_FILTER

static final String DEFAULT_BLOOM_FILTER
See Also:
Constant Field Values

JUSTIFY

static final String JUSTIFY
When true (default DEFAULT_JUSTIFY), proof chains for entailments generated by forward chaining are stored in the database. This option is required for truth maintenance when retracting assertions.

If you will not be retracting statements from the database then you can specify false for a significant performance boost during writes and a smaller profile on the disk.

This option does not affect query performance since the justifications are maintained in a distinct index and are only used when retracting assertions.


DEFAULT_JUSTIFY

static final String DEFAULT_JUSTIFY
See Also:
Constant Field Values

STATEMENT_IDENTIFIERS

static final String STATEMENT_IDENTIFIERS
Boolean option (default "false") enables support for statement identifiers. A statement identifier is a unique identifier for a triple in the database. Statement identifiers may be used to make statements about statements without using RDF style reification.

Statement identifiers are assigned consistently when Statements are mapped into the database. This is done using an extension of the term:id index to map the statement as if it were a term onto a unique statement identifier. While the statement identifier is assigned canonically by the term:id index, it is stored redundantly in the value position for each of the statement indices. While the statement identifier is, in fact, a term identifier, the reverse mapping is NOT stored in the id:term index and you CAN NOT translate from a statement identifier back to the original statement.

bigdata supports an RDF/XML interchange extension for the interchange of triples with statement identifiers that may be used as blank nodes to make statements about statements. See BD and RDFXMLParser.

Statement identifiers add some latency when loading data since they increase the size of the writes on the terms index (and also its space requirements, since all statements are also replicated in the terms index). However, if you are doing concurrent data load, then the added latency is nicely offset by the parallelism.

The main benefit of statement identifiers is that they provide a mechanism for statement-level provenance. This is critical for some applications.

An alternative approach to provenance within RDF is to use the concatenation of the subject, predicate, and object (or a hash of their concatenation) as the value in the context position. While this approach can be used with any quad store, it is less transparent and requires twice the amount of data on the disk since you need an additional three statement indices to cover the quad access paths.

The provenance mode (SIDs) IS NOT compatible with the QUADS mode. You may use either one, but not both in the same KB instance.

There are examples for using the provenance mode online.
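For comparison, the hash-in-context alternative to statement identifiers can be sketched as a function from (subject, predicate, object) to a context IRI. The "urn:stmt:" scheme, SHA-256 and the newline separator are assumptions for illustration; the sketch also assumes each term arrives in an unambiguous serialized form (e.g. N-Triples, which escapes newlines inside literals).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derive a context IRI from a hash of a statement's s, p and o. */
final class StatementContextHash {

    static String contextFor(final String s, final String p, final String o) {
        try {
            final MessageDigest sha = MessageDigest.getInstance("SHA-256");
            // Assumption: terms are N-Triples encoded, so '\n' cannot occur inside a term.
            final String joined = s + '\n' + p + '\n' + o;
            final byte[] digest = sha.digest(joined.getBytes(StandardCharsets.UTF_8));
            final StringBuilder iri = new StringBuilder("urn:stmt:");
            for (byte b : digest) iri.append(String.format("%02x", b)); // 64 hex chars
            return iri.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contextFor("<http://example.org/s>", "<http://example.org/p>", "\"o\""));
    }
}
```

Each statement then needs its own context value, which is why this approach requires the additional quad access paths mentioned above.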


DEFAULT_STATEMENT_IDENTIFIERS

static final String DEFAULT_STATEMENT_IDENTIFIERS
See Also:
Constant Field Values

QUADS

static final String QUADS
Boolean option determines whether the KB instance will be a quad store or a triple store. For a triple store only, the STATEMENT_IDENTIFIERS option determines whether or not the provenance mode is enabled.
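A minimal configuration sketch of the two supported combinations, assuming the bigdata jar is on the classpath. Since provenance mode (SIDs) is not compatible with QUADS, the quad store explicitly leaves STATEMENT_IDENTIFIERS off.

```java
import java.util.Properties;

import com.bigdata.rdf.store.AbstractTripleStore;

class StoreModeConfig {

    /** Quad store: statement identifiers must stay disabled. */
    static Properties quadStore() {
        final Properties properties = new Properties();
        properties.setProperty(AbstractTripleStore.Options.QUADS, "true");
        properties.setProperty(AbstractTripleStore.Options.STATEMENT_IDENTIFIERS, "false");
        return properties;
    }

    /** Triple store with statement-level provenance (SIDs). */
    static Properties tripleStoreWithProvenance() {
        final Properties properties = new Properties();
        properties.setProperty(AbstractTripleStore.Options.QUADS, "false");
        properties.setProperty(AbstractTripleStore.Options.STATEMENT_IDENTIFIERS, "true");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(quadStore());
        System.out.println(tripleStoreWithProvenance());
    }
}
```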


DEFAULT_QUADS

static final String DEFAULT_QUADS
See Also:
Constant Field Values

TRIPLES_MODE

static final String TRIPLES_MODE
Set up database in triples mode, no provenance. This is equivalent to setting the following options:


DEFAULT_TRIPLES_MODE

static final String DEFAULT_TRIPLES_MODE
See Also:
Constant Field Values

TRIPLES_MODE_WITH_PROVENANCE

static final String TRIPLES_MODE_WITH_PROVENANCE
Set up database in triples mode with provenance. This is equivalent to setting the following options:


DEFAULT_TRIPLES_MODE_WITH_PROVENANCE

static final String DEFAULT_TRIPLES_MODE_WITH_PROVENANCE
See Also:
Constant Field Values

QUADS_MODE

static final String QUADS_MODE
Set up database in quads mode. Quads mode means no provenance, no inference. This is equivalent to setting the following options:


DEFAULT_QUADS_MODE

static final String DEFAULT_QUADS_MODE
See Also:
Constant Field Values

VALUE_FACTORY_CLASS

static final String VALUE_FACTORY_CLASS
The name of the BigdataValueFactory class. The implementation MUST declare a method with the following signature which will be used as a canonicalizing factory for the instances of that class.
 public static BigdataValueFactory getInstance(final String namespace)
 

See Also:
DEFAULT_VALUE_FACTORY_CLASS
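The requirement above (a class named by the option that exposes a static, canonicalizing getInstance(String)) can be sketched with plain reflection. DemoValueFactory and FactoryLookup are hypothetical: the real contract returns a BigdataValueFactory, and how bigdata performs the lookup internally is not shown here.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a value factory class named by VALUE_FACTORY_CLASS. */
final class DemoValueFactory {

    private static final ConcurrentMap<String, DemoValueFactory> INSTANCES = new ConcurrentHashMap<>();

    private final String namespace;

    private DemoValueFactory(final String namespace) { this.namespace = namespace; }

    /** Canonicalizing factory: exactly one instance per namespace. */
    public static DemoValueFactory getInstance(final String namespace) {
        return INSTANCES.computeIfAbsent(namespace, DemoValueFactory::new);
    }

    String namespace() { return namespace; }
}

/** Hypothetical lookup of the required static getInstance(String) by class name. */
final class FactoryLookup {

    static Object getInstance(final String className, final String namespace) {
        try {
            final Method m = Class.forName(className).getMethod("getInstance", String.class);
            if (!Modifier.isStatic(m.getModifiers())) {
                throw new IllegalArgumentException(className + ".getInstance(String) must be static");
            }
            return m.invoke(null, namespace); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot obtain an instance from " + className, e);
        }
    }

    public static void main(String[] args) {
        final String name = DemoValueFactory.class.getName();
        System.out.println(getInstance(name, "kb") == getInstance(name, "kb"));
    }
}
```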

DEFAULT_VALUE_FACTORY_CLASS

static final String DEFAULT_VALUE_FACTORY_CLASS

TEXT_INDEX

static final String TEXT_INDEX
Boolean option (default "true") enables support for a full text index that may be used to look up literals by tokens found in the text of those literals.

See Also:
TEXT_INDEXER_CLASS, TEXT_INDEX_DATATYPE_LITERALS, INLINE_TEXT_LITERALS, MAX_INLINE_TEXT_LENGTH

DEFAULT_TEXT_INDEX

static final String DEFAULT_TEXT_INDEX
See Also:
Constant Field Values

SUBJECT_CENTRIC_TEXT_INDEX

static final String SUBJECT_CENTRIC_TEXT_INDEX
Boolean option (default true) enables support for a full text index that may be used to look up literals by tokens found in the text of those literals.

See Also:
TEXT_INDEXER_CLASS, TEXT_INDEX_DATATYPE_LITERALS, INLINE_TEXT_LITERALS, MAX_INLINE_TEXT_LENGTH

DEFAULT_SUBJECT_CENTRIC_TEXT_INDEX

static final String DEFAULT_SUBJECT_CENTRIC_TEXT_INDEX
See Also:
Constant Field Values

TEXT_INDEX_DATATYPE_LITERALS

static final String TEXT_INDEX_DATATYPE_LITERALS
Boolean option enables support for a full text index that may be used to look up datatype literals by tokens found in the text of those literals (default "true"). This option will cause ALL datatype literals to be presented to the full text indexer, including xsd:string, xsd:int, etc.


DEFAULT_TEXT_INDEX_DATATYPE_LITERALS

static final String DEFAULT_TEXT_INDEX_DATATYPE_LITERALS
See Also:
Constant Field Values

TEXT_INDEXER_CLASS

static final String TEXT_INDEXER_CLASS
The name of the IValueCentricTextIndexer class. The implementation MUST declare a method with the following signature which will be used to locate instances of that class.
 static public ITextIndexer getInstance(final IIndexManager indexManager,
             final String namespace, final Long timestamp,
             final Properties properties)
 

See Also:
DEFAULT_TEXT_INDEXER_CLASS

DEFAULT_TEXT_INDEXER_CLASS

static final String DEFAULT_TEXT_INDEXER_CLASS

SUBJECT_CENTRIC_TEXT_INDEXER_CLASS

static final String SUBJECT_CENTRIC_TEXT_INDEXER_CLASS
The name of the ITextIndexer class. The implementation MUST declare a method with the following signature which will be used to locate instances of that class.
 static public ITextIndexer getInstance(final IIndexManager indexManager,
             final String namespace, final Long timestamp,
             final Properties properties)
 

See Also:
DEFAULT_TEXT_INDEXER_CLASS

DEFAULT_SUBJECT_CENTRIC_TEXT_INDEXER_CLASS

static final String DEFAULT_SUBJECT_CENTRIC_TEXT_INDEXER_CLASS

BLOBS_THRESHOLD

static final String BLOBS_THRESHOLD
The threshold (in character length) at which an RDF Value will be inserted into the LexiconKeyOrder.BLOBS index rather than the LexiconKeyOrder.TERM2ID and LexiconKeyOrder.ID2TERM indices (default "256").

The LexiconKeyOrder.BLOBS index is capable of storing very large literals but has more IO scatter due to the hash code component of the key for that index. Therefore smaller RDF Values should be inserted into the LexiconKeyOrder.TERM2ID and LexiconKeyOrder.ID2TERM indices while very large RDF Values MUST be inserted into the LexiconKeyOrder.BLOBS index.

The LexiconKeyOrder.TERM2ID index keys are Unicode sort codes based on the RDF Values. This threshold essentially limits the maximum length of the keys in the LexiconKeyOrder.TERM2ID index.
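A hedged sketch of the routing rule implied above. It assumes that a Value whose label length (in Java chars) is at or above the threshold goes to BLOBS; whether the comparison is inclusive, and exactly which parts of a Value count toward the length, are assumptions. The enum and class names are hypothetical.

```java
/** Hypothetical routing rule for BLOBS_THRESHOLD; the inclusive comparison is an assumption. */
final class BlobsRouting {

    enum Target { TERM2ID_ID2TERM, BLOBS }

    static Target route(final String label, final int blobsThreshold) {
        // Long values go to BLOBS, keeping TERM2ID keys (Unicode sort codes) bounded in length.
        return label.length() >= blobsThreshold ? Target.BLOBS : Target.TERM2ID_ID2TERM;
    }

    public static void main(String[] args) {
        final int threshold = 256; // the documented default for BLOBS_THRESHOLD
        System.out.println(route("a short literal", threshold));
        System.out.println(route("x".repeat(10_000), threshold));
    }
}
```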


DEFAULT_BLOBS_THRESHOLD

static final String DEFAULT_BLOBS_THRESHOLD
See Also:
Constant Field Values

INLINE_XSD_DATATYPE_LITERALS

static final String INLINE_XSD_DATATYPE_LITERALS
Set up database to inline XSD datatype literals corresponding to primitives (boolean) and numerics (byte, short, int, etc.) directly into the statement indices (default "true").

Note: xsd:dateTime inlining is controlled by a distinct option. See INLINE_DATE_TIMES.

Note: xsd:string inlining and the inlining of non-xsd literals are controlled by INLINE_TEXT_LITERALS and MAX_INLINE_TEXT_LENGTH.


DEFAULT_INLINE_XSD_DATATYPE_LITERALS

static final String DEFAULT_INLINE_XSD_DATATYPE_LITERALS
See Also:
Constant Field Values

INLINE_TEXT_LITERALS

static final String INLINE_TEXT_LITERALS
Inline ANY literal having fewer than MAX_INLINE_TEXT_LENGTH characters (default "false").

Note: This option exists mainly to support a scale-out design in which everything is inlined into the statement indices. This design is similar to the YARS2 system with its ISAM files and has the advantage that little or nothing is stored within the lexicon.

Inlining of large literals via this option is NOT compatible with TEXT_INDEX. The problem is that we need to index literals which are inlined as well as those which are not inlined. While the full text index does support this, indexing fully inline literals only makes sense for reasonably short literals. This is because the IV of the inlined literal (a) embeds its (compressed) Unicode representation; and (b) is replicated for each token within that literal. For large literals, this causes a substantial expansion in the full text index.
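A back-of-the-envelope sketch of the expansion described above: if the IV of an inlined literal embeds its label and is repeated once per token, the full text index cost grows with (tokens × label length), whereas a fixed-size term identifier grows only with the token count. Assumptions: one byte per character after compression, whitespace tokenization, and no per-entry key overhead; the class name is hypothetical.

```java
/** Rough full text index cost model for inlined vs. term-id literals (assumed sizes). */
final class InlineTextIndexCost {

    static int tokenCount(final String label) {
        final String trimmed = label.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Inlined: the IV carries the whole label, once per token. */
    static long bytesForInlinedLiteral(final String label) {
        return (long) tokenCount(label) * label.length();
    }

    /** Not inlined: a fixed-size term identifier, once per token. */
    static long bytesForTermIdLiteral(final String label, final int termIdBytes) {
        return (long) tokenCount(label) * termIdBytes;
    }

    public static void main(String[] args) {
        final String label = "the quick brown fox";
        System.out.println(bytesForInlinedLiteral(label) + " vs " + bytesForTermIdLiteral(label, 8));
    }
}
```

For "the quick brown fox" (4 tokens, 19 characters) with an assumed 8-byte term identifier, this gives 76 bytes versus 32. Because both the token count and the label length grow with the literal, the inlined cost grows roughly quadratically, which is why only reasonably short literals should be fully inlined when TEXT_INDEX is enabled.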


DEFAULT_INLINE_TEXT_LITERALS

static final String DEFAULT_INLINE_TEXT_LITERALS
See Also:
Constant Field Values

MAX_INLINE_TEXT_LENGTH

static final String MAX_INLINE_TEXT_LENGTH
The maximum length of a String value which may be inlined into the statement indices (default "0"). Depending on the configuration, this may apply to literal labels (and datatype URIs or language codes), URI local names, full URIs, blank node IDs, etc. The XSDStringExtension is registered by the DefaultExtensionFactory when this value is GT ZERO (0).

Note: URIs may be readily inlined using this mechanism without causing an interaction with the full text index since they are not indexed by the full text index. However, inlining literals in this manner causes the Unicode representation of the literal to be duplicated within the full text index for each token in that literal. See TEXT_INDEX and INLINE_TEXT_LITERALS.

See Also:
DefaultExtensionFactory
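The gating rule for MAX_INLINE_TEXT_LENGTH can be sketched as a pair of predicates. This is a hypothetical helper, not the store's code. It assumes the limit is inclusive (a value of exactly MAX_INLINE_TEXT_LENGTH characters may be inlined) and counted in Java UTF-16 chars, and it ignores which kinds of value a given configuration applies the limit to.

```java
// Hypothetical sketch of the MAX_INLINE_TEXT_LENGTH gate. Assumptions: the
// limit is inclusive and measured in Java (UTF-16) chars.
final class InlineTextLengthGate {

    // Mirrors the documented rule: the XSDStringExtension is registered
    // only when the configured length is greater than zero.
    static boolean xsdStringExtensionEnabled(int maxInlineTextLength) {
        return maxInlineTextLength > 0;
    }

    // True when a value is short enough to be inlined under this setting.
    static boolean mayInline(String value, int maxInlineTextLength) {
        return xsdStringExtensionEnabled(maxInlineTextLength)
                && value.length() <= maxInlineTextLength;
    }

    public static void main(String[] args) {
        int disabled = Integer.parseInt("0"); // the documented default
        int limit = 16;                       // an arbitrary example value
        System.out.println(mayInline("abc", disabled));
        System.out.println(mayInline("abc", limit));
        System.out.println(mayInline("a label longer than sixteen chars", limit));
    }
}
```

With the default of "0" nothing is inlined, which matches the extension not being registered at all.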

DEFAULT_MAX_INLINE_STRING_LENGTH

static final String DEFAULT_MAX_INLINE_STRING_LENGTH
Note that there is an interaction with the full text indexer when this is enabled. When a non-datatype literal is inlined, the literal is ALSO inlined into the full text index for each keyword in that literal, which can produce a lot of duplication. Therefore the full text index does not play well with inlining large literals into the statement indices.

See Also:
Constant Field Values

INLINE_BNODES

static final String INLINE_BNODES
Set up the database to inline bnodes directly into the statement indices rather than using the lexicon to map them to term identifiers and back. This option is only compatible with told bnodes mode.

See STORE_BLANK_NODES.


DEFAULT_INLINE_BNODES

static final String DEFAULT_INLINE_BNODES
See Also:
Constant Field Values

INLINE_DATE_TIMES

static final String INLINE_DATE_TIMES
Set up the database to inline date/times directly into the statement indices rather than using the lexicon to map them to term identifiers and back (default "true"). Date/times are converted to UTC and then stored as milliseconds since the epoch, so inlining date/times loses the canonical representation of the date/time. This has two consequences: (1) you will not be able to recover the original time zone of the date/time; and (2) any precision finer than milliseconds will be lost.

See Also:
INLINE_DATE_TIMES_TIMEZONE
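Both consequences can be reproduced with java.time. The sketch below mimics the documented encoding (normalize to UTC, keep epoch milliseconds) using standard library calls; it is an illustration, not the store's implementation, and for simplicity it decodes in UTC rather than through INLINE_DATE_TIMES_TIMEZONE.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Illustrative round trip: xsd:dateTime lexical form -> UTC epoch millis -> value.
final class InlineDateTimeRoundTrip {

    // Encode: parse with its offset, normalize to UTC, keep milliseconds only.
    static long encode(String lexical) {
        return OffsetDateTime.parse(lexical).toInstant().toEpochMilli();
    }

    // Decode: the original offset is gone, so the value comes back in UTC.
    static OffsetDateTime decode(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        String original = "2012-05-03T11:06:37.123456+02:00";
        OffsetDateTime restored = decode(encode(original));
        // prints 2012-05-03T11:06:37.123456+02:00 -> 2012-05-03T09:06:37.123Z
        System.out.println(original + " -> " + restored);
    }
}
```

The +02:00 offset comes back as Z (consequence 1) and the trailing 456 microseconds are dropped (consequence 2); the instant itself survives to the millisecond.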

DEFAULT_INLINE_DATE_TIMES

static final String DEFAULT_INLINE_DATE_TIMES
See Also:
Constant Field Values

INLINE_DATE_TIMES_TIMEZONE

static final String INLINE_DATE_TIMES_TIMEZONE
The default time zone used to (a) encode inline xsd:dateTime literals that do not specify a time zone and (b) decode xsd:dateTime literals from the statement indices, where they are stored as UTC milliseconds since the epoch (default "GMT").

See Also:
INLINE_DATE_TIMES
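A minimal sketch of the two roles of this setting, expressed with java.time and the documented default "GMT". The helper names are hypothetical and the store's own time zone handling may differ. It shows that the instant stored for a zone-less lexical form depends on the configured zone, and that decoding renders the stored milliseconds back in that zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical sketch of the two roles of the default time zone.
final class DefaultTimeZoneCodec {

    // (a) Encode a zone-less xsd:dateTime by interpreting it in the default zone.
    static long encodeWithoutZone(String lexical, ZoneId defaultZone) {
        return LocalDateTime.parse(lexical).atZone(defaultZone).toInstant().toEpochMilli();
    }

    // (b) Decode stored UTC milliseconds into a wall-clock time in the default zone.
    static LocalDateTime decode(long epochMillis, ZoneId defaultZone) {
        return Instant.ofEpochMilli(epochMillis).atZone(defaultZone).toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId gmt = ZoneId.of("GMT"); // the documented default
        String lexical = "2012-05-03T09:06:37";
        long millis = encodeWithoutZone(lexical, gmt);
        System.out.println(decode(millis, gmt)); // 2012-05-03T09:06:37

        // The same lexical form under another default zone is a different instant.
        long nyMillis = encodeWithoutZone(lexical, ZoneId.of("America/New_York"));
        System.out.println(nyMillis - millis); // 14400000 (UTC-4 on that date)
    }
}
```

Changing this option after data has been loaded would therefore shift how previously stored zone-less values are interpreted.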

DEFAULT_INLINE_DATE_TIMES_TIMEZONE

static final String DEFAULT_INLINE_DATE_TIMES_TIMEZONE
See Also:
INLINE_DATE_TIMES_TIMEZONE, Constant Field Values

EXTENSION_FACTORY_CLASS

static final String EXTENSION_FACTORY_CLASS
The name of the IExtensionFactory class. The implementation MUST declare a constructor that accepts an IDatatypeURIResolver as its only argument. The IExtensions constructed by the factory need a resolver to resolve datatype URIs to term identifiers in the database.

See Also:
DEFAULT_EXTENSION_FACTORY_CLASS
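The single-argument constructor contract is what lets a factory be instantiated reflectively from a configured class name. The sketch below shows one way to do that. DatatypeUriResolver and ExtensionFactory are hypothetical stand-ins for IDatatypeURIResolver and IExtensionFactory (their real methods are not reproduced), and the loader is not the store's own code.

```java
import java.lang.reflect.Constructor;

final class ExtensionFactoryLoader {

    // Hypothetical stand-in for IDatatypeURIResolver.
    interface DatatypeUriResolver {
        long resolve(String datatypeUri);
    }

    // Hypothetical stand-in for IExtensionFactory.
    interface ExtensionFactory {
        String describe();
    }

    // Demo factory honoring the contract: a public constructor whose only
    // argument is the resolver.
    static final class DemoExtensionFactory implements ExtensionFactory {
        private final DatatypeUriResolver resolver;

        public DemoExtensionFactory(DatatypeUriResolver resolver) {
            this.resolver = resolver;
        }

        @Override
        public String describe() {
            long id = resolver.resolve("http://www.w3.org/2001/XMLSchema#dateTime");
            return "xsd:dateTime -> " + id;
        }
    }

    // Loads the named class and invokes its (resolver) constructor.
    static ExtensionFactory newFactory(String className, DatatypeUriResolver resolver) {
        try {
            Class<? extends ExtensionFactory> cls =
                    Class.forName(className).asSubclass(ExtensionFactory.class);
            Constructor<? extends ExtensionFactory> ctor =
                    cls.getConstructor(DatatypeUriResolver.class);
            return ctor.newInstance(resolver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        DatatypeUriResolver resolver = uri -> 42L; // toy resolver
        ExtensionFactory factory =
                newFactory(DemoExtensionFactory.class.getName(), resolver);
        System.out.println(factory.describe()); // prints xsd:dateTime -> 42
    }
}
```

A class that lacks a public constructor taking exactly the resolver fails at getConstructor, which is why the contract is stated as a MUST.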

DEFAULT_EXTENSION_FACTORY_CLASS

static final String DEFAULT_EXTENSION_FACTORY_CLASS

REJECT_INVALID_XSD_VALUES

static final String REJECT_INVALID_XSD_VALUES
When true AND is true, literals having an xsd datatype URI whose lexical form cannot be validated against that datatype will be rejected (default DEFAULT_REJECT_INVALID_XSD_VALUES). For example, when true, "abc"^^xsd:int would be rejected. When false, the literal will be accepted, but it will not be inlined with the rest of the literals for that value space and will typically cause a SPARQL type error during query evaluation.
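A hypothetical sketch of this decision for the xsd:int example. It models only the option's own true/false value and leaves out the other precondition named in the text; the Outcome names are illustrative, and treating valid values as inlined is an assumption of the sketch.

```java
// Hypothetical sketch of the REJECT_INVALID_XSD_VALUES decision for xsd:int.
final class XsdIntValidation {

    enum Outcome { INLINED, REJECTED, ACCEPTED_NOT_INLINED }

    // xsd:int lexical space after whitespace collapse: optional sign, ASCII
    // digits, value within the 32-bit signed range. trim() approximates the
    // XML Schema whitespace rule.
    static boolean isValidXsdInt(String lexical) {
        String s = lexical.trim();
        if (!s.matches("[+-]?[0-9]+")) {
            return false;
        }
        try {
            Integer.parseInt(s); // rejects values outside the int range
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static Outcome handle(String lexical, boolean rejectInvalid) {
        if (isValidXsdInt(lexical)) {
            return Outcome.INLINED;
        }
        return rejectInvalid ? Outcome.REJECTED : Outcome.ACCEPTED_NOT_INLINED;
    }

    public static void main(String[] args) {
        for (String lexical : new String[] {"42", "+7", "abc", "2147483648"}) {
            System.out.println("\"" + lexical + "\"^^xsd:int -> "
                    + handle(lexical, true) + " / " + handle(lexical, false));
        }
    }
}
```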


DEFAULT_REJECT_INVALID_XSD_VALUES

static final String DEFAULT_REJECT_INVALID_XSD_VALUES
See Also:
Constant Field Values

CONSTRAIN_XXXC_SHARDS

static final String CONSTRAIN_XXXC_SHARDS
Boolean option determines whether or not an XXXCShardSplitHandler is applied (scale-out only, default "true").

When true, shards whose SPOKeyOrder name ends with "C" are constrained such that all quads for the same triple will be co-located on the same shard. This constraint allows certain optimizations for default graph handling.

This constraint may be used if you do not expect to have more than ~200MB worth of distinct graphs within which the same triple may be asserted. This is a soft constraint as larger shards are permitted, but performance will degrade if this constraint forces some shards to be many times larger than their nominal capacity.

See Also:
XXXCShardSplitHandler
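The co-location constraint amounts to choosing a split point only where the leading triple changes. The sketch below illustrates that idea on toy keys; it is not the XXXCShardSplitHandler algorithm. Assumptions: keys are modeled as long[4] arrays in an XXXC order (three triple components, then the context) and are already sorted, whereas the real handler works on encoded index keys.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an "XXXC" split constraint: quad keys whose last
// component is the context (C) are split only where the leading three
// components (the triple) change, so all quads of a triple stay together.
final class XxxcSplitSketch {

    static boolean sameTriple(long[] a, long[] b) {
        return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
    }

    // Returns the index of the first key of the right-hand shard, searching
    // forward then backward from the midpoint, or -1 when every key belongs
    // to a single triple and no legal split exists.
    static int chooseSplit(List<long[]> sortedKeys) {
        int n = sortedKeys.size();
        int mid = n / 2;
        for (int i = Math.max(mid, 1); i < n; i++) {
            if (!sameTriple(sortedKeys.get(i - 1), sortedKeys.get(i))) {
                return i;
            }
        }
        for (int i = mid - 1; i >= 1; i--) {
            if (!sameTriple(sortedKeys.get(i - 1), sortedKeys.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<long[]> keys = List.of(
                new long[] {1, 1, 1, 1},
                new long[] {2, 2, 2, 1},
                new long[] {2, 2, 2, 2},
                new long[] {2, 2, 2, 3},
                new long[] {2, 2, 2, 4},
                new long[] {3, 3, 3, 1});
        int split = chooseSplit(keys);
        // split at 5, right shard starts with [3, 3, 3, 1]
        System.out.println("split at " + split + ", right shard starts with "
                + Arrays.toString(keys.get(split)));
    }
}
```

The midpoint (index 3) falls inside the run of quads for triple (2, 2, 2), so the split moves to index 5. When one triple is asserted in very many graphs, that run can force a shard well past its nominal size, which is the soft-constraint degradation described above.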

DEFAULT_CONSTRAIN_XXXC_SHARDS

static final String DEFAULT_CONSTRAIN_XXXC_SHARDS
See Also:
Constant Field Values


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.