public static interface AbstractTripleStore.Options
Configuration options.
SPORelation and LexiconRelation?

| Field Summary | |
|---|---|
| static String | AXIOMS_CLASS: The Axioms model that will be used (default: see DEFAULT_AXIOMS_CLASS). |
| static String | BLOOM_FILTER: Optional property that controls whether or not a bloom filter is maintained for the SPO statement index. |
| static String | CLOSURE_CLASS: The name of the BaseClosure class that will be used (default: see DEFAULT_CLOSURE_CLASS). |
| static String | DEFAULT_AXIOMS_CLASS |
| static String | DEFAULT_BLOOM_FILTER |
| static String | DEFAULT_CLOSURE_CLASS |
| static String | DEFAULT_JUSTIFY |
| static String | DEFAULT_LEXICON |
| static String | DEFAULT_ONE_ACCESS_PATH |
| static String | DEFAULT_QUADS |
| static String | DEFAULT_STATEMENT_IDENTIFIERS |
| static String | DEFAULT_STORE_BLANK_NODES |
| static String | DEFAULT_TERMID_BITS_TO_REVERSE |
| static String | DEFAULT_TEXT_INDEX |
| static String | DEFAULT_TEXT_INDEX_DATATYPE_LITERALS |
| static String | DEFAULT_VOCABULARY_CLASS |
| static String | JUSTIFY: When true (default: see DEFAULT_JUSTIFY), proof chains for entailments generated by forward chaining are stored in the database. |
| static String | LEXICON: Boolean option (default true) enables support for the lexicon (the forward and backward term indices). |
| static String | ONE_ACCESS_PATH: Boolean option (default false) disables all but a single statement index (aka access path). |
| static String | QUADS: Boolean option determines whether the KB instance will be a quad store or a triple store. |
| static String | STATEMENT_IDENTIFIERS: Boolean option (default "false") enables support for statement identifiers. |
| static String | STORE_BLANK_NODES: Boolean option (default "false") controls whether or not we store blank nodes in the forward mapping of the lexicon. |
| static String | TERMID_BITS_TO_REVERSE: Option affects how evenly the assigned term identifiers are distributed, which has a pronounced effect on the ID2TERM and statement indices for scale-out deployments. |
| static String | TEXT_INDEX: Boolean option (default true) enables support for a full text index that may be used to look up literals by tokens found in the text of those literals. |
| static String | TEXT_INDEX_DATATYPE_LITERALS: Boolean option (default false) enables support for a full text index that may be used to look up datatype literals by tokens found in the text of those literals. |
| static String | VOCABULARY_CLASS: The name of the class that will establish the pre-defined Vocabulary for the database (default: see DEFAULT_VOCABULARY_CLASS). |

| Fields inherited from interface com.bigdata.btree.keys.KeyBuilder.Options |
|---|
| COLLATOR, DECOMPOSITION, STRENGTH, USER_COUNTRY, USER_LANGUAGE, USER_VARIANT |

| Fields inherited from interface com.bigdata.rdf.store.DataLoader.Options |
|---|
| BUFFER_CAPACITY, CLOSURE, COMMIT, DEFAULT_BUFFER_CAPACITY, DEFAULT_CLOSURE, DEFAULT_COMMIT, DEFAULT_FLUSH, DEFAULT_VERIFY_DATA, FLUSH, VERIFY_DATA |

| Fields inherited from interface com.bigdata.search.FullTextIndex.Options |
|---|
| DEFAULT_INDEXER_COLLATOR_STRENGTH, DEFAULT_INDEXER_TIMEOUT, DEFAULT_OVERWRITE, INDEXER_COLLATOR_STRENGTH, INDEXER_TIMEOUT, OVERWRITE |

| Field Detail |
|---|
static final String LEXICON
Boolean option (default true) enables support for the
lexicon (the forward and backward term indices). When
false, the lexicon indices are not registered. This
can be safely turned off for the TempTripleStore when only
the statement indices are to be used.
See Also: LexiconRelation
static final String DEFAULT_LEXICON
static final String STORE_BLANK_NODES
Boolean option (default "false") controls whether or not we store blank nodes in the forward mapping of the lexicon.
When false, blank node semantics are enforced: you CAN
NOT unify blank nodes based on their IDs in the lexicon, and
AbstractTripleStore.getBNodeCount() is disabled.
When true, you are able to violate blank node
semantics and force unification of blank nodes by assigning the ID
from the RDF interchange syntax to the blank node. RIO has an option
that will allow you to do this. When STORE_BLANK_NODES is also
true, then you will in fact be able to resolve
pre-existing blank nodes using their identifiers. The tradeoff is
time and space: if you have a LOT of documents using blank nodes then
you might want to disable this option in order to spend less time
writing the forward lexicon index (and it will also take up less
space).
static final String DEFAULT_STORE_BLANK_NODES
static final String TERMID_BITS_TO_REVERSE
Option affects how evenly the assigned term identifiers are distributed, which has a pronounced effect on the ID2TERM and statement indices for scale-out deployments.
For the scale-out triple store, the term identifiers are formed by placing the index partition identifier in the high word and the local counter for the index partition into the low word. In addition, the sign bit is "stolen" from each value such that the low two bits are left open for bit flags which encode the type (URI, Literal, BNode or SID) of the term. The effect of this option is to cause the low N bits of the local counter value to be reversed and written into the high N bits of the term identifier (the other bits are shifted down to make room for this). Regardless of the configured value for this option, all bits (except the sign bit) of both the partition identifier and the local counter are preserved.
Normally, the low bits of a sequential counter will vary the most rapidly. By reversing the localCounter and placing some of the reversed bits into the high bits of the term identifier we cause the term identifiers to be uniformly (but not randomly) distributed. This is much like using a hash function without collisions or a random number generator that does not produce duplicates. When ZERO (0), no bits are reversed, so the high bits of the term identifiers directly reflect the partition identifier and the low bits are assigned sequentially by the local counter within each TERM2ID index partition.
The use of a non-zero value for this option can easily cause the write load on the index partitions for the ID2TERM and statement indices to be perfectly balanced. However, using too many bits has some negative consequences on locality of operations within an index partition: since the distribution of the keys will be approximately uniform, this leads to poor cache performance, more copy-on-write for the B+Tree, and both more IO and faster growth in the journal for writes (since there will be more leaves made dirty on average by each bulk write).
The use of a non-zero value for this option also directly affects the degree of scatter for bulk read or write operations. As more bits are used, it becomes increasingly likely that each bulk read or write operation will on average touch all index partitions. This is because the number of low order local counter bits reversed and rotated into the high bits of the term identifier places an approximate bound on the number of index partitions of the ID2TERM or a statement index that will be touched by a scattered read or write. However, that number will continue to grow slowly over time as new partition identifiers are introduced (the partition identifiers appear next in the encoded term identifier and therefore determine the degree of locality or scatter once the quickly varying high bits have had their say).
The "right" value really depends on the expected scale of the knowledge base. If you estimate that you will have 50 x 200M index partitions for the statement indices, then SQRT(50) =~ 7 would be a good choice.
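The bit manipulation described above can be illustrated with a minimal sketch. It is not the actual bigdata encoding: it treats the identifier as a plain 64-bit value and ignores the sign bit, the partition identifier, and the two type-flag bits. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: NOT the actual bigdata term identifier encoding. */
final class TermIdBitReversalSketch {

    /**
     * Reverses the low n bits of the counter, writes them into the high n
     * bits of the result, and shifts the remaining bits down to make room.
     * Assumes 0 <= n <= 63 and treats the value as 64 unsigned bits.
     */
    static long reverseLowBitsIntoHigh(long counter, int n) {
        if (n < 0 || n > 63) {
            throw new IllegalArgumentException("n must be in [0, 63]");
        }
        long lowMask = (1L << n) - 1;   // selects the low n bits
        long low = counter & lowMask;
        // Long.reverse maps bit i to bit 63 - i, so the low n bits
        // land (reversed) in the high n bits.
        long high = Long.reverse(low);
        long rest = counter >>> n;      // remaining bits, shifted down
        return high | rest;
    }

    public static void main(String[] args) {
        Set<Long> seen = new HashSet<>();
        for (long c = 0; c < 8; c++) {
            long id = reverseLowBitsIntoHigh(c, 2);
            seen.add(id);
            System.out.println(c + " -> " + Long.toHexString(id));
        }
        // No bits are discarded, so distinct counters give distinct ids.
        System.out.println("distinct: " + seen.size());
    }
}
```

With n = 2, the sequential counters 0, 1, 2, 3 yield identifiers whose two high bits are 00, 10, 01, 11, so consecutive assignments scatter across the key space while remaining distinct. With n = 0 the counter is returned unchanged.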
static final String DEFAULT_TERMID_BITS_TO_REVERSE
static final String TEXT_INDEX
Boolean option (default true) enables support for a
full text index that may be used to look up literals by tokens found
in the text of those literals.
static final String DEFAULT_TEXT_INDEX
static final String TEXT_INDEX_DATATYPE_LITERALS
Boolean option (default false) enables support for a
full text index that may be used to look up datatype literals by
tokens found in the text of those literals.
static final String DEFAULT_TEXT_INDEX_DATATYPE_LITERALS
static final String VOCABULARY_CLASS
The name of the class that will establish the pre-defined
Vocabulary for the database (default: see
DEFAULT_VOCABULARY_CLASS). The class MUST extend
BaseVocabulary. This option is ignored if the lexicon is
disabled.
The Vocabulary is initialized by
AbstractTripleStore.create() and its serialized state is
stored in the global row store under the
TripleStoreSchema.VOCABULARY property.
See Also: NoVocabulary, RDFSVocabulary
static final String DEFAULT_VOCABULARY_CLASS
static final String AXIOMS_CLASS
The Axioms model that will be used (default: see
DEFAULT_AXIOMS_CLASS). The value is the name of the
class that will be instantiated by
AbstractTripleStore.create(). The class must extend
BaseAxioms. This option is ignored if the lexicon is
disabled. Use NoAxioms to disable inference.
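As a rough illustration of how a class-name option like this one might be resolved, the sketch below loads a class by name, checks that it extends the required base class, and instantiates it. This is not the code used by AbstractTripleStore.create(): the stand-in types and the use of a no-argument constructor are assumptions made for the example.

```java
/** Illustrative only: stand-in types, NOT the bigdata classes. */
final class AxiomsClassOptionSketch {

    /** Hypothetical stand-in for the BaseAxioms base class. */
    static abstract class BaseAxiomsStandIn { }

    /** Hypothetical concrete model, analogous to a "no axioms" choice. */
    static final class NoAxiomsStandIn extends BaseAxiomsStandIn {
        public NoAxiomsStandIn() { }
    }

    /**
     * Resolves the configured class name. ASSUMPTION: the class has a
     * no-argument constructor; the real constructor signature may differ.
     */
    static BaseAxiomsStandIn instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            if (!BaseAxiomsStandIn.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                        className + " must extend " + BaseAxiomsStandIn.class.getName());
            }
            return (BaseAxiomsStandIn) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        BaseAxiomsStandIn axioms = instantiate(NoAxiomsStandIn.class.getName());
        System.out.println(axioms.getClass().getSimpleName());
    }
}
```

Validating the type before instantiation means a misconfigured class name fails with a clear message rather than a ClassCastException later on.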
static final String DEFAULT_AXIOMS_CLASS
static final String CLOSURE_CLASS
The name of the BaseClosure class that will be used (default: see
DEFAULT_CLOSURE_CLASS). The value is the name of
the class that will be used to generate the Program that
computes the closure of the database. The class must extend
BaseClosure. This option is ignored if inference is
disabled.
There are two pre-defined "programs" used to compute and maintain
closure. The FullClosure program is a simple fix point of the
RDFS+ entailments, except for the
foo rdf:type rdfs:Resource entailments which are
normally generated at query time. The FastClosure program
breaks nearly all cycles in the RDFS rules and runs nearly entirely
as a sequence of IRules, including several custom rules.
It is far easier to modify the FullClosure program since any
new rules can just be dropped into place. Modifying the
FastClosure program requires careful consideration of the
entailments computed at each stage in order to determine where a new
rule would fit in.
Note: When support for owl:sameAs, etc. processing is
enabled, some of the entailments are computed by rules run during
forward closure and some of the entailments are computed by rules run
at query time. Both FastClosure and FullClosure are
aware of this and handle it correctly (e.g., as configured).
static final String DEFAULT_CLOSURE_CLASS
static final String ONE_ACCESS_PATH
Boolean option (default false) disables all but a
single statement index (aka access path).
Note: The main purpose of the option is to make it possible to turn
off the other access paths for special bulk load purposes. The use of
this option is NOT compatible with either the application of the
InferenceEngine or high-level query.
Note: You may want to explicitly enable or disable the bloom filter
for this. Normally a single access path (SPO) is used for a temporary
store. Temporary stores tend to be smaller, so if you will also be
doing point tests on the temporary store then you probably want to
use the BLOOM_FILTER. Otherwise it may be turned off to
realize some (minimal) performance gain.
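The notes above suggest a configuration along the following lines for a temporary store that uses a single access path. This is a minimal sketch: the property key strings are placeholders, since the actual string values of the Options constants are not shown on this page, and the class and method names are hypothetical.

```java
import java.util.Properties;

/** Illustrative only: the key strings below are PLACEHOLDERS. */
final class TempStoreOptionsSketch {

    // In real code these would be the constants declared on
    // AbstractTripleStore.Options (LEXICON, ONE_ACCESS_PATH, BLOOM_FILTER).
    static final String LEXICON = "example.lexicon";
    static final String ONE_ACCESS_PATH = "example.oneAccessPath";
    static final String BLOOM_FILTER = "example.bloomFilter";

    /**
     * Builds properties for a temporary store that keeps only one
     * statement index. The bloom filter is enabled only when the caller
     * expects to run point tests against the store.
     */
    static Properties tempStoreProperties(boolean pointTests) {
        Properties p = new Properties();
        p.setProperty(LEXICON, "false");        // statement indices only
        p.setProperty(ONE_ACCESS_PATH, "true"); // single access path
        p.setProperty(BLOOM_FILTER, Boolean.toString(pointTests));
        return p;
    }

    public static void main(String[] args) {
        tempStoreProperties(true).list(System.out);
    }
}
```

Keep in mind that a store configured this way cannot be used with the InferenceEngine or high-level query, as noted above.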
static final String DEFAULT_ONE_ACCESS_PATH
static final String BLOOM_FILTER
Optional property that controls whether or not a bloom filter is maintained for the SPO statement index.
AbstractBTrees. While the mutable BTrees might
occasionally grow too large to support a bloom filter, data is
periodically migrated onto immutable IndexSegments which have
perfect fit bloom filters. This means that the bloom filter
scales out, but not up.
Note: The SPO access path is used any time we have an access path that corresponds to a point test. Therefore this is the only index for which it makes sense to maintain a bloom filter.
If you are going to do a lot of small commits, then please DO NOT
enable the bloom filter for the AbstractTripleStore. The
bloom filter takes 1 MB each time you commit on the SPO/SPOC index.
The bloom filter is of limited value in any case for scale-up since its
nominal error rate will be exceeded at ~2M triples. This concern does
not apply for scale-out, where the bloom filter is always a good
idea.
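To see why a fixed-size bloom filter degrades as an index grows, the sketch below evaluates the standard textbook estimate of the false positive rate, (1 - e^(-kn/m))^k, for a growing number of keys. It is not the bigdata implementation: treating the 1 MB figure as the size of the bit array and using three hash functions are both assumptions chosen only for illustration.

```java
/** Illustrative only: textbook estimate, NOT bigdata's bloom filter. */
final class BloomFilterRateSketch {

    /**
     * Expected false positive rate of a bloom filter with mBits bits and
     * k hash functions after n distinct keys have been inserted.
     */
    static double falsePositiveRate(long mBits, int k, long n) {
        return Math.pow(1.0 - Math.exp(-((double) k * n) / mBits), k);
    }

    public static void main(String[] args) {
        long mBits = 8L * 1024 * 1024; // ASSUMPTION: a 1 MB bit array
        int k = 3;                      // ASSUMPTION: arbitrary hash count
        for (long n = 500_000; n <= 4_000_000; n *= 2) {
            System.out.printf("n=%d rate=%.4f%n", n, falsePositiveRate(mBits, k, n));
        }
    }
}
```

For a fixed filter size the estimated rate rises steadily with the number of keys, which matches the observation that the filter loses its value on a single large index but stays useful when the data is split across many index partitions.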
See Also: IndexMetadata.getBloomFilterFactory()
static final String DEFAULT_BLOOM_FILTER
static final String JUSTIFY
When true (default: see DEFAULT_JUSTIFY),
proof chains for entailments generated by forward chaining are stored
in the database. This option is required for truth maintenance when
retracting assertions.
If you will not be retracting statements from the database then you
can specify false for a significant performance boost
during writes and a smaller profile on the disk.
This option does not affect query performance since the justifications are maintained in a distinct index and are only used when retracting assertions.
static final String DEFAULT_JUSTIFY
static final String STATEMENT_IDENTIFIERS
Boolean option (default "false") enables support for statement identifiers.
Statement identifiers are assigned consistently when Statements
are mapped into the database. This is done using an extension of
the term:id index to map the statement as if it were a
term onto a unique statement identifier. While the statement
identifier is assigned canonically by the term:id index,
it is stored redundantly in the value position for each of the
statement indices. While the statement identifier is, in fact, a term
identifier, the reverse mapping is NOT stored in the id:term index
and you CAN NOT translate from a statement identifier back to the
original statement.
bigdata supports an RDF/XML interchange extension for the interchange
of triples with statement identifiers that may be used as
blank nodes to make statements about statements. See BNS and
RDFXMLParser.
Statement identifiers add some latency when loading data since they increase the size of the writes on the terms index (and also its space requirements since all statements are also replicated in the terms index). However, if you are doing concurrent data load then the added latency is nicely offset by the parallelism.
The main benefit for statement identifiers is that they provide a mechanism for statement level provenance. This is critical for some applications.
An alternative approach to provenance within RDF is to use the concatenation of the subject, predicate, and object (or a hash of their concatenation) as the value in the context position. While this approach can be used with any quad store, it is less transparent and requires twice the amount of data on the disk since you need an additional three statement indices to cover the quad access paths.
The provenance mode (SIDs) IS NOT compatible with the QUADS
mode. You may use either one, but not both in the same KB instance.
There are examples for using the provenance mode online.
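Because the provenance mode and the quads mode are mutually exclusive, it can be useful to check a configuration before creating a KB instance. The following is a minimal sketch of such a check, not part of the bigdata API: the key strings are placeholders for the real Options constants, and the "false" default used for QUADS is an assumption, since this page does not state it.

```java
import java.util.Properties;

/** Illustrative only: the key strings below are PLACEHOLDERS. */
final class ProvenanceModeCheckSketch {

    // In real code these would be AbstractTripleStore.Options.QUADS and
    // AbstractTripleStore.Options.STATEMENT_IDENTIFIERS.
    static final String QUADS = "example.quads";
    static final String STATEMENT_IDENTIFIERS = "example.statementIdentifiers";

    /** Rejects a configuration that enables both SIDs and quads. */
    static void validate(Properties p) {
        // ASSUMPTION: "false" default for QUADS (not stated on this page).
        boolean quads = Boolean.parseBoolean(p.getProperty(QUADS, "false"));
        // The documented default for STATEMENT_IDENTIFIERS is "false".
        boolean sids = Boolean.parseBoolean(
                p.getProperty(STATEMENT_IDENTIFIERS, "false"));
        if (quads && sids) {
            throw new IllegalArgumentException(
                    "Statement identifiers (SIDs) and quads cannot both be enabled");
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(STATEMENT_IDENTIFIERS, "true");
        validate(p); // accepted: provenance mode on a triple store
        System.out.println("configuration accepted");
    }
}
```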
static final String DEFAULT_STATEMENT_IDENTIFIERS
static final String QUADS
Boolean option determines whether the KB instance will be a quad store or a triple store.
The STATEMENT_IDENTIFIERS option determines whether or not the
provenance mode is enabled.
static final String DEFAULT_QUADS