com.bigdata.rdf.spo
Class SPORelation

java.lang.Object
  extended by com.bigdata.relation.AbstractResource<IRelation<E>>
      extended by com.bigdata.relation.AbstractRelation<ISPO>
          extended by com.bigdata.rdf.spo.SPORelation
All Implemented Interfaces:
IMutableRelation<ISPO>, IMutableResource<IRelation<ISPO>>, IRelation<ISPO>, ILocatableResource<IRelation<ISPO>>

public class SPORelation
extends AbstractRelation<ISPO>

The SPORelation handles all things related to the indices representing the triples stored in the database. Statements are first converted to term identifiers using the LexiconRelation and then inserted into the statement indices in parallel. There is one statement index for each of the three possible access paths for a triple store. The key is formed from the corresponding permutation of the subject, predicate, and object, e.g., {s,p,o}, {p,o,s}, and {o,s,p} for triples or {s,p,o,c}, etc. for quads. The statement type (inferred, axiom, or explicit) and the optional statement identifier are stored under the key. All state for a statement is replicated in each of the statement indices.

TODO:
Integration with the package providing magic set rewrites of rules in order to test whether or not a statement is still provable when it is retracted during truth maintenance (TM). This will reduce the cost of loading data, since much of that cost is writing the justifications index.
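The key permutations named in the class description can be illustrated with a minimal, self-contained sketch. It does not use the bigdata classes: KeyPermutationSketch and its Order enum are hypothetical names, and a key is modeled as a plain long[] of term identifiers rather than the encoded B+Tree key.

```java
import java.util.Arrays;

/** Hypothetical sketch; not part of the bigdata API. */
final class KeyPermutationSketch {

    /** The three triple-store key orders named in the class description. */
    enum Order {
        SPO(0, 1, 2), POS(1, 2, 0), OSP(2, 0, 1);

        // For each key position, the offset into (s, p, o) that supplies the value.
        private final int[] positions;

        Order(int... positions) {
            this.positions = positions;
        }

        /** Permute an (s, p, o) tuple of term identifiers into this order's key. */
        long[] key(long s, long p, long o) {
            long[] spo = {s, p, o};
            long[] key = new long[3];
            for (int i = 0; i < 3; i++) {
                key[i] = spo[positions[i]];
            }
            return key;
        }
    }

    public static void main(String[] args) {
        // The same statement yields one key per statement index.
        for (Order order : Order.values()) {
            System.out.println(order + " -> " + Arrays.toString(order.key(1L, 2L, 3L)));
        }
    }
}
```

Because every index holds the full statement under a different permutation, any bound prefix of a triple pattern can be answered by a range scan on one of the indices.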

Version:
$Id: SPORelation.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson

Nested Class Summary
 
Nested classes/interfaces inherited from class com.bigdata.relation.AbstractResource
AbstractResource.Options
 
Field Summary
protected  boolean bloomFilter
          true iff the SPO index will maintain a bloom filter.
 boolean justify
          This is used to conditionally enable the logic to retract justifications when the corresponding statement is retracted.
protected static org.apache.log4j.Logger log
           
static String NAME_SPO_RELATION
          Constant for the SPORelation namespace component.
 boolean oneAccessPath
          This is used to conditionally disable all but a single statement index (aka access path).
 boolean statementIdentifiers
          When true the database will support statement identifiers.
 
Constructor Summary
SPORelation(IIndexManager indexManager, String namespace, Long timestamp, Properties properties)
           
 
Method Summary
 long addJustifications(IChunkedIterator<Justification> itr)
          Adds justifications to the store.
 void create()
          Create any logically contained resources (relations, indices).
 long delete(IChunkedOrderedIterator<ISPO> itr)
          Deletes SPOs, writing on the statement indices in parallel.
 long delete(ISPO[] stmts, int numStmts)
          Delete the SPOs from the statement indices.
 void destroy()
          Destroy any logically contained resources (relations, indices).
 ICloseableIterator<ISPO> distinctSPOIterator(ICloseableIterator<ISPO> src)
          Return an iterator that will visit the distinct (s,p,o) tuples in the source iterator.
 IChunkedIterator<Long> distinctTermScan(IKeyOrder<ISPO> keyOrder)
          Efficient scan of the distinct term identifiers that appear in the first position of the keys for the statement index corresponding to the specified IKeyOrder.
 IChunkedIterator<Long> distinctTermScan(IKeyOrder<ISPO> keyOrder, ITermIdFilter termIdFilter)
          Efficient scan of the distinct term identifiers that appear in the first position of the keys for the statement index corresponding to the specified IKeyOrder.
 StringBuilder dump(IKeyOrder<ISPO> keyOrder)
          Dumps the specified index.
 boolean exists()
           
 SPOAccessPath getAccessPath(IKeyOrder<ISPO> keyOrder, IPredicate<ISPO> predicate)
          Core impl.
 IAccessPath<ISPO> getAccessPath(IPredicate<ISPO> predicate)
          Return the IAccessPath that is most efficient for the specified predicate based on an analysis of the bound and unbound positions in the predicate.
 IAccessPath<ISPO> getAccessPath(long s, long p, long o)
          Deprecated. Use getAccessPath(long, long, long, long) instead.
 IAccessPath<ISPO> getAccessPath(long s, long p, long o, long c)
          Return the access path for a triple or quad pattern.
 IAccessPath<ISPO> getAccessPath(long s, long p, long o, long c, IElementFilter<ISPO> filter)
          Return the access path for a triple or quad pattern with an optional filter (core implementation).
 AbstractTripleStore getContainer()
          Strengthened return type.
 Class<ISPO> getElementClass()
          Return the class for the generic type of this relation.
 IIndex getIndex(IKeyOrder<? extends ISPO> keyOrder)
          Overridden to return the hard reference for the index, which is cached the first time it is resolved.
 Set<String> getIndexNames()
          Return the fully qualified name of each index maintained by this relation.
 IIndex getJustificationIndex()
          The optional index on which Justifications are stored.
protected  IndexMetadata getJustIndexMetadata(String name)
          Overrides for the IRawTripleStore#getJustificationIndex().
 int getKeyArity()
          The arity of the key for the statement indices: 3 is a triple store, with or without statement identifiers; 4 is a quad store, which does not support statement identifiers because the 4th position of (s,p,o,c) is interpreted as the context and is stored in the B+Tree statement index key rather than in the value associated with the key.
 IIndex getPrimaryIndex()
           
 SPOKeyOrder getPrimaryKeyOrder()
           
 BTree getSPOOnlyBTree(boolean bloomFilter)
          Return a new unnamed BTree instance for the SPOKeyOrder.SPO key order backed by a TemporaryStore.
 boolean getStatementIdentifiers()
          When true the database will support statement identifiers.
protected  IndexMetadata getStatementIndexMetadata(SPOKeyOrder keyOrder)
          Overrides for the statement indices.
 long insert(IChunkedOrderedIterator<ISPO> itr)
          Inserts SPOs, writing on the statement indices in parallel.
 long insert(ISPO[] a, int numStmts, IElementFilter<ISPO> filter)
          Note: The statements are inserted into each index in parallel.
 SPO newElement(IPredicate<ISPO> predicate, IBindingSet bindingSet)
          Create and return a new element.
 Iterator<SPOKeyOrder> statementKeyOrderIterator()
          Return an iterator visiting each IKeyOrder maintained by this relation.
 
Methods inherited from class com.bigdata.relation.AbstractRelation
getFQN, getIndex, newIndexMetadata
 
Methods inherited from class com.bigdata.relation.AbstractResource
acquireExclusiveLock, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getContainerNamespace, getExecutorService, getFullyBufferedReadThreshold, getIndexManager, getMaxParallelSubqueries, getNamespace, getProperties, getProperty, getProperty, getTimestamp, isForceSerialExecution, isNestedSubquery, toString, unlock
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface com.bigdata.relation.IRelation
getExecutorService, getIndexManager
 
Methods inherited from interface com.bigdata.relation.locator.ILocatableResource
getContainerNamespace, getNamespace, getTimestamp
 

Field Detail

log

protected static final transient org.apache.log4j.Logger log

NAME_SPO_RELATION

public static final String NAME_SPO_RELATION
Constant for the SPORelation namespace component.

Note: To obtain the fully qualified name of an index in the SPORelation you need to append a "." to the relation's namespace, then this constant, then a "." and then the local name of the index.

See Also:
AbstractRelation.getFQN(IKeyOrder), Constant Field Values
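The naming rule in the note above can be sketched as plain string concatenation. This is an illustration only: IndexNameSketch and fullyQualifiedName are hypothetical names, and the value "spo" for the constant is an assumption, since the actual value appears only under Constant Field Values. Use AbstractRelation.getFQN(IKeyOrder) in real code.

```java
/** Hypothetical sketch of the naming rule; not the bigdata implementation. */
final class IndexNameSketch {

    // Assumed value, for illustration only. The real constant's value is not stated on this page.
    static final String NAME_SPO_RELATION = "spo";

    /** namespace + "." + relation component + "." + local index name. */
    static String fullyQualifiedName(String namespace, String localIndexName) {
        return namespace + "." + NAME_SPO_RELATION + "." + localIndexName;
    }

    public static void main(String[] args) {
        // "kb" and "POS" are made-up inputs.
        System.out.println(fullyQualifiedName("kb", "POS"));
    }
}
```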

justify

public final boolean justify
This is used to conditionally enable the logic to retract justifications when the corresponding statement is retracted.


oneAccessPath

public final boolean oneAccessPath
This is used to conditionally disable all but a single statement index (aka access path).


bloomFilter

protected final boolean bloomFilter
true iff the SPO index will maintain a bloom filter.

See Also:
Options#BLOOM_FILTER

statementIdentifiers

public final boolean statementIdentifiers
When true the database will support statement identifiers. A statement identifier is a unique 64-bit integer taken from the same space as the term identifiers and which uniquely identifies a statement in the database regardless of the graph in which that statement appears. The purpose of statement identifiers is to allow statements about statements without recourse to RDF style reification.

Constructor Detail

SPORelation

public SPORelation(IIndexManager indexManager,
                   String namespace,
                   Long timestamp,
                   Properties properties)
Method Detail

getKeyArity

public int getKeyArity()
The arity of the key for the statement indices: 3 is a triple store, with or without statement identifiers; 4 is a quad store, which does not support statement identifiers because the 4th position of (s,p,o,c) is interpreted as the context and is stored in the B+Tree statement index key rather than in the value associated with the key.


getStatementIdentifiers

public boolean getStatementIdentifiers()
When true the database will support statement identifiers.

A statement identifier is a unique 64-bit integer taken from the same space as the term identifiers and which uniquely identifies a statement in the database regardless of the graph in which that statement appears. The purpose of statement identifiers is to allow statements about statements without recourse to RDF style reification.

Only explicit statements will have a statement identifier. Statements made about statements using their statement identifiers will automatically be retracted if a statement they describe is retracted (a micro form of truth maintenance that is always enabled when statement identifiers are enabled).
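The cascading retraction described above can be sketched with an in-memory map. This is a simplified model, not the bigdata implementation: SidRetractionSketch is a hypothetical name, statements are modeled as {s, p, o} arrays keyed by their statement identifier, and a statement is treated as being "about" another when its subject or object equals that statement's identifier.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not part of the bigdata API. */
final class SidRetractionSketch {

    /**
     * Retract a statement and, transitively, every statement made about it.
     *
     * @param bySid explicit statements keyed by statement identifier; each value is {s, p, o}
     * @param sid   identifier of the statement to retract
     * @return the identifiers of all statements that were removed
     */
    static Set<Long> retract(Map<Long, long[]> bySid, long sid) {
        Set<Long> removed = new LinkedHashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        work.add(sid);
        while (!work.isEmpty()) {
            long current = work.poll();
            if (bySid.remove(current) == null) {
                continue; // unknown or already retracted
            }
            removed.add(current);
            // Queue every remaining statement whose subject or object is the retracted sid.
            for (Map.Entry<Long, long[]> e : bySid.entrySet()) {
                long[] spo = e.getValue();
                if (spo[0] == current || spo[2] == current) {
                    work.add(e.getKey());
                }
            }
        }
        return removed;
    }
}
```

The linear scan per retracted statement keeps the sketch short; a real store would look up dependents through an index instead.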


getContainer

public AbstractTripleStore getContainer()
Strengthened return type.

Overrides:
getContainer in class AbstractResource<IRelation<ISPO>>
Returns:
The container -or- null if there is no container.

exists

public boolean exists()
TODO:
This should use a row scan in the GRS for the SPORelation namespace. It is only used by the LocalTripleStore constructor and a unit test's main() method. This method IS NOT part of any public API at this time.

create

public void create()
Description copied from interface: IMutableResource
Create any logically contained resources (relations, indices).

Specified by:
create in interface IMutableResource<IRelation<ISPO>>
Overrides:
create in class AbstractResource<IRelation<ISPO>>

destroy

public void destroy()
Description copied from interface: IMutableResource
Destroy any logically contained resources (relations, indices).

Specified by:
destroy in interface IMutableResource<IRelation<ISPO>>
Overrides:
destroy in class AbstractResource<IRelation<ISPO>>

getIndex

public IIndex getIndex(IKeyOrder<? extends ISPO> keyOrder)
Overridden to return the hard reference for the index, which is cached the first time it is resolved. This class does not eagerly resolve the indices (a) to avoid a performance hit when running in a context where the index view is not required; and (b) to avoid exceptions when running as an ITx.UNISOLATED AbstractTask where the index was not declared and hence cannot be materialized.

Overrides:
getIndex in class AbstractRelation<ISPO>
Parameters:
keyOrder - The natural index order.
Returns:
The index -or- null iff the index does not exist as of the timestamp for this view of the relation.
TODO:
FIXME For efficiency, the concrete implementations need to override this, saving a hard reference to the index and then using a switch-like construct to return the correct hard reference. This behavior should be encapsulated.
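The resolve-on-first-use behavior described for getIndex can be sketched generically. This is an assumption-laden illustration rather than the SPORelation code: LazyIndexCacheSketch and its resolver function are hypothetical, and a ConcurrentHashMap stands in for whatever fields the real class uses to hold the hard references.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch; K stands in for a key order and I for an index view. */
final class LazyIndexCacheSketch<K, I> {

    private final Map<K, I> cache = new ConcurrentHashMap<>();
    private final Function<K, I> resolver;

    /** @param resolver looks up the index for a key order, or returns null if it does not exist */
    LazyIndexCacheSketch(Function<K, I> resolver) {
        this.resolver = resolver;
    }

    /**
     * Resolve the index on first use and keep a hard reference afterwards.
     * Nothing is resolved until a caller asks for a specific key order.
     */
    I getIndex(K keyOrder) {
        // computeIfAbsent records no mapping when the resolver returns null,
        // so a missing index is reported as null and looked up again next time.
        return cache.computeIfAbsent(keyOrder, resolver);
    }
}
```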

getPrimaryKeyOrder

public final SPOKeyOrder getPrimaryKeyOrder()

getPrimaryIndex

public final IIndex getPrimaryIndex()

getJustificationIndex

public final IIndex getJustificationIndex()
The optional index on which Justifications are stored.

TODO:
The Justifications index is not a regular index of the SPORelation. In fact, it is a relation for proof chains and is not really part of the SPORelation at all, and it should probably be moved onto its own JRelation. The presence of the Justification index on the SPORelation would cause problems for methods which would like to enumerate the indices, except that we just silently ignore its presence in those methods (it is not in the index[], for example).

This would cause the justification index namespace to change to be a peer of the SPORelation namespace.


distinctSPOIterator

public ICloseableIterator<ISPO> distinctSPOIterator(ICloseableIterator<ISPO> src)
Return an iterator that will visit the distinct (s,p,o) tuples in the source iterator. The context and statement type information will be stripped from the visited ISPOs. The iterator will be backed by a BTree on a TemporaryStore and will use a bloom filter for fast point tests. The BTree and the source iterator will be closed when the returned iterator is closed.

Parameters:
src - The source iterator.
Returns:
The filtered iterator.
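The effect of the distinct filter can be sketched in memory. This sketch swaps in a HashSet for the BTree on a TemporaryStore and collects results eagerly into a list instead of returning a closeable iterator, so it shows the filtering semantics only. DistinctSpoSketch is a hypothetical name and tuples are modeled as long[] values of the form {s, p, o, c}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not part of the bigdata API. */
final class DistinctSpoSketch {

    /**
     * Visit the source once and keep the first occurrence of each (s, p, o),
     * dropping the context position from the emitted tuples.
     */
    static List<long[]> distinctSpo(Iterator<long[]> src) {
        Set<List<Long>> seen = new HashSet<>();
        List<long[]> out = new ArrayList<>();
        while (src.hasNext()) {
            long[] t = src.next();
            // A List<Long> has value-based equals/hashCode, unlike long[].
            List<Long> key = Arrays.asList(t[0], t[1], t[2]);
            if (seen.add(key)) {
                out.add(new long[] {t[0], t[1], t[2]});
            }
        }
        return out;
    }
}
```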

getSPOOnlyBTree

public BTree getSPOOnlyBTree(boolean bloomFilter)
Return a new unnamed BTree instance for the SPOKeyOrder.SPO key order backed by a TemporaryStore. The index will only store (s,p,o) triples (not quads) and will not store either the SID or StatementEnum. This is a good choice when you need to impose a "distinct" filter on (s,p,o) triples.

Parameters:
bloomFilter - When true, a bloom filter is enabled for the index. The bloom filter provides fast correct rejection tests for point lookups up to ~2M triples and then shuts off automatically. See BloomFilterFactory.DEFAULT for more details.
Returns:
The SPO index.
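The "fast correct rejection" property of a bloom filter can be shown with a toy implementation. This is not BloomFilterFactory or the filter used by the BTree: BloomFilterSketch is a hypothetical class with two ad hoc hash functions, and it does not model the automatic shut-off once the index grows large.

```java
import java.util.BitSet;
import java.util.Objects;

/** Hypothetical toy bloom filter over (s, p, o) triples; not the bigdata implementation. */
final class BloomFilterSketch {

    private final BitSet bits;
    private final int size;

    BloomFilterSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    private int slot1(long s, long p, long o) {
        return Math.floorMod(Objects.hash(s, p, o), size);
    }

    private int slot2(long s, long p, long o) {
        return Math.floorMod(Objects.hash(o, s, p) * 31 + 7, size);
    }

    /** Record a triple by setting its two bit positions. */
    void add(long s, long p, long o) {
        bits.set(slot1(s, p, o));
        bits.set(slot2(s, p, o));
    }

    /**
     * A false result means the triple was definitely never added, so the index
     * lookup can be skipped. A true result means it may be present and the
     * index must still be consulted.
     */
    boolean mightContain(long s, long p, long o) {
        return bits.get(slot1(s, p, o)) && bits.get(slot2(s, p, o));
    }
}
```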

getStatementIndexMetadata

protected IndexMetadata getStatementIndexMetadata(SPOKeyOrder keyOrder)
Overrides for the statement indices.


getJustIndexMetadata

protected IndexMetadata getJustIndexMetadata(String name)
Overrides for the IRawTripleStore#getJustificationIndex().


getIndexNames

public Set<String> getIndexNames()
Description copied from interface: IRelation
Return the fully qualified name of each index maintained by this relation.

Returns:
An immutable set of the index names for the relation.

statementKeyOrderIterator

public Iterator<SPOKeyOrder> statementKeyOrderIterator()
Return an iterator visiting each IKeyOrder maintained by this relation.


getAccessPath

public IAccessPath<ISPO> getAccessPath(long s,
                                       long p,
                                       long o)
Deprecated. Use getAccessPath(long, long, long, long) instead.

Return the access path for a triple pattern.

Parameters:
s -
p -
o -
Throws:
UnsupportedOperationException - unless the getKeyArity() is 3.

getAccessPath

public IAccessPath<ISPO> getAccessPath(long s,
                                       long p,
                                       long o,
                                       long c)
Return the access path for a triple or quad pattern.


getAccessPath

public IAccessPath<ISPO> getAccessPath(long s,
                                       long p,
                                       long o,
                                       long c,
                                       IElementFilter<ISPO> filter)
Return the access path for a triple or quad pattern with an optional filter (core implementation). All arguments are optional. Any bound argument will restrict the returned access path. For a triple pattern, c WILL BE IGNORED as there is no index over the statement identifiers, even when they are enabled. For a quad pattern, any argument MAY be bound.

Parameters:
s - The subject position (optional).
p - The predicate position (optional).
o - The object position (optional).
c - The context position (optional and ignored for a triple store).
filter - The filter (optional).
Returns:
The best access path for that triple or quad pattern.
Throws:
UnsupportedOperationException - for a triple store without statement identifiers if c is non-NULL.

getAccessPath

public IAccessPath<ISPO> getAccessPath(IPredicate<ISPO> predicate)
Return the IAccessPath that is most efficient for the specified predicate based on an analysis of the bound and unbound positions in the predicate.

Note: When statement identifiers are enabled, the only way to bind the context position is to already have an SPO on hand. There is no index which can be used to look up an SPO by its context and the context is always a blank node.

Note: This method is a hot spot, especially when the maximum parallelism for subqueries is large. A variety of caching techniques are being evaluated to address this.

Parameters:
predicate - The predicate.
Returns:
The best access path for that predicate.
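One plausible way to pick an index from the bound and unbound positions of a triple pattern is sketched below. This is an illustration of the general technique, not the selection logic of SPORelation: AccessPathChoiceSketch is a hypothetical name, the index is returned as a string label, quads are not covered, and the use of 0 as the "unbound" sentinel is an assumption.

```java
/** Hypothetical sketch; not the bigdata selection logic. */
final class AccessPathChoiceSketch {

    // Assumed sentinel for an unbound position.
    static final long NULL = 0L;

    /**
     * Choose the index whose key order places the bound positions first,
     * so the pattern becomes a prefix range scan on that index.
     */
    static String chooseTripleIndex(long s, long p, long o) {
        boolean sBound = s != NULL;
        boolean pBound = p != NULL;
        boolean oBound = o != NULL;
        if (sBound && pBound) return "SPO"; // (s,p,?) and (s,p,o): prefix s,p
        if (sBound && oBound) return "OSP"; // (s,?,o): prefix o,s
        if (pBound && oBound) return "POS"; // (?,p,o): prefix p,o
        if (sBound) return "SPO";           // (s,?,?)
        if (pBound) return "POS";           // (?,p,?)
        if (oBound) return "OSP";           // (?,?,o)
        return "SPO";                       // nothing bound: full scan of any index
    }

    public static void main(String[] args) {
        System.out.println(chooseTripleIndex(NULL, 7L, 9L));
    }
}
```

With three rotations of (s,p,o), every combination of bound positions forms a key prefix in exactly one or more of the indices, which is why three indices suffice for a triple store.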

getAccessPath

public SPOAccessPath getAccessPath(IKeyOrder<ISPO> keyOrder,
                                   IPredicate<ISPO> predicate)
Core impl.

Parameters:
keyOrder - The natural order of the selected index (this identifies the index).
predicate - The predicate specifying the query constraint on the access path.
Returns:
The access path.

distinctTermScan

public IChunkedIterator<Long> distinctTermScan(IKeyOrder<ISPO> keyOrder)
Efficient scan of the distinct term identifiers that appear in the first position of the keys for the statement index corresponding to the specified IKeyOrder. For example, using SPOKeyOrder.POS will give you the term identifiers for the distinct predicates actually in use within statements in the SPORelation.

Parameters:
keyOrder - The selected index order.
Returns:
An iterator visiting the distinct term identifiers.
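A distinct scan over the first key position can avoid visiting every key by seeking past each run of equal leading values. The sketch below shows that idea on a sorted in-memory set; it is not the bigdata implementation. DistinctTermScanSketch is a hypothetical name, keys are modeled as long[3] arrays compared component by component, and NavigableSet.ceiling stands in for an index seek.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical sketch; not part of the bigdata API. */
final class DistinctTermScanSketch {

    /** Component-wise ordering of keys, standing in for the index's key order. */
    static final Comparator<long[]> KEY_ORDER = (a, b) -> Arrays.compare(a, b);

    /** Collect the distinct values found in the first position of the keys. */
    static List<Long> distinctFirstTerms(NavigableSet<long[]> index) {
        List<Long> terms = new ArrayList<>();
        long[] key = index.isEmpty() ? null : index.first();
        while (key != null) {
            long term = key[0];
            terms.add(term);
            if (term == Long.MAX_VALUE) {
                break; // no larger leading value can exist
            }
            // Seek to the first key whose leading value is greater than term,
            // skipping all remaining keys that share the current leading value.
            key = index.ceiling(new long[] {term + 1, Long.MIN_VALUE, Long.MIN_VALUE});
        }
        return terms;
    }
}
```

Applied to keys in POS order, the first position is the predicate, which matches the example in the method description.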

distinctTermScan

public IChunkedIterator<Long> distinctTermScan(IKeyOrder<ISPO> keyOrder,
                                               ITermIdFilter termIdFilter)
Efficient scan of the distinct term identifiers that appear in the first position of the keys for the statement index corresponding to the specified IKeyOrder. For example, using SPOKeyOrder.POS will give you the term identifiers for the distinct predicates actually in use within statements in the SPORelation.

Parameters:
keyOrder - The selected index order.
Returns:
An iterator visiting the distinct term identifiers.

newElement

public SPO newElement(IPredicate<ISPO> predicate,
                      IBindingSet bindingSet)
Description copied from interface: IRelation
Create and return a new element. The element is constructed from the predicate given the bindings. Typically, this is used when generating an ISolution for an IRule during either a query or mutation operations. The element is NOT inserted into the relation.

Parameters:
predicate - The predicate that is the head of some IRule.
bindingSet - A set of bindings for that IRule.
Returns:
The new element.
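Constructing an element from a predicate and a set of bindings can be sketched as follows. The types here are stand-ins, not IPredicate or IBindingSet: in this hypothetical NewElementSketch, each predicate position is either a Long constant or a String variable name, and the binding set is a plain map from variable name to term identifier.

```java
import java.util.Map;

/** Hypothetical sketch; not part of the bigdata API. */
final class NewElementSketch {

    /**
     * Build an element by copying constants from the predicate and substituting
     * the bound value for each variable. Nothing is written to any relation.
     *
     * @param predicate  positions holding either a Long constant or a String variable name
     * @param bindingSet values for the variables
     */
    static long[] newElement(Object[] predicate, Map<String, Long> bindingSet) {
        long[] element = new long[predicate.length];
        for (int i = 0; i < predicate.length; i++) {
            Object term = predicate[i];
            if (term instanceof Long) {
                element[i] = (Long) term; // constant position
            } else {
                Long bound = bindingSet.get((String) term);
                if (bound == null) {
                    throw new IllegalStateException("unbound variable: " + term);
                }
                element[i] = bound; // variable position
            }
        }
        return element;
    }
}
```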

getElementClass

public Class<ISPO> getElementClass()
Description copied from interface: IRelation
Return the class for the generic type of this relation. This information is used to dynamically create arrays of that generic type.


insert

public long insert(IChunkedOrderedIterator<ISPO> itr)
Inserts SPOs, writing on the statement indices in parallel.

Note: This does NOT write on the justifications index. If justifications are being maintained then the ISolutions MUST report binding sets and an AbstractSolutionBuffer.InsertSolutionBuffer MUST be used that knows how to write on the justifications index AND delegate writes on the statement indices to this method.

Note: This does NOT assign statement identifiers. The SPORelation does not have direct access to the LexiconRelation and the latter is responsible for assigning term identifiers. Code that writes explicit statements onto the statement indices MUST use AbstractTripleStore.addStatements(AbstractTripleStore, boolean, IChunkedOrderedIterator, IElementFilter), which knows how to generate the statement identifiers. In turn, that method will delegate each "chunk" to this method.

Parameters:
itr - An iterator visiting the elements to be written.
Returns:
The #of elements that were actually written on the relation.

delete

public long delete(IChunkedOrderedIterator<ISPO> itr)
Deletes SPOs, writing on the statement indices in parallel.

Note: This does NOT write on the justifications index. If justifications are being maintained then the ISolutions MUST report binding sets and an AbstractSolutionBuffer.InsertSolutionBuffer MUST be used that knows how to write on the justifications index AND delegate writes on the statement indices to this method.

Note: This does NOT perform truth maintenance!

Note: This does NOT compute the closure for statement identifiers (statements that need to be deleted because they are about a statement that is being deleted).

Parameters:
itr - An iterator visiting the elements to be removed. Existing elements in the relation having a key equal to the key formed from the visited elements will be removed from the relation.
Returns:
The #of elements that were actually removed from the relation.
See Also:
AbstractTripleStore.removeStatements(IChunkedOrderedIterator, boolean), SPOAccessPath.removeAll()

insert

public long insert(ISPO[] a,
                   int numStmts,
                   IElementFilter<ISPO> filter)
Note: The statements are inserted into each index in parallel. We clone the statement[] and sort and bulk load each statement index in parallel using a thread pool.

Parameters:
a - An SPO[].
numStmts - The #of elements of that array that will be written.
filter - An optional filter on the elements to be written.
Returns:
The mutation count.
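The clone, sort, and parallel bulk load steps described above can be sketched with a thread pool and in-memory sorted sets. This is a simplified model, not the SPORelation code: ParallelInsertSketch is a hypothetical name, statements are long[] values of the form {s, p, o}, a NavigableSet stands in for each statement index, the optional filter is omitted, and the pool is created per call only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; not part of the bigdata API. */
final class ParallelInsertSketch {

    /** Component-wise key ordering, standing in for each index's key order. */
    static final Comparator<long[]> KEY_ORDER = (a, b) -> Arrays.compare(a, b);

    /** Offsets into (s, p, o) for the SPO, POS, and OSP indices, in that order. */
    static final int[][] ORDERS = {{0, 1, 2}, {1, 2, 0}, {2, 0, 1}};

    /** Write the first numStmts statements on all three indices, one task per index. */
    static long insert(List<NavigableSet<long[]>> indices, long[][] stmts, int numStmts) {
        ExecutorService pool = Executors.newFixedThreadPool(ORDERS.length);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < ORDERS.length; i++) {
                final int[] order = ORDERS[i];
                final NavigableSet<long[]> index = indices.get(i);
                Callable<Long> task = () -> {
                    // Each task works on its own permuted copy, so the tasks never share data.
                    long[][] keys = new long[numStmts][];
                    for (int j = 0; j < numStmts; j++) {
                        long[] t = stmts[j];
                        keys[j] = new long[] {t[order[0]], t[order[1]], t[order[2]]};
                    }
                    // Sorting into index order gives the write an ordered access pattern.
                    Arrays.sort(keys, KEY_ORDER);
                    long written = 0;
                    for (long[] key : keys) {
                        if (index.add(key)) {
                            written++;
                        }
                    }
                    return written;
                };
                futures.add(pool.submit(task));
            }
            long mutationCount = 0;
            for (Future<Long> f : futures) {
                // Wait for every index; each reports the same count in this model.
                mutationCount = f.get();
            }
            return mutationCount;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```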

delete

public long delete(ISPO[] stmts,
                   int numStmts)
Delete the SPOs from the statement indices. Any justifications for those statements will also be deleted.

Parameters:
stmts - The SPOs.
numStmts - The #of elements in that array to be processed.
Returns:
The #of statements that were removed (mutationCount).
TODO:
FIXME This needs to return the mutationCount. Resolve what is actually being reported. I expect that BatchRemove only removes those statements that it finds and that there is no constraint in place to assure that this method only sees SPOs known to exist (but perhaps it does since you can only do this safely for explicit statements).

addJustifications

public long addJustifications(IChunkedIterator<Justification> itr)
Adds justifications to the store.

Parameters:
itr - The iterator from which we will read the Justifications to be added. The iterator is closed by this operation.
Returns:
The #of Justifications written on the justifications index.
TODO:
A lot of the cost of loading data is writing the justifications. SLD/magic sets will relieve us of the need to write the justifications since we can efficiently prove whether or not the statements being removed can be entailed from the remaining statements. Any statement which can still be proven is converted to an inference. Since writing the justification chains is such a source of latency, SLD/magic sets will translate into an immediate performance boost for data load.

dump

public StringBuilder dump(IKeyOrder<ISPO> keyOrder)
Dumps the specified index.



Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.