com.bigdata.rdf.spo
Class DefaultGraphSolutionExpander

java.lang.Object
  extended by com.bigdata.rdf.spo.DefaultGraphSolutionExpander
All Implemented Interfaces:
ISolutionExpander<ISPO>, Serializable

public class DefaultGraphSolutionExpander
extends Object
implements ISolutionExpander<ISPO>

Solution expander that provides an efficient merged access path for the graphs in the SPARQL default graph. The expander applies the access path to the graph associated with each specified URI, visiting the distinct (s,p,o) tuples found in those graph(s). The context position of each visited ISPO is discarded (set to null). Duplicate triples are discarded using a BTree in the SPOKeyOrder.SPO key order with its bloom filter enabled. The result is the distinct union of the access paths and hence provides a view of the source graphs as if they had been merged according to RDF merge semantics.

Author:
Bryan Thompson
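The distinct-union behavior described in the class comment can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `Spo` is a hypothetical stand-in for ISPO, the in-memory `Map` stands in for the per-graph access paths, and a `LinkedHashSet` replaces the BTree with bloom filter that the class is documented to use. It is not the bigdata implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustrative only: none of these types are part of the bigdata API.
final class DefaultGraphMergeSketch {

    /** Hypothetical stand-in for an ISPO; c is the context (graph) position. */
    static final class Spo {
        final String s, p, o, c;

        Spo(String s, String p, String o, String c) {
            this.s = s; this.p = p; this.o = o; this.c = c;
        }

        @Override public boolean equals(Object x) {
            if (!(x instanceof Spo)) return false;
            Spo t = (Spo) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o)
                    && Objects.equals(c, t.c);
        }

        @Override public int hashCode() { return Objects.hash(s, p, o, c); }
    }

    /**
     * Reads each listed graph in turn, sets the context position to null,
     * and keeps only the first occurrence of each (s,p,o).
     */
    static List<Spo> distinctUnion(Map<String, List<Spo>> store,
                                   Iterable<String> defaultGraphs) {
        Set<Spo> seen = new LinkedHashSet<>(); // assumption: replaces the BTree
        for (String g : defaultGraphs) {
            List<Spo> graph = store.getOrDefault(g, Collections.<Spo>emptyList());
            for (Spo q : graph) {
                seen.add(new Spo(q.s, q.p, q.o, null)); // strip the context
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        Map<String, List<Spo>> store = new HashMap<>();
        store.put("g1", Arrays.asList(new Spo("a", "p", "b", "g1"),
                                      new Spo("a", "p", "c", "g1")));
        store.put("g2", Arrays.asList(new Spo("a", "p", "b", "g2")));
        // (a,p,b) occurs in both graphs but is reported once.
        System.out.println(distinctUnion(store, Arrays.asList("g1", "g2")).size()); // prints 2
    }
}
```

Because the context is cleared before the duplicate check, the same triple asserted in two graphs collapses to a single result.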

See Also:
Serialized Form
TODO:
This class will have to be revisited if we want to support quad store inference and expose information about inferred vs. explicit statements when reading on the default graph. All of its access paths strip out the StatementEnum.

FIXME: Scale-out joins depend on knowledge of the best access path and the index partitions (shards) which it will traverse. Review all of the new expanders and make sure that they do not violate this principle. Expanders tend to lazily determine the access path, and I believe that RDFJoinNexus#getTailAccessPath() may even refuse to operate with expanders. If this is the case, then the choice of the access path needs to be completely coded into the predicate as a combination of binding or clearing the context variable and setting an appropriate constraint (filter).

For scale-out this could place us onto a different shard and hence a different data service with the consequence that we wind up doing RMI for the access path. In order to avoid that we need to rewrite the rule to use a nested query along the lines of:

 DISTINCT (s,p,o)
  UNION
      SELECT s,p,o FROM g1
      SELECT s,p,o FROM g2
      ...
      SELECT s,p,o FROM gn
 
The alternative approach for scale-out is to add a filter so that only the specific contexts are accepted. This filter MUST be applied for ALL possible contexts (or all on that shard) so that we only run the access path once rather than once per context.
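The filter-based alternative can be sketched as a single scan with the context left unbound, where the filter accepts any quad whose context belongs to the default graph set. This is a hedged illustration, not bigdata code: `Spo` is a hypothetical stand-in for ISPO, the `List` stands in for one access path (for example, one shard), and the `LinkedHashSet` is an assumed substitute for the distinct-triple index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Illustrative only: none of these types are part of the bigdata API.
final class FilteredScanSketch {

    /** Hypothetical stand-in for an ISPO; c is the context (graph) position. */
    static final class Spo {
        final String s, p, o, c;

        Spo(String s, String p, String o, String c) {
            this.s = s; this.p = p; this.o = o; this.c = c;
        }

        @Override public boolean equals(Object x) {
            if (!(x instanceof Spo)) return false;
            Spo t = (Spo) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o)
                    && Objects.equals(c, t.c);
        }

        @Override public int hashCode() { return Objects.hash(s, p, o, c); }
    }

    /**
     * Scans the quads once. The filter tests membership in the whole set of
     * default graphs, so the scan is not repeated per context.
     */
    static List<Spo> filteredScan(List<Spo> quads, Set<String> defaultGraphs) {
        Set<Spo> seen = new LinkedHashSet<>();
        for (Spo q : quads) {                       // one pass over the access path
            if (defaultGraphs.contains(q.c)) {      // filter covers ALL contexts
                seen.add(new Spo(q.s, q.p, q.o, null)); // strip context, dedupe
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Spo> quads = Arrays.asList(
                new Spo("a", "p", "b", "g1"),
                new Spo("a", "p", "b", "g2"),
                new Spo("x", "p", "y", "g3"));
        Set<String> graphs = new HashSet<>(Arrays.asList("g1", "g2"));
        System.out.println(filteredScan(quads, graphs).size()); // prints 1
    }
}
```

The trade-off relative to the per-graph union is that the scan visits quads from graphs outside the default graph set and rejects them in the filter, in exchange for running the access path only once.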

Field Summary
protected static org.apache.log4j.Logger log
           
 
Constructor Summary
DefaultGraphSolutionExpander(Iterable<? extends URI> defaultGraphs)
          Using the expander makes sense even when there is a single graph in the default graph since the expander will strip the context information from the materialized ISPOs.
 
Method Summary
 boolean backchain()
          Add the backchainer on top of the expander.
 IAccessPath<ISPO> getAccessPath(IAccessPath<ISPO> accessPath1)
          Return the IAccessPath that will be used to evaluate the IPredicate.
 Iterator<? extends URI> getGraphs()
          Return an iterator which will visit the source graphs.
 int getKnownGraphCount()
          Return the number of source graph URIs associated with term identifiers in the database (possible graphs).
 boolean runFirst()
          If true, the predicate for this expander will be given priority in the join order.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static transient org.apache.log4j.Logger log
Constructor Detail

DefaultGraphSolutionExpander

public DefaultGraphSolutionExpander(Iterable<? extends URI> defaultGraphs)
Using the expander makes sense even when there is a single graph in the default graph since the expander will strip the context information from the materialized ISPOs. If the caller can identify that some graph URIs are not known to the database, then they may be safely removed from the defaultGraphs (and the query will proceed as if they had been removed). If this leaves an empty set, then no query against the default graph can yield any data.

Parameters:
defaultGraphs - The set of default graphs in the SPARQL DATASET (optional). A runtime exception will be thrown during evaluation if the URIs are not BigdataURIs. If this is null, then the default graph is understood to be the RDF merge of ALL graphs in the quad store.
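The three cases for defaultGraphs described above (null, pruned to empty, non-empty) can be made concrete with a small sketch of caller-side pruning. The helper names `pruneUnknown` and `mayYieldData` are hypothetical, graph URIs are modeled as plain strings rather than BigdataURIs, and `knownGraphs` is an assumed stand-in for whatever lookup the caller uses to decide that a URI is known to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative only: these helpers are not part of the bigdata API.
final class DefaultGraphPruningSketch {

    /**
     * Returns null when defaultGraphs is null (meaning: the merge of ALL
     * graphs); otherwise returns only the graphs known to the database.
     */
    static List<String> pruneUnknown(Iterable<String> defaultGraphs,
                                     Set<String> knownGraphs) {
        if (defaultGraphs == null) {
            return null; // no restriction on the default graph
        }
        List<String> kept = new ArrayList<>();
        for (String g : defaultGraphs) {
            if (knownGraphs.contains(g)) {
                kept.add(g); // unknown graphs cannot contribute any data
            }
        }
        return kept;
    }

    /** An empty (non-null) set of graphs means the query cannot yield data. */
    static boolean mayYieldData(List<String> prunedGraphs) {
        return prunedGraphs == null || !prunedGraphs.isEmpty();
    }
}
```

A caller could use the second helper to short-circuit evaluation before constructing the expander when every requested graph is unknown.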
Method Detail

getKnownGraphCount

public int getKnownGraphCount()
Return the number of source graph URIs associated with term identifiers in the database (possible graphs).


getGraphs

public Iterator<? extends URI> getGraphs()
Return an iterator which will visit the source graphs.


backchain

public boolean backchain()
Description copied from interface: ISolutionExpander
Add the backchainer on top of the expander.

Specified by:
backchain in interface ISolutionExpander<ISPO>
Returns:
true if the backchainer should run

runFirst

public boolean runFirst()
Description copied from interface: ISolutionExpander
If true, the predicate for this expander will be given priority in the join order.

Specified by:
runFirst in interface ISolutionExpander<ISPO>
Returns:
true if the predicate should be run first

getAccessPath

public IAccessPath<ISPO> getAccessPath(IAccessPath<ISPO> accessPath1)
Description copied from interface: ISolutionExpander
Return the IAccessPath that will be used to evaluate the IPredicate.

Specified by:
getAccessPath in interface ISolutionExpander<ISPO>
Parameters:
accessPath1 - The IAccessPath that will be used by default.
Returns:
The IAccessPath that will be used. You can return the given accessPath or you can layer additional semantics onto or otherwise override the given IAccessPath.
Throws:
IllegalArgumentException - if the context position is bound.


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.