com.bigdata.rdf.sail
Class BigdataEvaluationStrategyImpl

java.lang.Object
  extended by org.openrdf.query.algebra.evaluation.impl.EvaluationStrategyImpl
      extended by com.bigdata.rdf.sail.BigdataEvaluationStrategyImpl
All Implemented Interfaces:
org.openrdf.query.algebra.evaluation.EvaluationStrategy

public class BigdataEvaluationStrategyImpl
extends org.openrdf.query.algebra.evaluation.impl.EvaluationStrategyImpl

Extended to rewrite Sesame TupleExprs onto native Rules and to evaluate magic predicates for full text search, etc. Query evaluation can proceed either by Sesame 2 evaluation or, if BigdataSail.Options.NATIVE_JOINS is enabled, then by translation of Sesame 2 query expressions into native IRules and native evaluation of those IRules.

Query options

The following summarizes how various high-level query language feature are mapped onto native IRules.
DISTINCT
IQueryOptions.isDistinct(), which is realized using DistinctFilter.
ORDER BY
IQueryOptions.getOrderBy() is effected by a custom IKeyBuilderFactory which generates sort keys that capture the desired sort order from the bindings in an ISolution. Unless DISTINCT is also specified, the generated sort keys are made unique by appending a one up long integer to the key - this prevents sort keys that otherwise compare as equals from dropping solutions. Note that the SORT is actually imposed by the DistinctFilter using an IKeyBuilderFactory assembled from the ORDER BY constraints. FIXME BryanT - implement the IKeyBuilderFactory. FIXME MikeP - assemble the ISortOrder[] from the query and set on the IQueryOptions.
OFFSET and LIMIT

IQueryOptions.getSlice(), which is effected as a conditional in NestedSubqueryWithJoinThreadsTask based on the RuleStats.solutionCount. Query ISolutions are counted as they are generated, but they are only entered into the ISolution IBuffer when the solutionCount is GE the OFFSET and LT the LIMIT. Query evaluation halts once the LIMIT is reached.

Note that when DISTINCT and either LIMIT and/or OFFSET are specified together, then the LIMIT and OFFSET MUST be applied after the solutions have been generated since we may have to generate more than LIMIT solutions in order to have LIMIT DISTINCT solutions. We handle this for now by NOT translating the LIMIT and OFFSET onto the IRule and instead let Sesame close the iterator once it has enough solutions.

Note that LIMIT and SLICE requires an evaluation plan that provides stable results. For a simple query this is achieved by setting IQueryOptions.isStable() to true.

For a UNION query, you must also set IProgram.isParallel() to false to prevent parallelized execution of the IRules in the IProgram.

UNION
A UNION is translated into an IProgram consisting of one IRule for each clause in the UNION. FIXME MikeP - implement.

Filters

The following provides a summary of how various kinds of FILTER are handled. A filter that is not explicitly handled is left untranslated and will be applied by Sesame against the generated ISolutions.

Whenever possible, a FILTER is translated into an IConstraint on an IPredicate in the generated native IRule. Some filters are essentially JOINs against the LexiconRelation. Those can be handled either as JOINs (generating an additional IPredicate in the IRule) or as an IN constraint, where the inclusion set is pre-populated by some operation on the LexiconRelation.

EQ
Translated into an EQ constraint on an IPredicate.
NE
Translated into an NE constraint on an IPredicate.
IN
Translated into an IN constraint on an IPredicate.
OR
Translated into an OR constraint on an IPredicate.

Magic predicates

BNS.SEARCH is the only magic predicate at this time. When the object position is bound to a constant, the magic predicate is evaluated once and the result is used to generate a set of term identifiers that are matches for the token(s) extracted from the Literal in the object position. Those term identifiers are then used to populate an IN constraint. The object position in the BNS.SEARCH MUST be bound to a constant.

FIXME We are not in fact rewriting the query operation at all, simply choosing a different evaluation path as we go. The rewrite should really be isolated from the execution, e.g., in its own class. That more correct approach is more than I want to get into right now as we will have to define variants on the various operators that let us model the native rule system directly, e.g., an n-ary IProgram, n-ary IRule operator, an IPredicate operator, etc. Then we can handle evaluation using their model with anything re-written to our custom operators being caught by our custom evaluate() methods and everything else running their default methods. Definitely the right approach, and much easier to write unit tests.

Version:
$Id: BigdataEvaluationStrategyImpl.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
TODO:
REGEX : if there is a "ˆ" literal followed by a wildcard AND there are no flags which would cause problems (case-folding, etc) then the REGEX can be rewritten as a prefix scan on the lexicon, which is very efficient, and converted to an IN filter. When the set size is huge we should rewrite it as another tail in the query instead.

Otherwise, regex filters are left outside of the rule. We can't optimize that until we generate rules that perform JOINs across the lexicon and the spo relations (which we could do, in which case it becomes a constraint on that join).

We don't have any indices that are designed to optimize regex scans, but we could process a regex scan as a parallel iterator scan against the lexicon., Roll more kinds of filters into the native IRules as IConstraints on IPredicates.

isURI(), etc. can be evaluated by testing a bit flag on the term identifier, which is very efficient.

, Verify handling of datatype operations.


Field Summary
protected  org.openrdf.query.Dataset dataset
           
protected static boolean DEBUG
           
protected static boolean INFO
           
protected static org.apache.log4j.Logger log
          Logger.
protected  BigdataTripleSource tripleSource
           
 
Constructor Summary
BigdataEvaluationStrategyImpl(BigdataTripleSource tripleSource, org.openrdf.query.Dataset dataset, boolean nativeJoins)
           
 
Method Summary
 info.aduna.iteration.CloseableIteration<org.openrdf.query.BindingSet,org.openrdf.query.QueryEvaluationException> evaluate(org.openrdf.query.algebra.Join join, org.openrdf.query.BindingSet bindings)
          Uses native joins iff BigdataSail.Options.NATIVE_JOINS is specified.
 info.aduna.iteration.CloseableIteration<org.openrdf.query.BindingSet,org.openrdf.query.QueryEvaluationException> evaluate(org.openrdf.query.algebra.StatementPattern sp, org.openrdf.query.BindingSet bindings)
          Overridden to recognize magic predicates.
protected  info.aduna.iteration.CloseableIteration<org.openrdf.query.BindingSet,org.openrdf.query.QueryEvaluationException> runQuery(org.openrdf.query.algebra.TupleExpr tupleExpr, IRule rule)
          Run a rule based on a TupleExpr as a query.
protected  info.aduna.iteration.CloseableIteration<org.openrdf.query.BindingSet,org.openrdf.query.QueryEvaluationException> search(org.openrdf.query.algebra.Var svar, String languageCode, String label, org.openrdf.query.BindingSet bindings)
          Evaluates the BNS.SEARCH magic predicate as a full-text search against the index literal in the database, binding svar to each matched literal in turn.
 
Methods inherited from class org.openrdf.query.algebra.evaluation.impl.EvaluationStrategyImpl
evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, evaluate, getVarValue, isTrue
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static final org.apache.log4j.Logger log
Logger.


INFO

protected static final boolean INFO

DEBUG

protected static final boolean DEBUG

tripleSource

protected final BigdataTripleSource tripleSource

dataset

protected final org.openrdf.query.Dataset dataset
Constructor Detail

BigdataEvaluationStrategyImpl

public BigdataEvaluationStrategyImpl(BigdataTripleSource tripleSource,
                                     org.openrdf.query.Dataset dataset,
                                     boolean nativeJoins)
Parameters:
tripleSource -
dataset -
Method Detail

evaluate

public info.aduna.iteration.CloseableIteration<org.openrdf.query.BindingSet,org.openrdf.query.QueryEvaluationException> evaluate(org.openrdf.query.algebra.StatementPattern sp,
                                                                                                                                 org.openrdf.query.BindingSet bindings)
                                                                                                                          throws org.openrdf.query.QueryEvaluationException
Overridden to recognize magic predicates.

Overrides:
evaluate in class org.openrdf.query.algebra.evaluation.impl.EvaluationStrategyImpl
Throws:
org.openrdf.query.QueryEvaluationException

search

protected info.aduna.iteration.CloseableIteration<org.openrdf.query.BindingSet,org.openrdf.query.QueryEvaluationException> search(org.openrdf.query.algebra.Var svar,
                                                                                                                                  String languageCode,
                                                                                                                                  String label,
                                                                                                                                  org.openrdf.query.BindingSet bindings)
                                                                                                                           throws org.openrdf.query.QueryEvaluationException
Evaluates the BNS.SEARCH magic predicate as a full-text search against the index literal in the database, binding svar to each matched literal in turn.

Note: The minimum cosine (relevance score) is set to ZERO (0d) in order to make sure that any match within a literal qualifies that literal for inclusion within the set of bindings that are materialized. This is in contrast to a standard search engine, where a minimum relevance score is used to filter out less likely matches. However, it is my sense that query against the KB is often used to find ALL matches. Regardless, the matches will be materialized in order of decreasing relevance and an upper bound of 10000 matches is imposed by the search implementation. See FullTextIndex.search(String, String, double, int).

Parameters:
svar - The variable from the subject position of the StatementPattern in which the BNS.SEARCH magic predicate appears.
languageCode - An optional language code from the bound literal appearing in the object position of that StatementPattern.
label - The required label from the bound literal appearing in the object position of that StatementPattern.
bindings - The current bindings.
Returns:
Iteration visiting the bindings obtained by the search.
Throws:
org.openrdf.query.QueryEvaluationException
TODO:
consider options for term weights and normalization. Search on the KB is probably all terms with anything matching, stopwords are excluded, and term weights can be without normalization since ranking does not matter. And maxRank should probably be defeated (Integer.MAX_VALUE or ZERO - whichever does the trick)., it would be nice if there were a way for the caller to express more parameters for the search, e.g., to give the minCosine and maxRank values directly as a tuple-based function call. I'm not sure if that is supported within Sesame/SPARQL.

evaluate

public info.aduna.iteration.CloseableIteration<org.openrdf.query.BindingSet,org.openrdf.query.QueryEvaluationException> evaluate(org.openrdf.query.algebra.Join join,
                                                                                                                                 org.openrdf.query.BindingSet bindings)
                                                                                                                          throws org.openrdf.query.QueryEvaluationException
Uses native joins iff BigdataSail.Options.NATIVE_JOINS is specified.

Note: As a pre-condition, the Values in the query expression MUST have been rewritten as BigdataValues and their term identifiers MUST have been resolved. Any term identifier that remains IRawTripleStore.NULL is an indication that there is no entry for that Value in the database. Since the JOINs are required (vs OPTIONALs), that means that there is no solution for the JOINs and an EmptyIteration is returned rather than evaluating the query.

Overrides:
evaluate in class org.openrdf.query.algebra.evaluation.impl.EvaluationStrategyImpl
Throws:
org.openrdf.query.QueryEvaluationException

runQuery

protected info.aduna.iteration.CloseableIteration<org.openrdf.query.BindingSet,org.openrdf.query.QueryEvaluationException> runQuery(org.openrdf.query.algebra.TupleExpr tupleExpr,
                                                                                                                                    IRule rule)
                                                                                                                             throws org.openrdf.query.QueryEvaluationException
Run a rule based on a TupleExpr as a query.

Parameters:
rule - The rule to execute.
Returns:
The Sesame 2 iteration that visits the BindingSets that are the results for the query.
Throws:
org.openrdf.query.QueryEvaluationException


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.