com.bigdata.rdf.rules
Class RDFJoinNexus

java.lang.Object
  extended by com.bigdata.relation.rule.eval.AbstractJoinNexus
      extended by com.bigdata.rdf.rules.RDFJoinNexus
All Implemented Interfaces:
IJoinNexus

public class RDFJoinNexus
extends AbstractJoinNexus
implements IJoinNexus

IProgram execution support for the RDF DB.

The rules can potentially be evaluated in parallel when computing closure, and each join can potentially evaluate its subqueries in parallel as well. We could even define a PARALLEL iterator flag and have parallelism across index partitions for a read-historical iterator, since the data service locators are immutable for historical reads.

Rule-level parallelism (for fixed point closure of a rule set) and join subquery-level parallelism could be distributed to available workers in a cluster. In a similar way, high-level queries could be distributed to workers in a cluster for evaluation. Such distribution would increase the practical parallelism beyond what a single machine could support, as long as the total parallelism does not overload the cluster.

There is a pragmatic limit on the number of concurrent threads for a single host. When those threads target a blocking queue, thread contention becomes very high and throughput drops dramatically. We can reduce this problem by allocating a distinct UnsynchronizedArrayBuffer to each task. The task collects a 'chunk' in its UnsynchronizedArrayBuffer. When full, that buffer is flushed onto a thread-safe buffer of chunks, which in turn either flushes onto an IMutableRelation (mutation) or feeds an IAsynchronousIterator (high-level query). Because it is whole chunks that accumulate in the thread-safe buffer, each add() on that buffer may cause the thread to yield, but the payoff for yielding is an entire chunk in the buffer, not just a single element.

There is one high-level buffer factory for each kind of ActionEnum: AbstractJoinNexus.newQueryBuffer(), newInsertBuffer(IMutableRelation), and AbstractJoinNexus.newDeleteBuffer(IMutableRelation). In addition, there is a factory for UnsynchronizedArrayBuffers. Such a buffer is NOT thread-safe and is designed to store a single chunk of elements, e.g., in an array E[N].
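The chunk-buffering scheme described above can be sketched roughly as follows. This is a minimal illustration using only java.util.concurrent, not the bigdata implementation: ChunkBufferSketch, LocalChunkBuffer, and run() are hypothetical names, and a LinkedBlockingQueue stands in for the thread-safe buffer of chunks. Each task fills its own unsynchronized array and touches the shared queue only once per chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; none of these names come from the bigdata API.
final class ChunkBufferSketch {

    /** NOT thread-safe: owned by exactly one task, holds one chunk in an array. */
    static final class LocalChunkBuffer<E> {
        private final Object[] chunk;
        private final BlockingQueue<List<E>> target;
        private int size = 0;

        LocalChunkBuffer(int capacity, BlockingQueue<List<E>> target) {
            this.chunk = new Object[capacity];
            this.target = target;
        }

        /** Adds one element; hands the chunk to the shared queue when full. */
        void add(E e) throws InterruptedException {
            chunk[size++] = e;
            if (size == chunk.length) {
                flush();
            }
        }

        /** Copies the current chunk out and puts it on the thread-safe queue. */
        @SuppressWarnings("unchecked")
        void flush() throws InterruptedException {
            if (size == 0) {
                return;
            }
            List<E> out = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                out.add((E) chunk[i]);
            }
            size = 0;
            target.put(out); // one synchronized hand-off per chunk, not per element
        }
    }

    /** Runs nTasks producers; returns {number of chunks, number of elements}. */
    static int[] run(int nTasks, int perTask, int chunkCapacity) {
        BlockingQueue<List<Integer>> shared = new LinkedBlockingQueue<>();
        Thread[] threads = new Thread[nTasks];
        for (int t = 0; t < nTasks; t++) {
            threads[t] = new Thread(() -> {
                LocalChunkBuffer<Integer> local =
                        new LocalChunkBuffer<>(chunkCapacity, shared);
                try {
                    for (int i = 0; i < perTask; i++) {
                        local.add(i);
                    }
                    local.flush(); // push any partial final chunk
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) {
                th.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        int elements = 0;
        for (List<Integer> c : shared) {
            elements += c.size();
        }
        return new int[] { shared.size(), elements };
    }

    public static void main(String[] args) {
        int[] r = run(4, 1000, 100);
        System.out.println(r[0] + " chunks, " + r[1] + " elements");
    }
}
```

With four tasks, 1000 elements per task, and a chunk capacity of 100, the shared queue sees 40 put() calls rather than 4000, which is the contention reduction the text describes.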

Version:
$Id: RDFJoinNexus.java 5062 2011-08-20 23:37:29Z mrpersonick $
Author:
Bryan Thompson

Nested Class Summary
static class RDFJoinNexus.InsertSPOAndJustificationBuffer<E>
          Buffer writes on IMutableRelation#insert(IChunkedIterator) when it is flushed.
 
Field Summary
protected static org.apache.log4j.Logger log
           
 
Fields inherited from class com.bigdata.relation.rule.eval.AbstractJoinNexus
chunkCapacity, chunkOfChunksCapacity, filter, indexManager, planFactory, readTimestamp, resourceLocator, solutionFlags
 
Fields inherited from interface com.bigdata.relation.rule.eval.IJoinNexus
ALL, BINDINGS, ELEMENT, RULE
 
Constructor Summary
RDFJoinNexus(RDFJoinNexusFactory joinNexusFactory, IIndexManager indexManager)
           
 
Method Summary
 IConstant fakeBinding(IPredicate pred, Var var)
          Return a 'fake' binding for the given variable in the specified predicate.
 IRuleStatisticsFactory getRuleStatisticsFactory()
          The factory for rule statistics objects.
 IAccessPath getTailAccessPath(IRelation relation, IPredicate predicate)
          When backchain is true and the tail predicate is reading on the SPORelation, then the IAccessPath is wrapped so that the iterator will visit the backchained inferences as well.
 ISortKeyBuilder<IBindingSet> newBindingSetSortKeyBuilder(IRule rule)
          FIXME unit tests for DISTINCT with a head and ELEMENT, with bindings and a head, with bindings but no head, and with a head but no bindings (error).
 IBuffer<ISolution[]> newInsertBuffer(IMutableRelation relation)
          Overridden to handle justifications when using truth maintenance.
protected  ISortKeyBuilder<?> newSortKeyBuilder(IPredicate<?> head)
          Return the ISortKeyBuilder used to impose DISTINCT on the solutions generated by a query.
 
Methods inherited from class com.bigdata.relation.rule.eval.AbstractJoinNexus
bind, bind, forceSerialExecution, getAction, getBindingSetSerializer, getChunkCapacity, getChunkOfChunksCapacity, getFullyBufferedReadThreshold, getHeadRelationView, getIndexManager, getJoinNexusFactory, getMaxParallelSubqueries, getPlanFactory, getProperty, getProperty, getRangeCountFactory, getReadTimestamp, getRelationLocator, getRuleTaskFactory, getSolutionFilter, getSolutionSerializer, getTailRelationView, getWriteTimestamp, isEmptyProgram, locatorScan, newBindingSet, newDeleteBuffer, newQueryBuffer, newSolution, newUnsynchronizedBuffer, runDistributedProgram, runLocalProgram, runMutation, runProgram, runQuery, solutionFlags
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.bigdata.relation.rule.eval.IJoinNexus
bind, bind, forceSerialExecution, getAction, getBindingSetSerializer, getChunkCapacity, getChunkOfChunksCapacity, getFullyBufferedReadThreshold, getHeadRelationView, getIndexManager, getJoinNexusFactory, getMaxParallelSubqueries, getPlanFactory, getProperty, getProperty, getRangeCountFactory, getReadTimestamp, getRuleTaskFactory, getSolutionFilter, getSolutionSerializer, getTailRelationView, getWriteTimestamp, locatorScan, newBindingSet, newDeleteBuffer, newQueryBuffer, newSolution, newUnsynchronizedBuffer, runMutation, runQuery, solutionFlags
 

Field Detail

log

protected static final transient org.apache.log4j.Logger log
Constructor Detail

RDFJoinNexus

public RDFJoinNexus(RDFJoinNexusFactory joinNexusFactory,
                    IIndexManager indexManager)
Parameters:
joinNexusFactory - The object used to create this instance and which can be used to create other instances as necessary for distributed rule execution.
indexManager - The object used to resolve indices, relations, etc.
Method Detail

getRuleStatisticsFactory

public IRuleStatisticsFactory getRuleStatisticsFactory()
Description copied from interface: IJoinNexus
The factory for rule statistics objects.

Specified by:
getRuleStatisticsFactory in interface IJoinNexus
Overrides:
getRuleStatisticsFactory in class AbstractJoinNexus

getTailAccessPath

public IAccessPath getTailAccessPath(IRelation relation,
                                     IPredicate predicate)
When backchain is true and the tail predicate is reading on the SPORelation, then the IAccessPath is wrapped so that the iterator will visit the backchained inferences as well. On the other hand, if IPredicate.getPartitionId() is defined (not -1), then the returned access path will be for the specified shard using the data service local index manager (AbstractJoinNexus.indexManager MUST be the data service local index manager in this case), and expanders WILL NOT be applied because they require a view of the total relation, not just a shard.
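The two cases above amount to a small decision table, sketched below. This is an illustration only: TailAccessPathSketch, Kind, and choose() are hypothetical names, and the sketch assumes that the shard case takes precedence over backchaining, which the description suggests but does not state outright.

```java
// Hypothetical sketch of the selection logic; not the bigdata implementation.
final class TailAccessPathSketch {

    /** The kind of access path that would be returned. */
    enum Kind { SHARD_LOCAL, BACKCHAINED, PLAIN }

    /**
     * @param partitionId        value of the predicate's partition id, -1 if undefined
     * @param backchain          whether backchaining is enabled
     * @param readsOnSPORelation whether the tail predicate reads on the SPO relation
     */
    static Kind choose(int partitionId, boolean backchain, boolean readsOnSPORelation) {
        if (partitionId != -1) {
            // Assumption: the shard case wins. Expanders need the whole
            // relation, so they are not applied to a single shard.
            return Kind.SHARD_LOCAL;
        }
        if (backchain && readsOnSPORelation) {
            // Wrap the access path so backchained inferences are visited too.
            return Kind.BACKCHAINED;
        }
        return Kind.PLAIN;
    }

    public static void main(String[] args) {
        System.out.println(choose(-1, true, true));
        System.out.println(choose(7, true, true));
        System.out.println(choose(-1, false, true));
    }
}
```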

Specified by:
getTailAccessPath in interface IJoinNexus
Overrides:
getTailAccessPath in class AbstractJoinNexus
Parameters:
relation - The relation.
predicate - The predicate. When IPredicate.getPartitionId() is set, the returned IAccessPath MUST read on the identified local index partition (directly, not via RMI).
Returns:
The access path.
See Also:
InferenceEngine, BackchainAccessPath
TODO:
Consider encapsulating the IRangeCountFactory in the returned access path for non-exact range count requests. This will make it slightly harder to write the unit tests for the IEvaluationPlanFactory.

fakeBinding

public IConstant fakeBinding(IPredicate pred,
                             Var var)
Description copied from interface: IJoinNexus
Return a 'fake' binding for the given variable in the specified predicate. The binding should be of a legal type for the slot in the predicate associated with that variable. This is used to discover the IKeyOrder associated with the IAccessPath that will be used to evaluate the predicate when it appears in the tail of an IRule for a given IEvaluationPlan.

Specified by:
fakeBinding in interface IJoinNexus
Parameters:
pred - The predicate.
var - A variable appearing in that predicate.
Returns:
The 'fake' binding.

newBindingSetSortKeyBuilder

public ISortKeyBuilder<IBindingSet> newBindingSetSortKeyBuilder(IRule rule)
FIXME unit tests for DISTINCT with a head and ELEMENT, with bindings and a head, with bindings but no head, and with a head but no bindings (error). See AbstractJoinNexus.runQuery(IStep).

FIXME unit tests for SORT with and without DISTINCT and with the various combinations used in the unit tests for DISTINCT. Note that SORT, unlike DISTINCT, requires that all solutions are materialized before any solutions can be returned to the caller. A lot of optimization can be done for SORT implementations, including merge sort of large blocks (à la map/reduce), using compressed sort keys or word sort keys with 2nd stage disambiguation, etc.

FIXME Add property for sort {ascending,descending,none} to IRule. The sort order can also be specified in terms of a sequence of variables. The choice of the variable order should be applied here.

FIXME The properties that govern the Unicode collator for the generated sort keys should be configured by the RDFJoinNexusFactory. In particular, Unicode should be handled however it is handled for the LexiconRelation.
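The difference between DISTINCT and SORT noted above can be illustrated with a small sketch. It is not the bigdata implementation: DistinctVsSortSketch and its methods are hypothetical, binding sets are modeled as plain maps from variable name to value, and the sort key is a simple string concatenation in the given variable order rather than a Unicode collation key. DISTINCT can emit each solution as soon as its key is first seen, while SORT has to collect every solution before returning any.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch; binding sets are modeled as Map<variable, value>.
final class DistinctVsSortSketch {

    /** Builds a key from the bindings, visiting variables in varOrder. */
    static String sortKey(Map<String, String> bindings, List<String> varOrder) {
        StringBuilder sb = new StringBuilder();
        for (String var : varOrder) {
            // A NUL separator keeps adjacent values from running together.
            sb.append(bindings.get(var)).append('\u0000');
        }
        return sb.toString();
    }

    /** DISTINCT: streaming; a solution is emitted the first time its key is seen. */
    static void distinct(Iterable<Map<String, String>> solutions,
                         List<String> varOrder,
                         Consumer<Map<String, String>> sink) {
        Set<String> seen = new HashSet<>();
        for (Map<String, String> s : solutions) {
            if (seen.add(sortKey(s, varOrder))) {
                sink.accept(s);
            }
        }
    }

    /** SORT: every solution is materialized before any can be returned. */
    static List<Map<String, String>> sorted(Iterable<Map<String, String>> solutions,
                                            List<String> varOrder) {
        List<Map<String, String>> all = new ArrayList<>();
        for (Map<String, String> s : solutions) {
            all.add(s);
        }
        all.sort(Comparator.comparing(
                (Map<String, String> s) -> sortKey(s, varOrder)));
        return all;
    }

    /** Convenience helper that binds the variables x and y. */
    static Map<String, String> bind(String x, String y) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("x", x);
        m.put("y", y);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> in = new ArrayList<>();
        in.add(bind("b", "1"));
        in.add(bind("a", "2"));
        in.add(bind("b", "1"));
        List<String> order = List.of("x", "y");
        distinct(in, order, s -> System.out.println("distinct: " + s));
        System.out.println("sorted: " + sorted(in, order));
    }
}
```

Changing the variable order passed to sortKey() changes which variable dominates the ordering, which is the role the rule plays for the real sort key builder.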

Specified by:
newBindingSetSortKeyBuilder in interface IJoinNexus
Parameters:
rule - The rule that will determine the order imposed among the bound variables (which variable is 1st, 2nd, 3rd, etc.).
Returns:
The sort key builder (NOT thread-safe).

newSortKeyBuilder

protected ISortKeyBuilder<?> newSortKeyBuilder(IPredicate<?> head)
Description copied from class: AbstractJoinNexus
Return the ISortKeyBuilder used to impose DISTINCT on the solutions generated by a query.

Specified by:
newSortKeyBuilder in class AbstractJoinNexus
Parameters:
head - The head of the rule.
Returns:
The ISortKeyBuilder.

newInsertBuffer

public IBuffer<ISolution[]> newInsertBuffer(IMutableRelation relation)
Overridden to handle justifications when using truth maintenance.

Return a thread-safe buffer onto which chunks of computed ISolutions will be written. When the buffer is flushed the chunked ISolutions will be inserted into the IMutableRelation.

Note: AbstractJoinNexus.getSolutionFilter() is applied by AbstractJoinNexus.newUnsynchronizedBuffer(IBuffer, int) and NOT by the buffer returned by this method.

Specified by:
newInsertBuffer in interface IJoinNexus
Overrides:
newInsertBuffer in class AbstractJoinNexus
Parameters:
relation - The relation.
Returns:
The buffer.


Copyright © 2006-2011 SYSTAP, LLC. All Rights Reserved.