com.bigdata.relation.rule.eval.pipeline
Class JoinTaskFactoryTask

java.lang.Object
  extended by com.bigdata.service.FederationCallable<T>
      extended by com.bigdata.service.DataServiceCallable<Future>
          extended by com.bigdata.relation.rule.eval.pipeline.JoinTaskFactoryTask
All Implemented Interfaces:
IDataServiceCallable, IFederationCallable, Serializable, Callable<Future>

public class JoinTaskFactoryTask
extends DataServiceCallable<Future>

A factory for DistributedJoinTasks. The factory either creates a new DistributedJoinTask or returns the pre-existing DistributedJoinTask for the given JoinMasterTask instance (as identified by its UUID), orderIndex, and partitionId. When the desired join task pre-exists, factory will invoke DistributedJoinTask.addSource(IAsynchronousIterator) and specify the sourceItrProxy as another source for that join task.

The use of a factory pattern allows us to concentrate all DistributedJoinTasks which target the same tail predicate and index partition for the same rule execution instance onto the same DistributedJoinTask. The concentrator effect achieved by the factory only matters when the fan-out is GT ONE (1).

Version:
$Id: JoinTaskFactoryTask.java 5920 2012-01-30 20:16:58Z thompsonbry $
Author:
Bryan Thompson
See Also:
Serialized Form
TODO:
The factory semantics requires something like a "session" concept on the DataService. However, it could also be realized by a canonicalizing mapping of {masterProxy, orderIndex, partitionId} onto an object that is placed within a weak value cache., Whenever a DistributedJoinTask is interrupted or errors it must make sure that the entry is removed from the session (it could also interrupt/cancel the remaining DistributedJoinTasks for the same {masterInstance}, but we are already doing that in a different way.), We need to specify the failover behavior when running query or mutation rules. The simplest answer is that the query or closure operation fails and can be retried.

When retried a different data service instance could take over for the failed instance. This presumes some concept of "affinity" for a data service instance when locating a join task. If there are replicated instances of a data service, then affinity would be the tendency to choose the same instance for all join tasks with the same master, orderIndex, and partitionId. That might be more efficient since it allows aggregation of binding sets that require the same access path read. However, it might be more efficient to distribute the reads across the failover instances - it really depends on the workload.

Ideally, a data service failure would be handled by restarting only those parts of the operation that had failed. This means that there is some notion of idempotent for the operation. For at least the RDF database, this may be achievable. Failure during query leading to resend of some binding set chunks to a new join task could result in overgeneration of results, but those results would all be duplicates. If that is acceptable, then this approach could be considered "safe". Failure during mutation (aka closure) is even easier for RDF as redundant writes on an index still lead to the same fixed point.


Field Summary
protected static org.apache.log4j.Logger log
           
 
Constructor Summary
JoinTaskFactoryTask(String scaleOutIndexName, IRule rule, IJoinNexusFactory joinNexusFactory, int[] order, int orderIndex, int partitionId, IJoinMaster masterProxy, UUID masterUUID, IAsynchronousIterator<IBindingSet[]> sourceItrProxy, IKeyOrder[] keyOrders, IVariable[][] requiredVars)
           
 
Method Summary
 Future call()
          Either starts a new DistributedJoinTask and returns its Future or returns the Future of an existing DistributedJoinTask for the same DistributedJoinMasterTask instance, orderIndex, and partitionId.
static String getJoinTaskNamespace(UUID masterUUID, int orderIndex, int partitionId)
           
protected  DistributedJoinTask newJoinTask()
           
protected  Future<Void> submit(DistributedJoinTask task)
           
 String toString()
           
 
Methods inherited from class com.bigdata.service.DataServiceCallable
getDataService, isDataService, setDataService
 
Methods inherited from class com.bigdata.service.FederationCallable
getFederation, setFederation
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface com.bigdata.service.IFederationCallable
getFederation, setFederation
 

Field Detail

log

protected static final transient org.apache.log4j.Logger log
Constructor Detail

JoinTaskFactoryTask

public JoinTaskFactoryTask(String scaleOutIndexName,
                           IRule rule,
                           IJoinNexusFactory joinNexusFactory,
                           int[] order,
                           int orderIndex,
                           int partitionId,
                           IJoinMaster masterProxy,
                           UUID masterUUID,
                           IAsynchronousIterator<IBindingSet[]> sourceItrProxy,
                           IKeyOrder[] keyOrders,
                           IVariable[][] requiredVars)
Parameters:
scaleOutIndexName -
rule -
joinNexusFactory -
order -
orderIndex -
partitionId -
masterProxy -
masterUUID - (Avoids RMI to obtain this later).
sourceItrProxy -
nextScaleOutIndexName -
Method Detail

toString

public String toString()
Overrides:
toString in class Object

call

public Future call()
            throws Exception
Either starts a new DistributedJoinTask and returns its Future or returns the Future of an existing DistributedJoinTask for the same DistributedJoinMasterTask instance, orderIndex, and partitionId.

Returns:
(A proxy for) the Future of the DistributedJoinTask.
Throws:
Exception

newJoinTask

protected DistributedJoinTask newJoinTask()

submit

protected Future<Void> submit(DistributedJoinTask task)

getJoinTaskNamespace

public static String getJoinTaskNamespace(UUID masterUUID,
                                          int orderIndex,
                                          int partitionId)
Parameters:
masterUUID - The master UUID should be cached locally by the JoinTask so that invoking this method does not require RMI.
orderIndex -
partitionId -
Returns:


Copyright © 2006-2011 SYSTAP, LLC. All Rights Reserved.