com.bigdata.relation.rule.eval.pipeline
Class JoinTaskFactoryTask
java.lang.Object
com.bigdata.service.FederationCallable<T>
com.bigdata.service.DataServiceCallable<Future>
com.bigdata.relation.rule.eval.pipeline.JoinTaskFactoryTask
- All Implemented Interfaces:
- IDataServiceCallable, IFederationCallable, Serializable, Callable<Future>
public class JoinTaskFactoryTask
- extends DataServiceCallable<Future>
A factory for DistributedJoinTasks. The factory either creates a new
DistributedJoinTask or returns the pre-existing
DistributedJoinTask for the given JoinMasterTask instance (as
identified by its UUID), orderIndex, and partitionId.
When the desired join task pre-exists, factory will invoke
DistributedJoinTask.addSource(IAsynchronousIterator) and specify the
sourceItrProxy as another source for that join task.
The use of a factory pattern allows us to concentrate all
DistributedJoinTasks which target the same tail predicate and index
partition for the same rule execution instance onto the same
DistributedJoinTask. The concentrator effect achieved by the factory
only matters when the fan-out is GT ONE (1).
- Version:
- $Id: JoinTaskFactoryTask.java 5920 2012-01-30 20:16:58Z thompsonbry $
- Author:
- Bryan Thompson
- See Also:
- Serialized Form
- TODO:
- The factory semantics requires something like a "session" concept on
the
DataService. However, it could also be realized by a
canonicalizing mapping of {masterProxy, orderIndex, partitionId} onto
an object that is placed within a weak value cache., Whenever a DistributedJoinTask is interrupted or errors it must
make sure that the entry is removed from the session (it could also
interrupt/cancel the remaining DistributedJoinTasks for the
same {masterInstance}, but we are already doing that in a different
way.), We need to specify the failover behavior when running query or mutation
rules. The simplest answer is that the query or closure operation fails
and can be retried.
When retried a different data service instance could take over for the
failed instance. This presumes some concept of "affinity" for a data
service instance when locating a join task. If there are replicated
instances of a data service, then affinity would be the tendency to
choose the same instance for all join tasks with the same master,
orderIndex, and partitionId. That might be more efficient since it
allows aggregation of binding sets that require the same access path
read. However, it might be more efficient to distribute the reads
across the failover instances - it really depends on the workload.
Ideally, a data service failure would be handled by restarting only
those parts of the operation that had failed. This means that there is
some notion of idempotent for the operation. For at least the RDF
database, this may be achievable. Failure during query leading to
resend of some binding set chunks to a new join task could result in
overgeneration of results, but those results would all be duplicates.
If that is acceptable, then this approach could be considered "safe".
Failure during mutation (aka closure) is even easier for RDF as
redundant writes on an index still lead to the same fixed point.
|
Field Summary |
protected static org.apache.log4j.Logger |
log
|
|
Constructor Summary |
JoinTaskFactoryTask(String scaleOutIndexName,
IRule rule,
IJoinNexusFactory joinNexusFactory,
int[] order,
int orderIndex,
int partitionId,
IJoinMaster masterProxy,
UUID masterUUID,
IAsynchronousIterator<IBindingSet[]> sourceItrProxy,
IKeyOrder[] keyOrders,
IVariable[][] requiredVars)
|
log
protected static final transient org.apache.log4j.Logger log
JoinTaskFactoryTask
public JoinTaskFactoryTask(String scaleOutIndexName,
IRule rule,
IJoinNexusFactory joinNexusFactory,
int[] order,
int orderIndex,
int partitionId,
IJoinMaster masterProxy,
UUID masterUUID,
IAsynchronousIterator<IBindingSet[]> sourceItrProxy,
IKeyOrder[] keyOrders,
IVariable[][] requiredVars)
- Parameters:
scaleOutIndexName - rule - joinNexusFactory - order - orderIndex - partitionId - masterProxy - masterUUID - (Avoids RMI to obtain this later).sourceItrProxy - nextScaleOutIndexName -
toString
public String toString()
- Overrides:
toString in class Object
call
public Future call()
throws Exception
- Either starts a new
DistributedJoinTask and returns its
Future or returns the Future of an existing
DistributedJoinTask for the same
DistributedJoinMasterTask instance, orderIndex, and
partitionId.
- Returns:
- (A proxy for) the
Future of the
DistributedJoinTask.
- Throws:
Exception
newJoinTask
protected DistributedJoinTask newJoinTask()
submit
protected Future<Void> submit(DistributedJoinTask task)
getJoinTaskNamespace
public static String getJoinTaskNamespace(UUID masterUUID,
int orderIndex,
int partitionId)
- Parameters:
masterUUID - The master UUID should be cached locally by the JoinTask so
that invoking this method does not require RMI.orderIndex - partitionId -
- Returns:
Copyright © 2006-2011 SYSTAP, LLC. All Rights Reserved.