com.bigdata.bop
Class PipelineOp

java.lang.Object
  extended by com.bigdata.bop.CoreBaseBOp
      extended by com.bigdata.bop.BOpBase
          extended by com.bigdata.bop.PipelineOp
All Implemented Interfaces:
BOp, IPropertySet, Serializable, Cloneable
Direct Known Subclasses:
AbstractAddRemoveStatementsOp, AbstractHashJoinOp, AbstractMergeJoin, AbstractSubqueryOp, ChunkedMaterializationOp, ChunkedResolutionOp, CommitOp, ConditionalRoutingOp, CopyOp, DataSetJoin, DropOp, EndOp, GroupByOp, HTreeDistinctBindingSetsOp, HTreeHashIndexOp, HTreeNamedSubqueryOp, HTreeSolutionSetHashJoinOp, InlineMaterializeOp, InsertDataOp, InsertOp, JoinGraph, JVMDistinctBindingSetsOp, JVMHashIndexOp, JVMNamedSubqueryOp, JVMSolutionSetHashJoinOp, NamedSolutionSetScanOp, ParseOp, PipelineJoin, ProjectionOp, ServiceCallJoin, SliceOp, SortOp, SubqueryOp

public abstract class PipelineOp
extends BOpBase

Abstract base class for pipeline operators where the data moving along the pipeline is chunks of IBindingSets.

Version:
$Id: PipelineOp.java 6013 2012-02-12 21:29:01Z thompsonbry $
Author:
Bryan Thompson
See Also:
Serialized Form

Nested Class Summary
static interface PipelineOp.Annotations
           
 
Field Summary
 
Fields inherited from class com.bigdata.bop.CoreBaseBOp
DEFAULT_INITIAL_CAPACITY
 
Fields inherited from interface com.bigdata.bop.BOp
NOANNS, NOARGS
 
Constructor Summary
protected PipelineOp(BOp[] args, Map<String,Object> annotations)
          Shallow copy constructor.
protected PipelineOp(PipelineOp op)
          Required deep copy constructor.
 
Method Summary
protected  void assertAtOnceJavaHeapOp()
          Assert that this operator is annotated as an "at-once" operator which buffers its data on the java heap.
abstract  FutureTask<Void> eval(BOpContext<IBindingSet> context)
          Return a FutureTask which computes the operator against the evaluation context.
 int getChunkCapacity()
           
 int getChunkOfChunksCapacity()
           
 long getChunkTimeout()
           
 long getMaxMemory()
          The maximum amount of memory which may be used to buffered inputs for this operator on the native heap.
 int getMaxParallel()
          The maximum parallelism with which tasks may be evaluated for this operator (this is a per-shard limit in scale-out).
 boolean isAtOnceEvaluation()
          true iff the operator will use at-once evaluation (all inputs for the operator will be buffered and the operator will run exactly once to consume those inputs).
 boolean isBlockedEvaluation()
          true iff the operator uses blocked evaluation (it buffers data on the native heap up to a threshold and then evaluate that block of data).
 boolean isLastPassRequested()
          Return true iff a final evaluation pass is requested by the operator.
 boolean isPipelinedEvaluation()
          Return true iff the operator uses pipelined evaluation (versus "at-once" or "blocked" evaluation as discussed below).
 boolean isSharedState()
          Return true iff #newStats(IQueryContext) must be shared across all invocations of eval(BOpContext) for this operator for a given query.
 BOpStats newStats()
          Return a new object which can be used to collect statistics on the operator evaluation.
 
Methods inherited from class com.bigdata.bop.BOpBase
_clearProperty, _set, _setProperty, annotations, annotationsCopy, annotationsEqual, annotationsRef, argIterator, args, argsCopy, arity, clearAnnotations, clearProperty, deepCopy, deepCopy, get, getProperty, setArg, setProperty, setUnboundProperty, toArray, toArray
 
Methods inherited from class com.bigdata.bop.CoreBaseBOp
annotationsEqual, annotationsToString, checkArgs, clone, equals, getEvaluationContext, getId, getProperty, getRequiredProperty, hashCode, indent, isController, toShortString, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

PipelineOp

protected PipelineOp(PipelineOp op)
Required deep copy constructor.

Parameters:
op -

PipelineOp

protected PipelineOp(BOp[] args,
                     Map<String,Object> annotations)
Shallow copy constructor.

Parameters:
args -
annotations -
Method Detail

getChunkCapacity

public final int getChunkCapacity()
See Also:
BufferAnnotations.CHUNK_CAPACITY

getChunkOfChunksCapacity

public final int getChunkOfChunksCapacity()
See Also:
BufferAnnotations.CHUNK_OF_CHUNKS_CAPACITY

getChunkTimeout

public final long getChunkTimeout()
See Also:
BufferAnnotations.CHUNK_TIMEOUT

getMaxMemory

public final long getMaxMemory()
The maximum amount of memory which may be used to buffered inputs for this operator on the native heap. When ZERO (0), the inputs will be buffered on the JVM heap. When Long.MAX_VALUE, an essentially unbounded amount of data may be buffered on the native heap. Together with PipelineOp.Annotations.PIPELINED, PipelineOp.Annotations.MAX_MEMORY is used to determine whether an operator uses pipelined, at-once, or blocked evaluation.

See Also:
PipelineOp.Annotations.MAX_MEMORY

isPipelinedEvaluation

public final boolean isPipelinedEvaluation()
Return true iff the operator uses pipelined evaluation (versus "at-once" or "blocked" evaluation as discussed below).
Pipelined
Pipelined operators stream chunks of intermediate results from one operator to the next using a producer / consumer pattern. Each time a set of intermediate results is available for a pipelined operator, it is evaluated against those inputs producing another set of intermediate results for its target operator(s). Pipelined operators may be evaluated many times during a given query and often have excellent parallelism due to the concurrent evaluation of the different operators on different sets of intermediate results.
At-Once
An "at-once" operator will run exactly once and must wait for all of its inputs to be assembled before it runs. There are some operations for which "at-once" evaluation is always required, such as ORDER_BY. Other operations MAY use operator-at-once evaluation in order to benefit from a combination of more efficient IO patterns and simpler design. At-once operators may either buffer their data on the Java heap (which is not scalable due to the heap pressure exerted on the garbage collector) or buffer their data on the native heap (which does scale).
Blocked
Blocked operators buffer large amounts of data on the native heap and run each time they exceed some threshold #of bytes of buffered data. A blocked operator is basically an "at-once" operator which buffers its data on the native heap and which can be evaluated in multiple passes. For example, a hash join could use a blocked operator design while an ORDER_BY operator can not. By deferring their evaluation until some threshold amount of data has been materialized, they may be evaluated once or more than once, depending on the data scale, but still retain many of the benefits of "at-once" evaluation in terms of IO patterns. Whether or not an operator can be used as a "blocked" operator is a matter of the underlying operator implementation.

See Also:
PipelineOp.Annotations.PIPELINED, PipelineOp.Annotations.MAX_MEMORY

isAtOnceEvaluation

public final boolean isAtOnceEvaluation()
true iff the operator will use at-once evaluation (all inputs for the operator will be buffered and the operator will run exactly once to consume those inputs).

See Also:
PipelineOp.Annotations.PIPELINED, PipelineOp.Annotations.MAX_MEMORY

isBlockedEvaluation

public final boolean isBlockedEvaluation()
true iff the operator uses blocked evaluation (it buffers data on the native heap up to a threshold and then evaluate that block of data).

See Also:
PipelineOp.Annotations.PIPELINED, PipelineOp.Annotations.MAX_MEMORY

assertAtOnceJavaHeapOp

protected void assertAtOnceJavaHeapOp()
Assert that this operator is annotated as an "at-once" operator which buffers its data on the java heap. The requirements are:
 PIPELINED := false
 MAX_MEMORY := 0
 
When the operator is not pipelined then it is either "blocked" or "at-once". When MAX_MEMORY is ZERO, the operator will buffer its data on the Java heap. All operators which buffer data on the java heap will buffer an unbounded amount of data and are therefore "at-once" rather than "blocked". Operators which buffer their data on the native heap may support either "blocked" and/or "at-once" evaluation, depending on the operators. E.g., a hash join can be either "blocked" or "at-once" while an ORDER-BY is always "at-once".


getMaxParallel

public final int getMaxParallel()
The maximum parallelism with which tasks may be evaluated for this operator (this is a per-shard limit in scale-out). A value of ONE (1) indicates that at most ONE (1) instance of this task may be executing in parallel for a given shard and may be used to indicate that the operator evaluation task is not thread-safe.

See Also:
PipelineOp.Annotations.MAX_PARALLEL

isLastPassRequested

public final boolean isLastPassRequested()
Return true iff a final evaluation pass is requested by the operator. The final evaluation pass will be invoked once for each node or shard on which the operator was evaluated once it has consumed all inputs from upstream operators.

See Also:
PipelineOp.Annotations.LAST_PASS

isSharedState

public final boolean isSharedState()
Return true iff #newStats(IQueryContext) must be shared across all invocations of eval(BOpContext) for this operator for a given query.

See Also:
PipelineOp.Annotations.SHARED_STATE

newStats

public BOpStats newStats()
Return a new object which can be used to collect statistics on the operator evaluation. This may be overridden to return a more specific class depending on the operator.


eval

public abstract FutureTask<Void> eval(BOpContext<IBindingSet> context)
Return a FutureTask which computes the operator against the evaluation context. The caller is responsible for executing the FutureTask (this gives them the ability to hook the completion of the computation).

Parameters:
context - The evaluation context.
Returns:
The FutureTask which will compute the operator's evaluation.
TODO:
Modify to return a Callables for now since we must run each task in its own thread until Java7 gives us fork/join pools and asynchronous file I/O. For the fork/join model we will probably return the ForkJoinTask.


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.