com.bigdata.bop.join
Class PipelineJoinStats

java.lang.Object
  extended by com.bigdata.bop.engine.BOpStats
      extended by com.bigdata.bop.join.BaseJoinStats
          extended by com.bigdata.bop.join.PipelineJoinStats
All Implemented Interfaces:
Serializable

public class PipelineJoinStats
extends BaseJoinStats

Extended statistics for the join operator.

See Also:
Serialized Form

Field Summary
 CAT inputSolutions
          The #of input solutions consumed (not just accepted).
 CAT outputSolutions
          The #of output solutions generated.
 
Fields inherited from class com.bigdata.bop.join.BaseJoinStats
accessPathChunksIn, accessPathCount, accessPathDups, accessPathRangeCount, accessPathUnitsIn
 
Fields inherited from class com.bigdata.bop.engine.BOpStats
chunksIn, chunksOut, elapsed, mutationCount, opCount, typeErrors, unitsIn, unitsOut
 
Constructor Summary
PipelineJoinStats()
           
 
Method Summary
 void add(BOpStats o)
          Combine the statistics (addition), but do NOT add to self.
 double getJoinHitRatio()
          The estimated join hit ratio.
protected  void toString(StringBuilder sb)
          Extension hook for BOpStats.toString().
 
Methods inherited from class com.bigdata.bop.engine.BOpStats
toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

inputSolutions

public final CAT inputSolutions
The #of input solutions consumed (not just accepted).

Note: This counter is highly correlated with BOpStats.unitsIn but is incremented only when we begin evaluation of the IAccessPath associated with a specific input solution.

When PipelineJoin.Annotations.COALESCE_DUPLICATE_ACCESS_PATHS is true, multiple input binding sets can be mapped onto the same IAccessPath and this counter will be incremented by the #of such input binding sets.


outputSolutions

public final CAT outputSolutions
The #of output solutions generated. This is incremented as soon as the solution is produced and is used by getJoinHitRatio(). Of necessity, updates to inputSolutions slightly lead updates to outputSolutions.

Note: This counter is highly correlated with BOpStats.unitsOut.

Constructor Detail

PipelineJoinStats

public PipelineJoinStats()
Method Detail

getJoinHitRatio

public double getJoinHitRatio()
The estimated join hit ratio. This is computed as
 outputSolutions / inputSolutions
 
It is ZERO (0) when inputSolutions is ZERO (0).
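The computation above can be sketched as follows. This is a minimal illustration, not the bigdata implementation: the class JoinHitRatioSketch is hypothetical, and java.util.concurrent.atomic.LongAdder is used as a stand-in for CAT so that the example is self-contained.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for PipelineJoinStats (assumption: LongAdder replaces CAT).
class JoinHitRatioSketch {

    final LongAdder inputSolutions = new LongAdder();
    final LongAdder outputSolutions = new LongAdder();

    // outputSolutions / inputSolutions, or ZERO when no inputs were consumed.
    double getJoinHitRatio() {
        final long in = inputSolutions.sum();
        if (in == 0) {
            return 0d; // avoid division by zero
        }
        return outputSolutions.sum() / (double) in;
    }

    public static void main(String[] args) {
        JoinHitRatioSketch stats = new JoinHitRatioSketch();
        System.out.println(stats.getJoinHitRatio()); // no inputs yet: 0.0
        stats.inputSolutions.add(4);
        stats.outputSolutions.add(10);
        System.out.println(stats.getJoinHitRatio()); // 10 / 4 = 2.5
    }
}
```

The cast to double keeps the division from truncating when the output count is not a multiple of the input count.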

The join hit ratio is always accurate when the join is fully executed. However, when a cutoff join is used to estimate the join hit ratio, a measurement error can be introduced into the join hit ratio if PipelineJoin.Annotations.COALESCE_DUPLICATE_ACCESS_PATHS is true, PipelineOp.Annotations.MAX_PARALLEL is GT ONE (1), or PipelineJoin.Annotations.MAX_PARALLEL_CHUNKS is GT ZERO (0).

When access paths are coalesced, there is an inner loop over the input solutions mapped onto the same access path. This inner loop causes inputSolutions to be incremented by the #of input solutions coalesced onto that access path before any outputSolutions are counted. Coalescing access paths can therefore cause the join hit ratio to be underestimated: if the join is cut off while processing a set of input solutions which were identified as using the same as-bound access path, there may appear to be more input solutions consumed than were actually applied to produce output solutions.

The worst case can introduce substantial error into the estimated join hit ratio. Consider a cutoff of 100. If one input solution generates 100 output solutions and two input solutions are mapped onto the same access path, then the input count will be 2 and the output count will be 100, which gives a reported join hit ratio of 100/2 when the actual join hit ratio is 100/1.
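The worst case described above can be reproduced with a toy model. The sketch below is an assumption-laden simplification, not the PipelineJoin code: CutoffJoinErrorSketch and its run method are hypothetical, each input solution is assumed to produce the same number of outputs, and the input counter is bumped for all coalesced inputs before any output is counted.

```java
// Toy model of a cutoff join over one coalesced access path (hypothetical names).
class CutoffJoinErrorSketch {

    // Returns {inputCount, outputCount, inputsApplied}.
    static long[] run(int coalescedInputs, int solutionsPerInput, int cutoff) {
        long in = 0, out = 0, applied = 0;
        // All coalesced inputs are counted up front, before any output exists.
        in += coalescedInputs;
        outer:
        for (int i = 0; i < coalescedInputs; i++) {
            for (int j = 0; j < solutionsPerInput; j++) {
                if (out >= cutoff) {
                    break outer; // the cutoff halts the join
                }
                if (j == 0) {
                    applied++; // this input actually contributed output
                }
                out++;
            }
        }
        return new long[] { in, out, applied };
    }

    public static void main(String[] args) {
        long[] r = run(2, 100, 100);
        System.out.println("reported = " + (r[1] / (double) r[0])); // 100 / 2 = 50.0
        System.out.println("actual   = " + (r[1] / (double) r[2])); // 100 / 1 = 100.0
    }
}
```

In this model the second input solution is counted as consumed even though the cutoff is reached before it produces anything, which halves the reported ratio.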

A similar problem can occur if PipelineOp.Annotations.MAX_PARALLEL or PipelineJoin.Annotations.MAX_PARALLEL_CHUNKS is GT ONE (1) since input count can be incremented by the #of threads before any output solutions are generated. Estimation error can also occur if multiple join tasks are run in parallel for different chunks of input solutions.


add

public void add(BOpStats o)
Description copied from class: BOpStats
Combine the statistics (addition), but do NOT add to self.

Overrides:
add in class BaseJoinStats
Parameters:
o - Another statistics object.
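One plausible shape for such an override is sketched below. The classes BaseStatsSketch and JoinStatsSketch are hypothetical stand-ins for BOpStats and PipelineJoinStats, LongAdder stands in for CAT, and the self-check and instanceof guard are assumptions about how "do NOT add to self" might be honored, not a copy of the actual implementation.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for BOpStats.
class BaseStatsSketch {
    final LongAdder unitsIn = new LongAdder();

    void add(BaseStatsSketch o) {
        if (this == o) {
            return; // do NOT add to self
        }
        unitsIn.add(o.unitsIn.sum());
    }
}

// Hypothetical stand-in for PipelineJoinStats.
class JoinStatsSketch extends BaseStatsSketch {
    final LongAdder inputSolutions = new LongAdder();
    final LongAdder outputSolutions = new LongAdder();

    @Override
    void add(BaseStatsSketch o) {
        if (this == o) {
            return; // do NOT add to self
        }
        super.add(o); // combine the inherited counters first
        if (o instanceof JoinStatsSketch) {
            // Only join statistics carry the extended counters.
            JoinStatsSketch t = (JoinStatsSketch) o;
            inputSolutions.add(t.inputSolutions.sum());
            outputSolutions.add(t.outputSolutions.sum());
        }
    }

    public static void main(String[] args) {
        JoinStatsSketch a = new JoinStatsSketch();
        JoinStatsSketch b = new JoinStatsSketch();
        a.inputSolutions.add(3);
        b.inputSolutions.add(5);
        a.add(b);
        a.add(a); // ignored
        System.out.println(a.inputSolutions.sum()); // 8
    }
}
```

Without the self-check, calling add on the same object would double every counter.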

toString

protected void toString(StringBuilder sb)
Description copied from class: BOpStats
Extension hook for BOpStats.toString().

Overrides:
toString in class BaseJoinStats
Parameters:
sb - Where to write the additional state.


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.