com.bigdata.bop.rdf.filter
Class NativeDistinctFilter

java.lang.Object
  extended by com.bigdata.bop.CoreBaseBOp
      extended by com.bigdata.bop.BOpBase
          extended by com.bigdata.bop.ap.filter.BOpFilterBase
              extended by com.bigdata.bop.rdf.filter.NativeDistinctFilter
All Implemented Interfaces:
BOp, IFilter, IPropertySet, Serializable, Cloneable

public class NativeDistinctFilter
extends BOpFilterBase

A scalable DISTINCT operator for SPOs.

Note: While highly scalable, this class will absorb a minimum of one direct buffer per use. This is because we do not have access to the memory manager of the IRunningQuery on which the distinct filter is being run. For this reason, it allocates a private MemStore and uses a finalizer pattern to ensure the eventual release of that MemStore and the backing direct buffers.

Note: This cannot be used with pipelined joins because it would allocate one instance per as-bound evaluation of the pipelined join.

Note: You can switch the code between the HTree and the BTree by modifying only a few lines. See the comments in the file.

TODO Reads against the index will eventually degrade since we cannot use ordered reads because the iterator filter pattern itself is not vectored. We might be able to fix this with a chunked filter pattern. Otherwise, fixing this will require a more significant refactor.

TODO It would be nicer if we left the MRU 10k in the map and evicted the LRU 10k each time the map reached 20k. This cannot be done with the LinkedHashMap as its API is not sufficient for this purpose. However, similar batch LRU update classes have been defined in the com.bigdata.cache package and could be adapted here for that purpose.

Version:
$Id: DistinctElementFilter.java 3466 2010-08-27 14:28:04Z thompsonbry $
Author:
Bryan Thompson
See Also:
Serialized Form

Nested Class Summary
static interface NativeDistinctFilter.Annotations
           
 
Field Summary
 
Fields inherited from class com.bigdata.bop.CoreBaseBOp
DEFAULT_INITIAL_CAPACITY
 
Fields inherited from interface com.bigdata.bop.BOp
NOANNS, NOARGS
 
Constructor Summary
NativeDistinctFilter(BOp[] args, Map<String,Object> annotations)
          Required shallow copy constructor.
NativeDistinctFilter(NativeDistinctFilter op)
          Required deep copy constructor.
 
Method Summary
protected  Iterator filterOnce(Iterator src, Object context)
          Wrap the source iterator with this filter.
static int[] getFilterKeyOrder(SPOKeyOrder indexKeyOrder)
          Return the 3-component key order which has the best locality given that the SPOs will be arriving in the natural order of the indexKeyOrder.
static NativeDistinctFilter newInstance(SPOKeyOrder indexKeyOrder)
          An instance using the default configuration for the in-memory hash map.
 
Methods inherited from class com.bigdata.bop.ap.filter.BOpFilterBase
filter
 
Methods inherited from class com.bigdata.bop.BOpBase
_clearProperty, _set, _setProperty, annotations, annotationsCopy, annotationsEqual, annotationsRef, argIterator, args, argsCopy, arity, clearAnnotations, clearProperty, deepCopy, deepCopy, get, getProperty, setArg, setProperty, setUnboundProperty, toArray, toArray
 
Methods inherited from class com.bigdata.bop.CoreBaseBOp
annotationsEqual, annotationsToString, checkArgs, clone, equals, getEvaluationContext, getId, getProperty, getRequiredProperty, hashCode, indent, isController, toShortString, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface cutthecrap.utils.striterators.IPropertySet
getProperty
 

Constructor Detail

NativeDistinctFilter

public NativeDistinctFilter(NativeDistinctFilter op)
Required deep copy constructor.


NativeDistinctFilter

public NativeDistinctFilter(BOp[] args,
                            Map<String,Object> annotations)
Required shallow copy constructor.

Method Detail

newInstance

public static NativeDistinctFilter newInstance(SPOKeyOrder indexKeyOrder)
An instance using the default configuration for the in-memory hash map.

Parameters:
indexKeyOrder - The natural order in which the ISPOs will arrive at this filter. This is used to decide on the filter key order which will have the best locality given the order of arrival.

filterOnce

protected final Iterator filterOnce(Iterator src,
                                    Object context)
Description copied from class: BOpFilterBase
Wrap the source iterator with this filter.

Specified by:
filterOnce in class BOpFilterBase
Parameters:
src - The source iterator.
context - The iterator evaluation context.
Returns:
The wrapped iterator.
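
The wrapping pattern described here can be illustrated with a minimal sketch. It is not the implementation used by this class: it assumes a plain in-memory java.util.HashSet in place of the MemStore-backed index, and the class name DistinctIteratorSketch and the helper distinct are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical sketch: wraps a source iterator and drops duplicates.
// Uses an on-heap HashSet, NOT the native-memory index of NativeDistinctFilter.
class DistinctIteratorSketch<E> implements Iterator<E> {

    private final Iterator<E> src;
    private final Set<E> seen = new HashSet<>();
    private E next;        // buffered element, valid only when ready == true
    private boolean ready; // true when 'next' holds an unseen element

    DistinctIteratorSketch(Iterator<E> src) {
        this.src = src;
    }

    @Override
    public boolean hasNext() {
        // Advance the source until an element not seen before is found.
        while (!ready && src.hasNext()) {
            E e = src.next();
            if (seen.add(e)) {
                next = e;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        E e = next;
        next = null;
        return e;
    }

    // Convenience helper: wrap the source and collect the distinct elements
    // in order of first arrival.
    static <E> List<E> distinct(Iterator<E> src) {
        Iterator<E> it = new DistinctIteratorSketch<>(src);
        List<E> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

Because elements are tested one at a time as the consumer pulls them, lookups against the set happen in arrival order rather than in a sorted batch, which is the non-vectored behavior the class notes refer to.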

getFilterKeyOrder

public static int[] getFilterKeyOrder(SPOKeyOrder indexKeyOrder)
Return the 3-component key order which has the best locality given that the SPOs will be arriving in the natural order of the indexKeyOrder. This is the keyOrder that we will use for the filter. This gives the filter index structure the best possible locality in terms of the order in which the SPOs are arriving.

The return value is an int[3]. The index is the ordinal position of the triples-mode key component for the filter keys. The value at that index is the position in the SPOKeyOrder of the quads-mode index whose natural order determines the order of arrival of the ISPO objects at this filter.

Thus, given indexKeyOrder = SPOKeyOrder.CSPO, the array:

 int[] = {1,2,3}
 
would correspond to the filter key order SPO, which is the best possible filter key order for the natural order of the SPOKeyOrder.CSPO index.

Note, however, that key orders can be expressed in this manner which are not defined by SPOKeyOrder. For example, given SPOKeyOrder.PCSO the best filter key order is PSO. While there is no PSO key order declared by the SPOKeyOrder class, we can use

 int[] = {0,2,3}
 
which models the PSO key order for the purposes of this class.
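
The two quads-mode examples above can be reproduced with a short sketch. It assumes, based only on those two examples, that the filter key order is obtained by keeping the S, P and O components in the order they appear in the index key and recording each one's position in that key, skipping the context position C. The class FilterKeyOrderSketch and its string-based helper are hypothetical; the real method takes an SPOKeyOrder and may be implemented differently.

```java
import java.util.Arrays;

// Hypothetical sketch: derives the int[3] from the NAME of a quads key order
// (e.g. "CSPO"), not from an SPOKeyOrder instance.
class FilterKeyOrderSketch {

    static int[] filterKeyOrder(String indexKeyOrder) {
        int[] out = new int[3];
        int n = 0;
        for (int pos = 0; pos < indexKeyOrder.length(); pos++) {
            if (indexKeyOrder.charAt(pos) == 'C') {
                continue; // the context position is not part of the filter key
            }
            if (n == 3) {
                throw new IllegalArgumentException("Too many components: " + indexKeyOrder);
            }
            out[n++] = pos; // position of this component within the index key
        }
        if (n != 3) {
            throw new IllegalArgumentException("Expected S, P and O: " + indexKeyOrder);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(filterKeyOrder("CSPO"))); // [1, 2, 3]
        System.out.println(Arrays.toString(filterKeyOrder("PCSO"))); // [0, 2, 3]
    }
}
```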

See Also:
Annotations#INDEX_KEY_ORDER


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.