com.bigdata.btree.filter
Class PrefixFilter<E>

java.lang.Object
  extended by com.bigdata.btree.filter.PrefixFilter<E>
All Implemented Interfaces:
ITupleFilter<E>, cutthecrap.utils.striterators.IFilter, Serializable

public class PrefixFilter<E>
extends Object
implements ITupleFilter<E>

Filter that visits all ITuples whose keys begin with any of the specified prefixes. The filter accepts a key or an array of keys that define the key prefix(es) whose completions will be visited. For each key prefix, it efficiently forms the successor of the prefix, performs a key-range scan from the prefix up to that successor, and (if more than one key prefix is given) seeks to the start of the next key-range scan.
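The scan strategy described above can be sketched with standard library classes only. This is an illustration under stated assumptions, not the PrefixFilter implementation: a TreeMap ordered by unsigned byte[] comparison stands in for the index, and the names PrefixScanSketch, newIndex, successor and scan are hypothetical. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class PrefixScanSketch {

    /** Empty stand-in "index" whose keys sort as unsigned byte[]s. */
    static NavigableMap<byte[], String> newIndex() {
        Comparator<byte[]> unsignedOrder = (a, b) -> Arrays.compareUnsigned(a, b);
        return new TreeMap<>(unsignedOrder);
    }

    /** Exclusive upper bound for all completions of the prefix, or null if unbounded. */
    static byte[] successor(byte[] prefix) {
        int i = prefix.length - 1;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (i >= 0 && (prefix[i] & 0xFF) == 0xFF) {
            i--;
        }
        if (i < 0) {
            return null;
        }
        byte[] succ = Arrays.copyOf(prefix, i + 1);
        succ[i]++;
        return succ;
    }

    /** Visits the values under each prefix in turn; prefixes must already be sorted. */
    static List<String> scan(NavigableMap<byte[], String> index, byte[][] prefixes) {
        List<String> visited = new ArrayList<>();
        for (byte[] prefix : prefixes) {
            byte[] toKey = successor(prefix);
            // Seek to the start of this key range, then scan up to (excluding) toKey.
            NavigableMap<byte[], String> range = (toKey == null)
                    ? index.tailMap(prefix, true)
                    : index.subMap(prefix, true, toKey, false);
            visited.addAll(range.values());
        }
        return visited;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> index = newIndex();
        index.put(new byte[] {1, 2}, "a");
        index.put(new byte[] {1, 2, 7}, "b");
        index.put(new byte[] {1, 3}, "c");
        index.put(new byte[] {5, 0}, "d");
        index.put(new byte[] {6}, "e");
        // Two prefixes, given in sorted order.
        System.out.println(scan(index, new byte[][] {{1, 2}, {5}})); // prints [a, b, d]
    }
}
```

Each prefix costs one seek plus a scan of only the matching keys, so keys between the requested ranges ("c" in the example) are never visited.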

WARNING

The prefix keys MUST be formed with StrengthEnum.Primary. This is necessary in order to match all keys in the index, since it causes the secondary characteristics NOT to be included in the prefix key even if they are present in the keys in the index. Using any other StrengthEnum will result in secondary characteristics being encoded as additional bytes appended to the key. The scan will then match ONLY the given prefix key(s), and will match nothing if those prefix keys are not actually present in the index.

For example, the Unicode text "Bryan" is encoded as the unsigned byte[]

 [43, 75, 89, 41, 67]
 

at PRIMARY strength but as the unsigned byte[]

 [43, 75, 89, 41, 67, 1, 9, 1, 143, 8]
 

at IDENTICAL strength. The additional bytes for the IDENTICAL strength reflect the Locale-specific Unicode sort key encoding of secondary characteristics such as case. The successor of the PRIMARY strength byte[] is

 [43, 75, 89, 41, 68]
 

(one was added to the last byte), which spans all keys of interest. However, the successor of the IDENTICAL strength byte[] would be

 [43, 75, 89, 41, 67, 1, 9, 1, 143, 9]
 

and would ONLY span the single tuple whose key was "Bryan".
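The arithmetic in this example can be checked with a small stand-alone sketch. It is not the library's successor utility: the class and method names are hypothetical, the byte values are the ones quoted above, and longerKey is an invented key (the PRIMARY bytes plus one extra byte) used only to show which range contains it. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.util.Arrays;

class PrefixSuccessorSketch {

    /**
     * Exclusive upper bound for every key that starts with the prefix,
     * or null when no such bound exists (empty prefix or all bytes 0xFF).
     */
    static byte[] successor(byte[] prefix) {
        int i = prefix.length - 1;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (i >= 0 && (prefix[i] & 0xFF) == 0xFF) {
            i--;
        }
        if (i < 0) {
            return null;
        }
        byte[] succ = Arrays.copyOf(prefix, i + 1);
        succ[i]++; // add one to the last remaining byte
        return succ;
    }

    /** True if fromKey <= key < toKey in unsigned byte order. */
    static boolean inRange(byte[] key, byte[] fromKey, byte[] toKey) {
        return Arrays.compareUnsigned(fromKey, key) <= 0
                && Arrays.compareUnsigned(key, toKey) < 0;
    }

    public static void main(String[] args) {
        byte[] primary = {43, 75, 89, 41, 67};
        byte[] identical = {43, 75, 89, 41, 67, 1, 9, 1, (byte) 143, 8};
        // Hypothetical index key: the PRIMARY bytes followed by one more byte.
        byte[] longerKey = {43, 75, 89, 41, 67, 50};

        System.out.println(inRange(longerKey, primary, successor(primary)));     // true
        System.out.println(inRange(longerKey, identical, successor(identical))); // false
    }
}
```

The PRIMARY range covers any key that extends the five prefix bytes, while the IDENTICAL range is only one byte value wide at its tenth position and so excludes such extensions.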

You can form an appropriate IKeyBuilder for the prefix keys using

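 Properties properties = new Properties();

 // Encode prefix keys at PRIMARY strength so that no bytes for
 // secondary characteristics are appended to the key.
 properties.setProperty(KeyBuilder.Options.STRENGTH, StrengthEnum.Primary
         .toString());

 IKeyBuilder prefixKeyBuilder = KeyBuilder.newUnicodeInstance(properties);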
 

Note: It is NOT trivial to define a filter that accepts only keys that extend the prefix on a caller-defined boundary (e.g., corresponding to the encoding of a whitespace or word break). There are two issues: (1) the keys are encoded, so the filter needs to recognize the byte(s) in the Unicode sort key that correspond to, e.g., the word boundary; and (2) the keys may have been encoded with secondary characteristics, in which case the boundary will not begin immediately after the prefix.

Version:
$Id: PrefixFilter.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
See Also:
TestPrefixFilter, Serialized Form
TODO:
Only pass the relevant elements of keyPrefix to any given index partition. It is possible that an element spans the end of an index partition, in which case the scan must resume with the next partition. There is no real way to know this without testing the next partition....

Field Summary
protected static boolean INFO
           
protected static org.apache.log4j.Logger log
           
 
Constructor Summary
PrefixFilter(byte[] keyPrefix)
          Completion scan with a single prefix.
PrefixFilter(byte[][] keyPrefix)
          Completion scan with an array of key prefixes.
 
Method Summary
 ITupleIterator<E> filter(Iterator src)
          Strengthened return type.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static final transient org.apache.log4j.Logger log

INFO

protected static final transient boolean INFO
Constructor Detail

PrefixFilter

public PrefixFilter(byte[] keyPrefix)
Completion scan with a single prefix. The iterator will visit all tuples having the given key prefix.

Parameters:
keyPrefix - An unsigned byte[] containing a key prefix.

PrefixFilter

public PrefixFilter(byte[][] keyPrefix)
Completion scan with an array of key prefixes. The iterator will visit all tuples having the first key prefix, then all tuples having the next key prefix, etc. until all key prefixes have been evaluated.

Parameters:
keyPrefix - An array of unsigned byte[] key prefixes (the elements of the array MUST be presented in sorted order and nulls are not permitted).
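
A caller might prepare the array as follows before passing it to this constructor. This is a sketch under assumptions: SortedPrefixesSketch and sortedCopy are hypothetical names, and "sorted order" is taken to mean unsigned byte[] order, matching the description of the keys as unsigned byte[]s. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;

class SortedPrefixesSketch {

    /** Returns a copy of the prefixes in unsigned byte[] order; rejects nulls. */
    static byte[][] sortedCopy(byte[][] prefixes) {
        byte[][] copy = prefixes.clone(); // shallow copy: the caller's array order is untouched
        for (byte[] prefix : copy) {
            Objects.requireNonNull(prefix, "null key prefix");
        }
        Comparator<byte[]> unsignedOrder = (a, b) -> Arrays.compareUnsigned(a, b);
        Arrays.sort(copy, unsignedOrder);
        return copy;
    }

    public static void main(String[] args) {
        // (byte) 200 is negative as a signed byte but sorts last as an unsigned one.
        byte[][] sorted = sortedCopy(new byte[][] {{(byte) 200}, {100, 1}, {100}});
        for (byte[] prefix : sorted) {
            System.out.println(Arrays.toString(prefix));
        }
    }
}
```

Sorting with a signed comparison would place prefixes whose bytes are 128 or greater first, which would not match the order of keys in the index.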
Method Detail

filter

public ITupleIterator<E> filter(Iterator src)
Description copied from interface: ITupleFilter
Strengthened return type.

Specified by:
filter in interface ITupleFilter<E>
Specified by:
filter in interface cutthecrap.utils.striterators.IFilter


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.