com.bigdata.service.ndx
Class AbstractSplitter

java.lang.Object
  extended by com.bigdata.service.ndx.AbstractSplitter
All Implemented Interfaces:
ISplitter

public abstract class AbstractSplitter
extends Object
implements ISplitter

Basic implementation - you only need to provide resolution for the IMetadataIndex.

Version:
$Id: AbstractSplitter.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson

Field Summary
protected static org.apache.log4j.Logger log
           
 
Constructor Summary
AbstractSplitter()
           
 
Method Summary
protected abstract  IMetadataIndex getMetadataIndex(long ts)
          Return the IMetadataIndex that will be used to compute the Splits
 LinkedList<Split> splitKeys(long ts, int fromIndex, int toIndex, byte[][] keys)
          Identify the Splits for an ordered array of keys such that there is one Split per index partition spanned by the data.
 LinkedList<Split> splitKeys(long ts, int fromIndex, int toIndex, KVO[] a)
          Reshape the data into an unsigned byte[][] and then invoke splitKeys(long, int, int, byte[][]).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static final transient org.apache.log4j.Logger log
Constructor Detail

AbstractSplitter

public AbstractSplitter()
Method Detail

getMetadataIndex

protected abstract IMetadataIndex getMetadataIndex(long ts)
Return the IMetadataIndex that will be used to compute the Splits

Parameters:
ts - The timestamp of the IMetadataIndex view.
Returns:
The IMetadataIndex.

splitKeys

public LinkedList<Split> splitKeys(long ts,
                                   int fromIndex,
                                   int toIndex,
                                   byte[][] keys)
Identify the Splits for an ordered array of keys such that there is one Split per index partition spanned by the data. Find the partition for the first key. Check the last key, if it is in the same partition then then this is the simplest case and we can just send the data along.

Otherwise, perform a binary search on the remaining keys looking for the index of the first key GTE the right separator key for that partition. The batch for this partition is formed from all keys from the first key for that partition up to but excluding the index position identified by the binary search (if there is a match; if there is a miss, then the binary search result needs to be converted into a key index and that will be the last key for the current partition).

Examine the next key and repeat the process until all keys have been allocated to index partitions.

Note: Split points MUST respect the "row" identity for a sparse row store, but we get that constraint by maintaining the index partition boundaries in agreement with the split point constraints for the index.

Note: The splitter always detect keys out of order and will throw an IllegalArgumentException. This is done since it is otherwise too easy for applications to produce unordered data which would then quietly violate this expectation if we relied on asserts.

Specified by:
splitKeys in interface ISplitter
Parameters:
ts - The timestamp for the IMetadataIndex view that will be applied to choose the Splits.
fromIndex - The index of the first key in keys to be processed (inclusive).
toIndex - The index of the last key in keys to be processed.
keys - An array of keys. Each key is an interpreted as an unsigned byte[]. All keys must be non-null. The keys must be in sorted order.
Returns:
The Splits that you can use to form requests based on the identified first/last key and partition identified by this process.
See Also:
Arrays.sort(Object[], int, int, java.util.Comparator), BytesUtil.compareBytes(byte[], byte[])
TODO:
Caching? This procedure performs the minimum #of lookups using IMetadataIndex.find(byte[]) since that operation will be an RMI in a distributed federation. The find(byte[] key) operation is difficult to cache since it locates the index partition that would span the key and many, many different keys could fit into that same index partition. The only effective cache technique may be an LRU that scans ~10 caches locators to see if any of them is a match before reaching out to the remote IMetadataService. Or perhaps the locators can be cached in a local BTree and a miss there would result in a read through to the remote IMetadataService but then we have the problem of figuring out when to release locators if the client is long-lived.

splitKeys

public LinkedList<Split> splitKeys(long ts,
                                   int fromIndex,
                                   int toIndex,
                                   KVO[] a)
Reshape the data into an unsigned byte[][] and then invoke splitKeys(long, int, int, byte[][]).

Specified by:
splitKeys in interface ISplitter
Parameters:
ts - The timestamp for the IMetadataIndex view that will be applied to choose the Splits.
fromIndex - The index of the first key in keys to be processed (inclusive).
toIndex - The index of the last key in keys to be processed.
Returns:
The Splits that you can use to form requests based on the identified first/last key and partition identified by this process.


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.