com.bigdata.rdf.lexicon
Class TermIdEncoder

java.lang.Object
  extended by com.bigdata.rdf.lexicon.TermIdEncoder

public class TermIdEncoder
extends Object

An encoder/decoder for long values formed from a partition identifier in the high word and a local counter in the low word where the low N bits of the long value are reversed and rotated into the high N bits of the long value.

The purpose of this encoding is to cause the N high bits to vary rapidly as the local counter is driven up by writes on the index partition. This has the effect of scattering writes on dependent indices (those using the resulting long value as the sole or leading component of their key).

Given a source RDF/XML document with M "terms" distributed uniformly over K TERM2ID index partitions, each term has a uniform likelihood of setting any of the low bits of the local counter. After encoding, this means that the N high-bits of encoded term identifier are uniformly distributed. Assuming that the separator keys for the ID2TERM index divide the key space into equally sized key-ranges, then the reads and writes on the ID2TERM index partitions will be uniformly distributed as well.

The next bits in the encoded values are derived from the partition identifier followed by the term identifier and therefore have a strong bias for the index partition and the sequential assignment of local counter values within an index partition respectively. This means that read / write access within an index partition tends to have some locality, which improves B+Tree performance through several mechanisms (mainly improved cache effects, reduced copy-on-write for dirty leaves and nodes, and less IO costs).

When the #of ID2TERM index partitions GT 2^N, only a subset of those index partitions can be directly selected by the N high bits with their uniform distribution. The next bias is the partition identifier, which begins at ZERO (0), is inflated to (0, [1:P]), where P is the #of index partitions generated by a scatter split, and grows relatively slowly thereafter as index partitions are fill up and are split or are moved to redistribute the load on the cluster.

Version:
$Id: TermIdEncoder.java 4863 2011-07-08 14:48:19Z thompsonbry $
Author:
Bryan Thompson

Constructor Summary
TermIdEncoder(int N)
           
 
Method Summary
static long combine(int pid, int ctr)
          Combines the partition identifier and the local counter using the same logic as the BTree.PartitionedCounter.
 long decode(long u)
          Reverses the effect of encode(long).
 long encode(long v)
          Encode a term identifier using the configured value of NBits.
static int getLocalCounter(long v)
          Return the local counter from the low word of a partitioned counter.
 int getNBits()
          Return the #of low bits from the local counter that will be reversed and written into the high-bits of the encoded long value.
static int getPartitionId(long v)
          Return the partition identifier from the high word of a partitioned counter.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TermIdEncoder

public TermIdEncoder(int N)
Parameters:
N - The #of low bits from the local counter that will be reversed and written into the high-bits of the encoded long value.
Method Detail

toString

public String toString()
Overrides:
toString in class Object

getNBits

public int getNBits()
Return the #of low bits from the local counter that will be reversed and written into the high-bits of the encoded long value.


encode

public long encode(long v)
Encode a term identifier using the configured value of NBits.

Parameters:
v - A 64-bit long counter value as generated by an BTree.PartitionedCounter.
Returns:
A permutation of that long value in which the low N bits have been reversed and rotated into the high N bits.

decode

public long decode(long u)
Reverses the effect of encode(long).

Parameters:
u - An encoded long value.
Returns:
The decode long value.

getPartitionId

public static int getPartitionId(long v)
Return the partition identifier from the high word of a partitioned counter.

Parameters:
v - The partitioned counter.
Returns:
The high word.

getLocalCounter

public static int getLocalCounter(long v)
Return the local counter from the low word of a partitioned counter.

Parameters:
v - The partitioned counter.
Returns:
The low word.

combine

public static long combine(int pid,
                           int ctr)
Combines the partition identifier and the local counter using the same logic as the BTree.PartitionedCounter.

Parameters:
pid - The partition identifier.
ctr - The local counter.
Returns:
The long counter assembled from those values.
See Also:
BTree.PartitionedCounter, which performs the same operation and MUST be consistent with this method.


Copyright © 2006-2011 SYSTAP, LLC. All Rights Reserved.