|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||
java.lang.Objectcom.bigdata.rdf.lexicon.TermIdEncoder
public class TermIdEncoder
An encoder/decoder for long values formed from a partition identifier in the high word and a local counter in the low word where the low N bits of the long value are reversed and rotated into the high N bits of the long value.
The purpose of this encoding is to cause the N high bits to vary rapidly as the local counter is driven up by writes on the index partition. This has the effect of scattering writes on dependent indices (those using the resulting long value as the sole or leading component of their key).
Given a source RDF/XML document with M "terms" distributed uniformly over K TERM2ID index partitions, each term has a uniform likelihood of setting any of the low bits of the local counter. After encoding, this means that the N high-bits of encoded term identifier are uniformly distributed. Assuming that the separator keys for the ID2TERM index divide the key space into equally sized key-ranges, then the reads and writes on the ID2TERM index partitions will be uniformly distributed as well.
The next bits in the encoded values are derived from the partition identifier followed by the term identifier and therefore have a strong bias for the index partition and the sequential assignment of local counter values within an index partition respectively. This means that read / write access within an index partition tends to have some locality, which improves B+Tree performance through several mechanisms (mainly improved cache effects, reduced copy-on-write for dirty leaves and nodes, and less IO costs).
When the #of ID2TERM index partitions GT 2^N, only a subset of
those index partitions can be directly selected by the N high bits with their
uniform distribution. The next bias is the partition identifier, which begins
at ZERO (0), is inflated to (0, [1:P]), where P is the #of index partitions
generated by a scatter split, and grows relatively slowly thereafter as index
partitions are fill up and are split or are moved to redistribute the load on
the cluster.
| Constructor Summary | |
|---|---|
TermIdEncoder(int N)
|
|
| Method Summary | |
|---|---|
static long |
combine(int pid,
int ctr)
Combines the partition identifier and the local counter using the same logic as the BTree.PartitionedCounter. |
long |
decode(long u)
Reverses the effect of encode(long). |
long |
encode(long v)
Encode a term identifier using the configured value of NBits. |
static int |
getLocalCounter(long v)
Return the local counter from the low word of a partitioned counter. |
int |
getNBits()
Return the #of low bits from the local counter that will be reversed and written into the high-bits of the encoded long value. |
static int |
getPartitionId(long v)
Return the partition identifier from the high word of a partitioned counter. |
String |
toString()
|
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public TermIdEncoder(int N)
N - The #of low bits from the local counter that will be reversed
and written into the high-bits of the encoded long value.| Method Detail |
|---|
public String toString()
toString in class Objectpublic int getNBits()
public long encode(long v)
NBits.
v - A 64-bit long counter value as generated by an
BTree.PartitionedCounter.
public long decode(long u)
encode(long).
u - An encoded long value.
public static int getPartitionId(long v)
v - The partitioned counter.
public static int getLocalCounter(long v)
v - The partitioned counter.
public static long combine(int pid,
int ctr)
BTree.PartitionedCounter.
pid - The partition identifier.ctr - The local counter.
BTree.PartitionedCounter, which performs the same operation and MUST
be consistent with this method.
|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||