|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||
java.lang.Objectcom.bigdata.btree.BytesUtil
public class BytesUtil
Class supporting operations on variable length byte[] keys.
Comparison operations that accept a starting offset are used when the byte[]s are known to share a leading prefix that may be skipped during comparison.
Comparison operations that accept a starting offset and length are used when immutable keys are stored in a single byte[] and an index into starting positions in that byte[] is maintained.
JNI methods are provided for unsigned byte[] comparison. However, note that the JNI methods do not appear to be as fast as the pure Java methods - presumably because of the overhead of going from Java to C. In order to execute using the JNI methods you MUST define the optional boolean system property, e.g.,
java -Dcom.bigdata.btree.BytesUtil.jni=true ...See BytesUtil.c in this package for instructions on compiling the JNI methods. See
main(String[]) which provides a test for the JNI integration and
some pointers on how to get this running on your platform.
| Nested Class Summary | |
|---|---|
static class |
BytesUtil.UnsignedByteArrayComparator
Compares two unsigned byte[]s. |
| Field Summary | |
|---|---|
static byte[] |
EMPTY
An empty byte[]. |
static byte[][] |
EMPTY2
An empty byte[][]. |
static int |
minlen
JNI routines are not invoked unless we will compare byte[]s with at least this many potential bytes to compare (the actual# may be much less of course since comparisons may fail fast). |
| Constructor Summary | |
|---|---|
BytesUtil()
|
|
| Method Summary | |
|---|---|
static int |
altGetBits32(byte[] a,
int off,
int len)
|
static long |
altGetBits64(byte[] a,
int off,
int len)
|
static int |
binarySearch(byte[][] keys,
int base,
int nmem,
byte[] key)
Binary search on an array whose members are variable length unsigned byte[]s. |
static int |
bitFlagByteLength(int nbits)
Return the #of bytes required to bit code the specified #of bits. |
static int |
byteIndexForBit(long bitIndex)
Return the index of the byte in which the bit with the given index is encoded. |
static boolean |
bytesEqual(byte[] a,
byte[] b)
True iff the two arrays compare as equal. |
static int |
compareBytes(byte[] a,
byte[] b)
Byte-wise comparison of byte[]s (the arrays are treated as arrays of unsigned bytes). |
static int |
compareBytesWithLenAndOffset(int aoff,
int alen,
byte[] a,
int boff,
int blen,
byte[] b)
Byte-wise comparison of byte[]s (the arrays are treated as arrays of unsigned bytes). |
static boolean |
getBit(byte[] buf,
long bitIndex)
Get the value of a bit. |
static int |
getBits(byte[] a,
int off,
int len)
Return the n-bit integer corresponding to the inclusive bit range of the byte[]. |
static int |
getBits(int a,
int off,
int len)
Return the n-bit integer corresponding to the inclusive bit range of the byte[]. |
static long |
getBits64(byte[] a,
int off,
int len)
|
static long |
getByteCount(String s)
Decode a string of the form [0-9]+(k|kb|m|mb|g|gb)?,
returning the number of bytes. |
static byte[] |
getPrefix(byte[] a,
byte[] b)
Return a new byte[] containing the leading bytes in common between two byte[]s. |
static int |
getPrefixLength(byte[] a,
byte[] b)
Return the #of leading bytes in common. |
static byte[] |
getSeparatorKey(byte[] givenKey,
byte[] priorKey)
The keys in the nodes of a btree are known as separator keys. |
static boolean |
loadJNILibrary()
Attempt to load the JNI library. |
static void |
main(String[] args)
This method forces the load of the JNI library and tries to execute the JNI methods. |
static int |
maskOffLSB(int h,
int nbits)
Mask off all but the LSB nbits of the hash value. |
static int |
maskOffMSB(int h,
int nbits)
Mask off all but the MSB nbits of the hash value and shift them down such that the masked bits appear at bits (nbits:0] of the returned value. |
static int |
optGetBits(byte[] a,
int off,
int len)
Some benchmarks seem to indicate that altGetBits32 is faster than getBits for smaller byte counts. |
static boolean |
rangeCheck(byte[] key,
byte[] fromKey,
byte[] toKey)
Return true if the key lies inside of the optional
half-open range constraint. |
static boolean |
setBit(byte[] buf,
long bitIndex,
boolean value)
Set the value of a bit - this is NOT thread-safe (contention for the byte in the backing buffer can cause lost updates). |
static byte[] |
successor(byte[] key)
Computes the successor of a variable length byte array by appending a unsigned zero(0) byte to the end of the array. |
static byte[] |
toArray(ByteBuffer b)
Return a byte[] having the data in the ByteBuffer from the
Buffer.position() to the Buffer.limit(). |
static byte[] |
toArray(ByteBuffer b,
boolean forceCopy)
Return a byte[] having the data in the ByteBuffer from the
Buffer.position() to the Buffer.limit(). |
static String |
toBitString(byte[] b)
Return the binary representation of the unsigned byte[]. |
static String |
toString(byte[] key)
Formats a key as a series of comma delimited unsigned bytes. |
static String |
toString(byte[][] data)
Formats the data into a String. |
static String |
toString(byte[] key,
int off,
int len)
Formats a key as a series of comma delimited unsigned bytes. |
static int |
withinByteIndexForBit(long bitIndex)
Return the offset within the byte in which the bit is coded of the bit (this is just the remainder bitIndex % 8). |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final byte[] EMPTY
byte[].
public static final byte[][] EMPTY2
byte[][].
public static final int minlen
| Constructor Detail |
|---|
public BytesUtil()
| Method Detail |
|---|
public static boolean loadJNILibrary()
Note: this is done automatically if the optional boolean system property
com.bigdata.btree.BytesUtil.jni=true is specified, e.g.,
using
java -Dcom.bigdata.btree.BytesUtil.jni=true ...
public static final boolean bytesEqual(byte[] a,
byte[] b)
a - A byte[].b - Another byte[].
null) or if they have the same data.
public static final int compareBytes(byte[] a,
byte[] b)
a - A byte[].b - A byte[].
CustomByteArrayFrontCodedList for an implementation
guideline.
Change all implementations in this class and also BytesUtil.c, which needs to be recompiled for Windows. Also makes sure that it gets compiled and linked for Un*x. That should be tested from the ant installer and the result reported. Do the same for ICU4JNI.
public static final int compareBytesWithLenAndOffset(int aoff,
int alen,
byte[] a,
int boff,
int blen,
byte[] b)
aoff - The offset into a at which the comparison will begin.alen - The #of bytes in a to consider starting at aoff.a - A byte[].boff - The offset into b at which the comparison will begin.blen - The #of bytes in b to consider starting at boff.b - A byte[].
public static final int getPrefixLength(byte[] a,
byte[] b)
a - A variable length unsigned byte array.b - A variable length unsigned byte array.
public static final byte[] getPrefix(byte[] a,
byte[] b)
a - A variable length unsigned byte array[].b - A variable length unsigned byte array[].
public static final byte[] successor(byte[] key)
key - A variable length unsigned byte array.
public static final byte[] getSeparatorKey(byte[] givenKey,
byte[] priorKey)
The keys in the nodes of a btree are known as separator keys. The role of the separator keys is to direct search towards the leaf in which a key exists or would exist by always searching the first child having a separator key that is greater than or equal to the search key.
Separator keys separate leaves and must be choosen with that purpose in mind. The simplest way to choose the separator key is to just take the first key of the leaf - this is always correct. However, shorter separator keys may be choosen by defining the separator key as the shortest key that is less than or equal to the first key of a leaf and greater than the last key of the left sibling of that leaf (that is, the key for the entry that immediately proceeds the first entry on the leaf).
There are several advantages to always choosing the shortest separator key. The original rationale (in "Prefix B-Trees" by Bayer and Unterauer) was to increase the branching factors for fixed size pages. Since we use variable size serialized record, that is not an issue. However, using the shortest separator keys in this implementation provides both smaller serialized records for nodes and faster search since fewer bytes must be tested.
Note that this trick can not be used at higher levels in the btree - separator keys are always formed based on the keys in the leaves and then propagated through the tree.
The rules are simple enough:
givenKey - A key.priorKey - Another key that proceeds the givenKey.
IllegalArgumentException - if either key is null.
IllegalArgumentException - if both keys are the same reference.http://portal.acm.org/citation.cfm?doid=320521.320530public static final String toString(byte[] key)
key - The key.
public static final String toString(byte[] key,
int off,
int len)
key - The key.off - The index of the first byte that will be visited.len - The #of bytes to visit.
public static String toString(byte[][] data)
String.
data - An array of unsigned byte arrays.
public static final int binarySearch(byte[][] keys,
int base,
int nmem,
byte[] key)
keys - The buffer.base - The offset of the base of the array within the buffer.nmem - The #of members in the array. When [nmem == 0], the array is
empty.key - The key for the search.
(-(insertion point) - 1). The insertion
point is defined as the point at which the key would be inserted
into the array of keys. Note that this guarantees that the return
value will be >= 0 if and only if the key is found.
public static final boolean rangeCheck(byte[] key,
byte[] fromKey,
byte[] toKey)
true if the key lies inside of the optional
half-open range constraint.
key - The key.fromKey - The inclusive lower bound -or- null if there is
no lower bound.toKey - The exclusive upper bound -or- null if there is
no upper bound.
true unless the key is LT [fromKey] or GTE
[toKey].public static void main(String[] args)
In order to use the JNI library under Windows, you must specify the JNI library location using the PATH environment variable, e.g.,
cd bigdata set PATH=%PATH%;lib java -cp bin com.bigdata.btree.BytesUtil
In order to use the JNI library under un*x, you must specify the JNI library location
java -Djava.library.path=lib com.bigdata.btree.BytesUtil
args -
UnsatisfiedLinkError - if the JNI methods can not be resolved.
AssertionError - if the JNI methods do not produce the expected answers.public static final int bitFlagByteLength(int nbits)
nbits - The #of bit flags.
public static final int byteIndexForBit(long bitIndex)
bitIndex - The bit index.
public static final int withinByteIndexForBit(long bitIndex)
bitIndex % 8).
Note, the computation of the bit offset is intentionally aligned with
OutputBitStream and InputBitStream.
bitIndex - The bit index into the byte[].
public static final boolean getBit(byte[] buf,
long bitIndex)
Note, the computation of the bit offset is intentionally aligned with
OutputBitStream and InputBitStream.
bitIndex - The index of the bit.
public static final boolean setBit(byte[] buf,
long bitIndex,
boolean value)
Note, the computation of the bit offset is intentionally aligned with
OutputBitStream and InputBitStream.
bitIndex - The index of the bit.
public static int maskOffMSB(int h,
int nbits)
h - The hash value.nbits - The #of bits already accounted for by the path from the root.
public static int maskOffLSB(int h,
int nbits)
h - The hash value.nbits - The #of LSB bits to be retained.
public static int getBits(byte[] a,
int off,
int len)
a.length * 8 - 1. The return value
is an int32 and the bit range must not be greater than 32 bits.
For example, given the following data and the bit range (0,2)
bit index: 01234567 ---------+---------- bit value: 10110000TWO (2) bits starting at bit offset ZERO (0) would be extracted and returned as a 2-bit integer. For those data, the return value would be an int32 value whose binary representation was
10 (with leading
zeros suppressed).
Note: This method is design for use with the unsigned byte[] keys in a bigdata hash index. All keys in bigdata are internally represented as unsigned byte[]s, which is why this method accepts a byte[] rather than an long[] for the bits. Also, while the length of an unsigned byte[] key can vary, they are never huge and an int32 value is sufficient to index into the bits in the byte[]. Finally, the return value is an int because it will be used in hash table designs to index into a hash table based on those bits in a hash code key which are masked as relevant to that hash table. 32bits is far more than we will need to index into a hash table. For an 8k page, we might expect a fan out of at most 1024 which is only 10 bits.
a - A byte[].off - The index of the first bit to be included.len - The number of bits to be returned in [0:32]. However, a bit
length of zero will always return zero.
public static long altGetBits64(byte[] a,
int off,
int len)
public static int altGetBits32(byte[] a,
int off,
int len)
public static long getBits64(byte[] a,
int off,
int len)
public static int optGetBits(byte[] a,
int off,
int len)
public static int getBits(int a,
int off,
int len)
31. The return value is an int32
and the bit range must not be greater than 32 bits.
For example, given the following data and the bit range (0,2)
bit index: 01234567 ---------+---------- bit value: 10110000TWO (2) bits starting at bit offset ZERO (0) would be extracted and returned as a 2-bit integer. For those data, the return value would be an int32 value whose binary representation was
10 (with leading
zeros suppressed).
Note: This method is design for use in a bigdata hash index having native int32 keys rather than unsigned byte[] keys.
a - An integer.off - The index of the first bit to be included.len - The number of bits to be returned in [0:32]. However, a bit
length of zero will always return zero.
public static String toBitString(byte[] b)
b - The unsigned byte[].
IllegalArgumentException - if the argument is null.public static long getByteCount(String s)
[0-9]+(k|kb|m|mb|g|gb)?,
returning the number of bytes. When a suffix indicates kilobytes,
megabytes, or gigabytes then the returned value is scaled accordingly.
The suffix is NOT case sensitive.
s - The string value.
IllegalArgumentException - if there is a problem with the argument (null,
ill-formed, etc).public static byte[] toArray(ByteBuffer b)
ByteBuffer from the
Buffer.position() to the Buffer.limit(). The
position, limit, and mark are not affected by this operation. When the
ByteBuffer has a backing array, the array offset is ZERO (0), and
the Buffer.limit() is equal to the
Buffer.capacity() then the backing array is returned.
Otherwise, a new byte[] is allocated and the data are copied into that
byte[], which is then returned.
b - The ByteBuffer.
public static byte[] toArray(ByteBuffer b,
boolean forceCopy)
ByteBuffer from the
Buffer.position() to the Buffer.limit(). The
position, limit, and mark are not affected by this operation.
Under certain circumstances it is possible and may be desirable to return
the backing ByteBuffer.array(). This behavior is enabled by
forceCopy := false.
It is possible to return the backing byte[] when the ByteBuffer
has a backing array, the array offset is ZERO (0), and the
Buffer.limit() is equal to the Buffer.capacity()
then the backing array is returned. Otherwise, a new byte[] must be
allocated, and the data are copied into that byte[], which may then be
returned.
b - The ByteBuffer.forceCopy - When false, the backing array will be returned if
possible.
|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||