com.bigdata.btree.keys
Class KeyBuilder

java.lang.Object
  extended by com.bigdata.btree.keys.KeyBuilder
All Implemented Interfaces:
IKeyBuilder, ISortKeyBuilder

public class KeyBuilder
extends Object
implements IKeyBuilder

A class that may be used to form multi-component keys but which does not support Unicode. An instance of this class is quite light-weight and SHOULD be used when Unicode support is not required.

Note: Avoid any dependencies within this class on the ICU libraries so that the code may run without those libraries when they are not required.

Version:
$Id: KeyBuilder.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
See Also:
Compute the successor of a value before encoding it as a component of a key., Compute the successor of an encoded key.
TODO:
Introduce a mark and restore feature for generating multiple keys that share some leading prefix. In general, this is as easy as resetting the len field to the mark. Keys with multiple components could benefit from allowing multiple marks (the sparse row store is the main use case)., Integrate support for ICU versioning into the client and perhaps into the index metadata so clients can discover which version and configuration properties to use when generating keys for an index.

Nested Class Summary
static interface KeyBuilder.Options
          Configuration options for DefaultKeyBuilderFactory and the KeyBuilder factory methods.
 
Field Summary
protected  byte[] buf
          The key buffer.
static int DEFAULT_INITIAL_CAPACITY
          The default capacity of the key buffer.
protected static boolean INFO
           
protected  int len
          A non-negative integer specifying the #of bytes of data in the buffer that contain valid data starting from position zero(0).
protected static org.apache.log4j.Logger log
           
 byte pad
          The default pad character (a space).
protected  UnicodeSortKeyGenerator sortKeyGenerator
          The object used to generate sort keys from Unicode strings (optional).
 
Fields inherited from interface com.bigdata.btree.keys.IKeyBuilder
maxlen
 
Constructor Summary
  KeyBuilder()
          Creates a key builder with an initial buffer capacity of 1024 bytes.
  KeyBuilder(int initialCapacity)
          Creates a key builder with the specified initial buffer capacity.
protected KeyBuilder(UnicodeSortKeyGenerator sortKeyGenerator, int len, byte[] buf)
          Creates a key builder using an existing buffer with some data (designated constructor).
 
Method Summary
 IKeyBuilder append(byte v)
          Converts the signed byte to an unsigned byte and appends it to the key.
 IKeyBuilder append(byte[] a)
          Appends an array of bytes - the bytes are treated as unsigned values.
 IKeyBuilder append(double d)
          Appends a double precision floating point value by first converting it into a signed long integer using Double.doubleToLongBits(double), converting that value into a two's complement number, and then appending the bytes in big-endian order into the key buffer.
 IKeyBuilder append(float f)
          Appends a single precision floating point value by first converting it into a signed integer using Float.floatToIntBits(float), converting that value into a two's complement number, and then appending the bytes in big-endian order into the key buffer.
 IKeyBuilder append(int v)
          Appends a signed integer to the key by first converting it to a lexicographic ordering as an unsigned integer and then appending it into the buffer as 4 bytes using a big-endian order.
 IKeyBuilder append(int off, int len, byte[] a)
          Append len bytes starting at off in a to the key buffer.
 IKeyBuilder append(long v)
          Appends a signed long integer to the key by first converting it to a lexicographic ordering as an unsigned long integer and then appending it into the buffer as 8 bytes using a big-endian order.
 IKeyBuilder append(Object val)
          Append the value to the buffer, encoding it as appropriate based on the class of the object.
 IKeyBuilder append(short v)
          Appends a signed short integer to the key by first converting it to a two's complement representation supporting unsigned byte[] comparison and then appending it into the buffer as 2 bytes using a big-endian order.
 IKeyBuilder append(String s)
          Encodes a Unicode string using the configured KeyBuilder.Options.COLLATOR and appends the resulting sort key to the buffer (without a trailing nul byte).
 IKeyBuilder append(UUID uuid)
          Appends the UUID to the key using the MSB and then the LSB.
 IKeyBuilder appendASCII(String s)
          Encodes a unicode string by assuming that its contents are ASCII characters.
 IKeyBuilder appendNul()
          Append an unsigned zero byte to the key.
 IKeyBuilder appendText(String text, boolean unicode, boolean successor)
          Encodes a variable length text field into the buffer.
 IKeyBuilder appendUnsigned(byte v)
           
static byte[] asSortKey(Object val)
          Utility method converts an application key to a sort key (an unsigned byte[] that imposes the same sort order).
protected static byte[] createBuffer(int initialCapacity)
          Create a buffer of the specified initial capacity.
static long d2l(double d)
          Encodes a double precision floating point value as an int64 value that has the same total ordering (you can compare two doubles encoded by this method and the long values will have the same ordering as the double values).
static String decodeASCII(byte[] key, int off, int len)
          Decodes an ASCII string from a key.
static byte decodeByte(int v)
          Converts an unsigned byte into a signed byte.
static double decodeDouble(byte[] key, int off)
           
static float decodeFloat(byte[] key, int off)
           
static int decodeInt(byte[] buf, int off)
          Decodes a signed int value as encoded by append(int).
static long decodeLong(byte[] buf, int off)
          Decodes a signed long value as encoded by append(long).
static short decodeShort(byte[] buf, int off)
          Decodes a signed short value as encoded by append(short).
static byte encodeByte(int v)
          Converts a signed byte into an unsigned byte.
 void ensureCapacity(int capacity)
          Ensure that the buffer capacity is at least capacity total bytes.
 void ensureFree(int len)
          Ensure that at least len bytes are free in the buffer.
static int f2i(float f)
          Encodes a floating point value as an int32 value that has the same total ordering (you can compare two floats encoded by this method and the int values will have the same ordering as the float values).
 byte[] getBuffer()
           
 byte[] getKey()
          Return the encoded key.
 int getLength()
          The #of bytes of data in the key.
 byte[] getSortKey(Object val)
          Return an unsigned byte[] sort key.
 UnicodeSortKeyGenerator getSortKeyGenerator()
          The object responsible for generating sort keys from Unicode strings.
 boolean isUnicodeSupported()
          Return true iff Unicode is supported by this object (returns false if only ASCII support is configured).
static IKeyBuilder newInstance()
           
static IKeyBuilder newInstance(int initialCapacity)
          Create an instance for ASCII keys with the specified initial capacity.
static IKeyBuilder newInstance(int capacity, CollatorEnum collatorChoice, Locale locale, Object strength, DecompositionEnum mode)
          Create a new instance that optionally supports Unicode sort keys.
static IKeyBuilder newUnicodeInstance()
          Create a factory for IKeyBuilder instances configured using the system properties.
static IKeyBuilder newUnicodeInstance(Properties properties)
          Create a factory for IKeyBuilder instances configured according to the specified properties.
 void position(int pos)
          Sets the position to any non-negative length less than the current capacity of the buffer.
 IKeyBuilder reset()
          Reset the key length to zero before building another key.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static final org.apache.log4j.Logger log

INFO

protected static final boolean INFO

DEFAULT_INITIAL_CAPACITY

public static final int DEFAULT_INITIAL_CAPACITY
The default capacity of the key buffer.

See Also:
Constant Field Values

len

protected int len
A non-negative integer specifying the #of bytes of data in the buffer that contain valid data starting from position zero(0).


buf

protected byte[] buf
The key buffer. This is re-allocated whenever the capacity of the buffer is too small and reused otherwise.


sortKeyGenerator

protected final UnicodeSortKeyGenerator sortKeyGenerator
The object used to generate sort keys from Unicode strings (optional).

Note: When null the IKeyBuilder does NOT support Unicode and the optional Unicode methods will all throw an UnsupportedOperationException.


pad

public final byte pad
The default pad character (a space).

Note: Any character may be chosen as the pad character as long as it has a one byte representation. In practice this means you can choose 0x20 (a space) or 0x00 (a nul). This limit arises in appendText(String, boolean, boolean), which assumes that it can write a pad character (or its successor) in one byte. 0xff will NOT work since its successor is not defined within a bit string of length 8.

See Also:
Constant Field Values
TODO:
make this a configuration option? if so then verify that the choice (and its successor) fit in 8 bits.
Constructor Detail

KeyBuilder

public KeyBuilder()
Creates a key builder with an initial buffer capacity of 1024 bytes.


KeyBuilder

public KeyBuilder(int initialCapacity)
Creates a key builder with the specified initial buffer capacity.

Parameters:
initialCapacity - The initial capacity of the internal byte[] used to construct keys. When zero (0) the DEFAULT_INITIAL_CAPACITY will be used.

KeyBuilder

protected KeyBuilder(UnicodeSortKeyGenerator sortKeyGenerator,
                     int len,
                     byte[] buf)
Creates a key builder using an existing buffer with some data (designated constructor).

Parameters:
sortKeyGenerator - The object used to generate sort keys from Unicode strings (when null Unicode collation support is disabled).
len - The #of bytes of data in the provided buffer.
buf - The buffer, with len pre-existing bytes of valid data. The buffer reference is used directly rather than making a copy of the data.
Method Detail

createBuffer

protected static byte[] createBuffer(int initialCapacity)
Create a buffer of the specified initial capacity.

Parameters:
initialCapacity - The initial size of the buffer.
Returns:
The byte[] buffer.
Throws:
IllegalArgumentException - if the initial capacity is negative.

getLength

public final int getLength()
Description copied from interface: IKeyBuilder
The #of bytes of data in the key.

Specified by:
getLength in interface IKeyBuilder

getBuffer

public final byte[] getBuffer()

position

public final void position(int pos)
Sets the position to any non-negative length less than the current capacity of the buffer.


append

public final IKeyBuilder append(int off,
                                int len,
                                byte[] a)
Description copied from interface: IKeyBuilder
Append len bytes starting at off in a to the key buffer.

Specified by:
append in interface IKeyBuilder
Parameters:
off - The offset.
len - The #of bytes to append.
a - The array containing the bytes to append.
Returns:
this

ensureFree

public final void ensureFree(int len)
Ensure that at least len bytes are free in the buffer. The buffer may be grown by this operation but it will not be truncated.

This operation is equivalent to

 ensureCapacity(this.len + len)
 
and the latter is often used as an optimization.

Parameters:
len - The minimum #of free bytes.
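
Note: The relationship between ensureFree and ensureCapacity can be sketched with a small stand-alone buffer. This is an illustration only, not the KeyBuilder source: the class name GrowableKeyBuffer and the doubling growth policy are assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical stand-alone buffer; not the actual KeyBuilder implementation. */
final class GrowableKeyBuffer {

    byte[] buf; // the key buffer
    int len;    // #of valid bytes starting at position zero

    GrowableKeyBuffer(int initialCapacity) {
        if (initialCapacity < 0) throw new IllegalArgumentException();
        buf = new byte[initialCapacity];
    }

    /** Grow (never truncate) so that buf.length >= capacity. */
    void ensureCapacity(int capacity) {
        if (capacity < 0) throw new IllegalArgumentException();
        if (buf.length >= capacity) return; // large enough, reuse as-is
        // Assumed policy: at least double, to amortize repeated appends.
        int newCapacity = Math.max(capacity, buf.length * 2);
        buf = Arrays.copyOf(buf, newCapacity); // preserves the existing bytes
    }

    /** Equivalent to ensureCapacity(this.len + len). */
    void ensureFree(int len) {
        ensureCapacity(this.len + len);
    }

    /** Append all bytes of a, growing the buffer first if required. */
    GrowableKeyBuffer append(byte[] a) {
        ensureFree(a.length);
        System.arraycopy(a, 0, buf, len, a.length);
        len += a.length;
        return this;
    }
}
```

A caller that knows the total size of a multi-component key up front can invoke ensureCapacity once rather than paying for a capacity check on every append.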

ensureCapacity

public final void ensureCapacity(int capacity)
Ensure that the buffer capacity is at least capacity total bytes. The buffer may be grown by this operation but it will not be truncated.

Parameters:
capacity - The minimum #of bytes in the buffer.

getKey

public final byte[] getKey()
Description copied from interface: IKeyBuilder
Return the encoded key. Comparison of keys returned by this method MUST treat the array as an array of unsigned bytes.

Note that keys are donated to the btree so it is important to allocate new keys when running in the same process space. When using a network api, the api provides the necessary decoupling.

Specified by:
getKey in interface IKeyBuilder
Returns:
A new array containing the key.
See Also:
BytesUtil.compareBytes(byte[], byte[])
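
Note: Java bytes are signed, so a naive element-wise comparison would order 0x80 before 0x7f. The following is a minimal sketch of an unsigned lexicographic comparison. The class and method names are hypothetical; this is not the source of BytesUtil.compareBytes(byte[], byte[]).

```java
/** Hypothetical helper illustrating unsigned byte[] key comparison. */
final class UnsignedKeyCompare {

    /** Lexicographic comparison treating each byte as a value in [0, 255]. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // mask to obtain the unsigned value
            int y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        // A key that is a proper prefix of another key orders first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] lo = {(byte) 0x7f};
        byte[] hi = {(byte) 0x80};
        // As a signed byte 0x80 is -128, but as a key byte it is 128.
        System.out.println(compareUnsigned(lo, hi) < 0); // prints true
    }
}
```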

reset

public final IKeyBuilder reset()
Description copied from interface: IKeyBuilder
Reset the key length to zero before building another key.

Specified by:
reset in interface IKeyBuilder
Returns:
this

isUnicodeSupported

public final boolean isUnicodeSupported()
Description copied from interface: IKeyBuilder
Return true iff Unicode is supported by this object (returns false if only ASCII support is configured).

Specified by:
isUnicodeSupported in interface IKeyBuilder

getSortKeyGenerator

public final UnicodeSortKeyGenerator getSortKeyGenerator()
The object responsible for generating sort keys from Unicode strings. The UnicodeSortKeyGenerator -or- null if Unicode is not supported by this KeyBuilder instance.


append

public final IKeyBuilder append(String s)
Description copied from interface: IKeyBuilder
Encodes a Unicode string using the configured KeyBuilder.Options.COLLATOR and appends the resulting sort key to the buffer (without a trailing nul byte).

Note: The SuccessorUtil.successor(String) of a string is formed by appending a trailing nul character. However, since IDENTICAL appears to be required to differentiate between a string and its successor (with the trailing nul character), you MUST form the sort key first and then its successor (by appending a trailing nul). Failure to follow this pattern will lead to the successor of the key comparing as EQUAL to the key. For example,

            
            IKeyBuilder keyBuilder = ...;
            
            String s = "foo";
            
            byte[] fromKey = keyBuilder.reset().append( s ).getKey();
            
            // right.
            byte[] toKey = keyBuilder.reset().append( s ).appendNul().getKey();
            
            // wrong!
            byte[] wrongToKey = keyBuilder.reset().append( s+"\0" ).getKey();
            
 

Specified by:
append in interface IKeyBuilder
Parameters:
s - A string.
Returns:
this
See Also:
SuccessorUtil.successor(String), SuccessorUtil.successor(byte[]), FIXME update the javadoc further to speak to handling of multi-field keys.

appendASCII

public IKeyBuilder appendASCII(String s)
Description copied from interface: IKeyBuilder
Encodes a Unicode string by assuming that its contents are ASCII characters. For each character, this method simply chops off the high byte and converts the low byte to an unsigned byte.

Note: This method is potentially much faster than the Unicode aware IKeyBuilder.append(String). However, this method is NOT Unicode aware and non-ASCII characters will not be encoded correctly. This method MUST NOT be mixed with keys whose corresponding component is encoded by the Unicode aware methods, e.g., IKeyBuilder.append(String).

Specified by:
appendASCII in interface IKeyBuilder
Parameters:
s - A String containing US-ASCII characters.
Returns:
this

decodeASCII

public static String decodeASCII(byte[] key,
                                 int off,
                                 int len)
Decodes an ASCII string from a key.

Parameters:
key - The key.
off - The offset of the start of the string.
len - The #of bytes to decode (one byte per character).
Returns:
The ASCII characters decoded from the key.
See Also:
appendASCII(String)

appendText

public IKeyBuilder appendText(String text,
                              boolean unicode,
                              boolean successor)
Description copied from interface: IKeyBuilder
Encodes a variable length text field into the buffer. The text is truncated to IKeyBuilder.maxlen characters. The sort keys for strings that differ after truncation solely in the #of trailing pad characters will be identical (trailing pad characters are implicit out to IKeyBuilder.maxlen characters).

Note: Trailing pad characters are normalized to a representation as a single pad character (1 byte) followed by the #of actual or implied trailing pad characters represented as an unsigned short integer (2 bytes). This technique serves to keep multi-field keys with embedded variable length text fields aligned such that the field following a variable length text field does not bleed into the lexicographic ordering of the variable length text field.

Note: While the ASCII encoding happens to use one byte for each character, that is NOT true of the Unicode encoding. The space requirements for the Unicode encoding depend on the text, the Locale, the collator strength, and the collator decomposition mode.

Note: The successor option is designed to encapsulate some trickiness around forming the successor of a variable length text field embedded in a multi-field key. In particular, simply appending a nul byte will NOT work (it works fine when the text field is the last field in the key or when it is the only component in the key). This approach breaks encapsulation of the field boundaries such that the resulting "successor" is actually ordered before the original key. This happens because you introduce a 0x0 byte right on the boundary of the next field, effectively causing the next field to have a smaller value. Consider the following example (in hex) where "|" represents the end of the "text" field:

     ab cd | 12
 
if you compute the successor by appending a nul byte to the text field you get
     ab cd | 00 12
 
which is ordered before the original key!
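
Note: The ordering problem in the hex example above can be checked with a few lines of stand-alone Java. This sketch does not use KeyBuilder; the class and method names are hypothetical, and Arrays.compareUnsigned requires Java 9 or later.

```java
import java.util.Arrays;

/** Hypothetical demo of the naive "append a nul byte" successor. */
final class NaiveSuccessorDemo {

    /** Compare (text + 0x00 + next) against (text + next) as unsigned bytes. */
    static int compareNaiveSuccessor(byte[] text, byte next) {
        byte[] original = Arrays.copyOf(text, text.length + 1);
        original[text.length] = next;

        byte[] naive = Arrays.copyOf(text, text.length + 2);
        naive[text.length] = 0x00;     // nul appended to the text field
        naive[text.length + 1] = next; // the following field is shifted right

        return Arrays.compareUnsigned(naive, original);
    }

    public static void main(String[] args) {
        byte[] text = {(byte) 0xab, (byte) 0xcd};
        // Negative result: the naive "successor" sorts before the original key.
        System.out.println(compareNaiveSuccessor(text, (byte) 0x12) < 0); // prints true
    }
}
```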

Specified by:
appendText in interface IKeyBuilder
Parameters:
text - The text.
unicode - When true the text is interpreted as Unicode according to the KeyBuilder.Options.COLLATOR option. Otherwise it is interpreted as ASCII.
successor - When true, the successor of the text will be encoded. Otherwise the text will be encoded.
Returns:
The IKeyBuilder.
See Also:
http://www.unicode.org/reports/tr10/tr10-10.html#Interleaved_Levels

append

public final IKeyBuilder append(byte[] a)
Description copied from interface: IKeyBuilder
Appends an array of bytes - the bytes are treated as unsigned values.

Specified by:
append in interface IKeyBuilder
Parameters:
a - The array of bytes.
Returns:
this

append

public final IKeyBuilder append(double d)
Description copied from interface: IKeyBuilder
Appends a double precision floating point value by first converting it into a signed long integer using Double.doubleToLongBits(double), converting that value into a two's complement number, and then appending the bytes in big-endian order into the key buffer.

Note: this converts -0d and +0d to the same key.

Specified by:
append in interface IKeyBuilder
Parameters:
d - The double-precision floating point value.
Returns:
this

decodeDouble

public static double decodeDouble(byte[] key,
                                  int off)

append

public final IKeyBuilder append(float f)
Description copied from interface: IKeyBuilder
Appends a single precision floating point value by first converting it into a signed integer using Float.floatToIntBits(float), converting that value into a two's complement number, and then appending the bytes in big-endian order into the key buffer.

Note: this converts -0f and +0f to the same key.

Specified by:
append in interface IKeyBuilder
Parameters:
f - The single-precision floating point value.
Returns:
this

decodeFloat

public static float decodeFloat(byte[] key,
                                int off)

append

public final IKeyBuilder append(UUID uuid)
Description copied from interface: IKeyBuilder
Appends the UUID to the key using the MSB and then the LSB.

Specified by:
append in interface IKeyBuilder
Parameters:
uuid - The UUID.
Returns:
this

append

public final IKeyBuilder append(long v)
Description copied from interface: IKeyBuilder
Appends a signed long integer to the key by first converting it to a lexicographic ordering as an unsigned long integer and then appending it into the buffer as 8 bytes using a big-endian order.

Specified by:
append in interface IKeyBuilder
Returns:
this

append

public final IKeyBuilder append(int v)
Description copied from interface: IKeyBuilder
Appends a signed integer to the key by first converting it to a lexicographic ordering as an unsigned integer and then appending it into the buffer as 4 bytes using a big-endian order.

Specified by:
append in interface IKeyBuilder
Returns:
this
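
Note: One common way to obtain such an ordering is to flip the sign bit and write the result big-endian. The sketch below assumes that technique; it is a stand-alone illustration with hypothetical names and is not taken from the KeyBuilder source.

```java
/** Hypothetical codec illustrating an order-preserving int encoding. */
final class IntKeyCodec {

    /** Encode a signed int as 4 big-endian bytes whose unsigned order matches the signed order. */
    static byte[] encodeInt(int v) {
        // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto 0..2^32-1.
        int u = v ^ 0x80000000;
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
        };
    }

    /** Inverse of encodeInt: read 4 big-endian bytes at off and flip the sign bit back. */
    static int decodeInt(byte[] buf, int off) {
        int u = ((buf[off] & 0xff) << 24)
              | ((buf[off + 1] & 0xff) << 16)
              | ((buf[off + 2] & 0xff) << 8)
              |  (buf[off + 3] & 0xff);
        return u ^ 0x80000000;
    }
}
```

Under this scheme Integer.MIN_VALUE encodes as 00 00 00 00 and Integer.MAX_VALUE as ff ff ff ff, so an unsigned byte[] comparison of two encoded keys agrees with the signed comparison of the original values.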

append

public final IKeyBuilder append(short v)
Description copied from interface: IKeyBuilder
Appends a signed short integer to the key by first converting it to a two's complement representation supporting unsigned byte[] comparison and then appending it into the buffer as 2 bytes using a big-endian order.

Specified by:
append in interface IKeyBuilder
Returns:
this

appendUnsigned

public final IKeyBuilder appendUnsigned(byte v)

append

public final IKeyBuilder append(byte v)
Description copied from interface: IKeyBuilder
Converts the signed byte to an unsigned byte and appends it to the key.

Specified by:
append in interface IKeyBuilder
Parameters:
v - The signed byte.
Returns:
this

appendNul

public final IKeyBuilder appendNul()
Description copied from interface: IKeyBuilder
Append an unsigned zero byte to the key.

Specified by:
appendNul in interface IKeyBuilder
Returns:
this

asSortKey

public static final byte[] asSortKey(Object val)
Utility method converts an application key to a sort key (an unsigned byte[] that imposes the same sort order).

Note: This method is thread-safe.

Note: Strings are Unicode safe for the default locale. See Locale.getDefault(). If you require a specific locale or different locales at different times or for different indices then you MUST provision and apply your own KeyBuilder.

Parameters:
val - An application key.
Returns:
The unsigned byte[] equivalent of that key. This will be null iff the key is null. If the key is a byte[], then the byte[] itself will be returned.

getSortKey

public byte[] getSortKey(Object val)
Description copied from interface: ISortKeyBuilder
Return an unsigned byte[] sort key.

Specified by:
getSortKey in interface ISortKeyBuilder
Parameters:
val - Some object (required).
Returns:
The unsigned byte[] sort key.

append

public IKeyBuilder append(Object val)
Description copied from interface: IKeyBuilder
Append the value to the buffer, encoding it as appropriate based on the class of the object. This method handles all of the primitive data types plus UUID and Unicode Strings.

Specified by:
append in interface IKeyBuilder
Parameters:
val - The value.
Returns:
this

encodeByte

public static byte encodeByte(int v)
Converts a signed byte into an unsigned byte.

Parameters:
v - The signed byte.
Returns:
The corresponding unsigned value.

decodeByte

public static byte decodeByte(int v)
Converts an unsigned byte into a signed byte.

Parameters:
v - The unsigned byte.
Returns:
The corresponding signed value.
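
Note: The method bodies are not shown on this page. A minimal sketch, assuming the conversion is a flip of the sign bit (so that -128 maps to 0x00 and 127 maps to 0xff), might look as follows. The class name is hypothetical.

```java
/** Hypothetical codec illustrating the signed/unsigned byte conversion. */
final class ByteKeyCodec {

    /** Signed byte value -> byte whose unsigned value is (v + 128). */
    static byte encodeByte(int v) {
        return (byte) ((v ^ 0x80) & 0xff);
    }

    /** Inverse of encodeByte; accepts the encoded byte, sign-extended or not. */
    static byte decodeByte(int v) {
        return (byte) ((v ^ 0x80) & 0xff);
    }
}
```

Flipping the sign bit is its own inverse, which is why both methods in this sketch share the same body.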

d2l

public static long d2l(double d)
Encodes a double precision floating point value as an int64 value that has the same total ordering (you can compare two doubles encoded by this method and the long values will have the same ordering as the double values). The method works by converting the double to the IEEE 754 floating-point "double format" bit layout using Double.doubleToLongBits(double) and then converting the resulting long into a two's complement number. See Comparing floating point numbers by Bruce Dawson.

Parameters:
d - The double precision floating point value.
Returns:
The corresponding long integer value that maintains the same total ordering.
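
Note: The conversion described above can be sketched as follows. This is an illustration of the bit-pattern technique from the Bruce Dawson reference, written under the assumption that negative bit patterns are subtracted from 0x8000000000000000; the class name is hypothetical and the actual method body may differ.

```java
/** Hypothetical codec illustrating an order-preserving double-to-long mapping. */
final class DoubleKeyCodec {

    /** Map a double onto a signed long with the same ordering. */
    static long d2l(double d) {
        long bits = Double.doubleToLongBits(d);
        if (bits < 0) {
            // Sign-magnitude -> two's complement: a negative double with
            // magnitude bits m becomes -m, so larger magnitudes sort lower.
            bits = 0x8000000000000000L - bits;
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(d2l(-1.0) < d2l(-0.5)); // prints true
        System.out.println(d2l(-0.0) == d2l(0.0)); // prints true
    }
}
```

In this sketch -0d and +0d both map to zero, which matches the note on append(double) that the two values produce the same key.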

f2i

public static int f2i(float f)
Encodes a floating point value as an int32 value that has the same total ordering (you can compare two floats encoded by this method and the int values will have the same ordering as the float values). The method works by converting the float to the IEEE 754 floating-point "single format" bit layout using Float.floatToIntBits(float) and then converting the resulting int into a two's complement number. See Comparing floating point numbers by Bruce Dawson.

Parameters:
f - The floating point value.
Returns:
The corresponding integer value that maintains the same total ordering.

decodeLong

public static long decodeLong(byte[] buf,
                              int off)
Decodes a signed long value as encoded by append(long).

Parameters:
buf - The buffer containing the encoded key.
off - The offset at which to decode the key.
Returns:
The signed long value.

decodeInt

public static int decodeInt(byte[] buf,
                            int off)
Decodes a signed int value as encoded by append(int).

Parameters:
buf - The buffer containing the encoded key.
off - The offset at which to decode the key.
Returns:
The signed int value.

decodeShort

public static short decodeShort(byte[] buf,
                                int off)
Decodes a signed short value as encoded by append(short).

Parameters:
buf - The buffer containing the encoded key.
off - The offset at which to decode the key.
Returns:
The signed short value.

newInstance

public static IKeyBuilder newInstance()

newInstance

public static IKeyBuilder newInstance(int initialCapacity)
Create an instance for ASCII keys with the specified initial capacity.

Parameters:
initialCapacity - The initial capacity.
Returns:
The new instance.

newUnicodeInstance

public static IKeyBuilder newUnicodeInstance()
Create a factory for IKeyBuilder instances configured using the system properties. The factory will support Unicode unless CollatorEnum.ASCII is explicitly specified for the KeyBuilder.Options.COLLATOR property.

Throws:
UnsupportedOperationException -

The ICU library was required but was not located. Make sure that the ICU JAR is on the classpath. See KeyBuilder.Options.COLLATOR.

Note: If you are trying to use ICU4JNI then that has to be locatable as a native library. How you do this is different for Windows and Un*x.

See Also:
KeyBuilder.Options

newUnicodeInstance

public static IKeyBuilder newUnicodeInstance(Properties properties)
Create a factory for IKeyBuilder instances configured according to the specified properties. Any properties NOT explicitly given will be defaulted from System.getProperties(). The pre-defined properties KeyBuilder.Options.USER_LANGUAGE, KeyBuilder.Options.USER_COUNTRY, and KeyBuilder.Options.USER_VARIANT MAY be overridden. The factory will support Unicode unless CollatorEnum.ASCII is explicitly specified for the KeyBuilder.Options.COLLATOR property.

Parameters:
properties - The properties to be used (optional). When null the System properties are used.
Throws:
UnsupportedOperationException -

The ICU library was required but was not located. Make sure that the ICU JAR is on the classpath. See KeyBuilder.Options.COLLATOR.

Note: If you are trying to use ICU4JNI then that has to be locatable as a native library. How you do this is different for Windows and Un*x.

See Also:
KeyBuilder.Options

newInstance

public static IKeyBuilder newInstance(int capacity,
                                      CollatorEnum collatorChoice,
                                      Locale locale,
                                      Object strength,
                                      DecompositionEnum mode)
Create a new instance that optionally supports Unicode sort keys.

Parameters:
capacity - The initial capacity of the buffer. When zero (0) the DEFAULT_INITIAL_CAPACITY will be used.
collatorChoice - Identifies the collator that will be used to generate sort keys from Unicode values.
locale - When null the default locale will be used.
strength - Either an Integer or a StrengthEnum specifying the strength to be set on the collator object (optional). When null the default strength of the collator will not be overridden.
mode - The decomposition mode to be set on the collator object (optional). When null the default decomposition mode of the collator will not be overridden.
Returns:
The new instance.
Throws:
UnsupportedOperationException -

The ICU library was required but was not located. Make sure that the ICU JAR is on the classpath.

Note: If you are trying to use ICU4JNI then that has to be locatable as a native library. How you do this is different for Windows and Un*x.



Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.