com.bigdata.btree.keys
Interface IKeyBuilder

All Superinterfaces:
IByteArraySlice, IManagedByteArray, ISortKeyBuilder<Object>
All Known Implementing Classes:
KeyBuilder

public interface IKeyBuilder
extends ISortKeyBuilder<Object>, IManagedByteArray

Interface for building up variable length unsigned byte[] keys from one or more primitive data type values and/or Unicode strings. An instance of this interface may be reset() and reused to encode a series of keys.

A sort key is an unsigned byte[] that preserves the total order of the original data. Sort keys may be formed from multiple fields, but field markers do not appear within the resulting sort key. The original values can be extracted from the fixed length fields of a sort key (such as int, long, float, or double), but they cannot be extracted from Unicode variable length fields: the collation ordering for a Unicode string depends on the Locale, the collation strength, and the decomposition mode, and producing it is a non-reversible operation.
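
As a minimal sketch of what "unsigned byte[] order" means, the comparator below treats each byte as a value in [0, 255] and lets a proper prefix sort first. The class name UnsignedBytes is hypothetical and is not part of this API; the library's own comparator is the BytesUtil.compareBytes(byte[], byte[]) method referenced from getKey().

```java
final class UnsignedBytes {

    /** Lexicographic comparison that treats each byte as a value in [0, 255]. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // mask so that 0x80 compares as 128, not -128
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key (a proper prefix) sorts first.
        return a.length - b.length;
    }
}
```

With this comparator the single byte 0x80 sorts after 0x7f, which is the opposite of what a signed byte comparison would give.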

Unicode

Factory methods are defined by KeyBuilder for obtaining instances of this interface that optionally support Unicode. Instances may be created for a given Locale, collation strength, decomposition mode, etc.

The ICU library supports generation of compressed Unicode sort keys and is used by default when available. The JDK java.text package also supports the generation of Unicode sort keys, but it does NOT produce compressed sort keys. The resulting sort keys are therefore (a) incompatible with those produced by the ICU library and (b) much larger than those produced by the ICU library.

Support for Unicode MAY be disabled using KeyBuilder.Options.COLLATOR, by using KeyBuilder.newInstance() or another factory method that does not enable Unicode support, or by using one of the KeyBuilder constructors that does not support Unicode.

Multi-field keys with variable length fields

Multi-field keys in which variable length fields are embedded within the key present a special problem. Any run of fixed length fields can be compared as unsigned byte[]s. Likewise, any key with a fixed length prefix (including a prefix of length zero) and a variable length field in its tail can be compared directly as unsigned byte[]s. However, a variable length field in any non-terminal position of a multi-field key must be handled specially, since simple concatenation of the field keys will NOT produce the correct total ordering. (This is why SQL requires that text fields compare as if they were padded out with ASCII blanks (0x20) to some maximum length for the field.) A utility method exists specifically for this purpose - see appendText(String, boolean, boolean).
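
The ordering problem can be reproduced with plain JDK code. The sketch below is illustrative only: ConcatOrderDemo, naiveKey and compareFields are hypothetical names, and the naive key simply concatenates two ASCII fields with no padding or terminator.

```java
import java.nio.charset.StandardCharsets;

final class ConcatOrderDemo {

    /** Naive composite key: the two ASCII fields are concatenated with nothing in between. */
    static byte[] naiveKey(String f1, String f2) {
        return (f1 + f2).getBytes(StandardCharsets.US_ASCII);
    }

    /** Field-by-field comparison: the ordering that the composite key is supposed to preserve. */
    static int compareFields(String a1, String a2, String b1, String b2) {
        int c = a1.compareTo(b1);
        return c != 0 ? c : a2.compareTo(b2);
    }

    /** Unsigned lexicographic comparison of two byte[] keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

For the tuples ("a", "z") and ("ab", "c"), the field-by-field comparison orders the first tuple before the second because "a" sorts before "ab". The naive keys are the bytes of "az" and "abc", and those compare the other way around because 'z' is greater than 'b'.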

Version:
$Id: IKeyBuilder.java 4548 2011-05-25 19:36:34Z thompsonbry $
Author:
Bryan Thompson
See Also:
KeyBuilder.newInstance(), KeyBuilder.newUnicodeInstance(), KeyBuilder.newUnicodeInstance(Properties), SuccessorUtil

Field Summary
static int maxlen
          The maximum length of a variable length text field is 65535 (pow(2,16)-1).
 
Method Summary
 IKeyBuilder append(BigDecimal d)
          Encode a BigDecimal into an unsigned byte[] and append it into the key buffer.
 IKeyBuilder append(BigInteger i)
          Encode a BigInteger into an unsigned byte[] and append it into the key buffer.
 IKeyBuilder append(byte b)
          Appends a byte - the byte is treated as an unsigned value.
 IKeyBuilder append(byte[] a)
          Appends an array of bytes - the bytes are treated as unsigned values.
 IKeyBuilder append(byte[] a, int off, int len)
          Append len bytes starting at off in a to the key buffer - the bytes are treated as unsigned values.
 IKeyBuilder append(double d)
          Appends a double precision floating point value by first converting it into a signed long integer using Double.doubleToLongBits(double), converting that value into a two's-complement number, and then appending the bytes in big-endian order into the key buffer.
 IKeyBuilder append(float f)
          Appends a single precision floating point value by first converting it into a signed integer using Float.floatToIntBits(float), converting that value into a two's-complement number, and then appending the bytes in big-endian order into the key buffer.
 IKeyBuilder append(int v)
          Appends a signed integer to the key by first converting it to a lexicographic ordering as an unsigned integer and then appending it into the buffer as 4 bytes using a big-endian order.
 IKeyBuilder append(long v)
          Appends a signed long integer to the key by first converting it to a lexicographic ordering as an unsigned long integer and then appending it into the buffer as 8 bytes using a big-endian order.
 IKeyBuilder append(Object val)
          Append the value to the buffer, encoding it as appropriate based on the class of the object.
 IKeyBuilder append(short v)
          Appends a signed short integer to the key by first converting it to a two's-complement representation supporting unsigned byte[] comparison and then appending it into the buffer as 2 bytes using a big-endian order.
 IKeyBuilder append(String s)
          Encodes a Unicode string using the configured KeyBuilder.Options.COLLATOR and appends the resulting sort key to the buffer (without a trailing nul byte).
 IKeyBuilder append(UUID uuid)
          Appends the UUID to the key using the MSB and then the LSB (this preserves the natural order imposed by UUID.compareTo(UUID)).
 IKeyBuilder appendASCII(String s)
          Encodes a Unicode string by assuming that its contents are ASCII characters.
 IKeyBuilder appendNul()
          Append an unsigned zero byte to the key.
 IKeyBuilder appendSigned(byte v)
          Converts the signed byte to an unsigned byte and appends it to the key.
 IKeyBuilder appendText(String text, boolean unicode, boolean successor)
          Encodes a variable length text field into the buffer.
 byte[] array()
          The backing byte[] WILL be transparently replaced if the buffer capacity is extended.
 byte[] getKey()
          Return the encoded key.
 boolean isUnicodeSupported()
          Return true iff Unicode is supported by this object (returns false if only ASCII support is configured).
 int len()
          The length of the slice is the number of bytes written onto the backing byte[].
 int off()
          The offset of the slice into the backing byte[] is always zero.
 IKeyBuilder reset()
          Reset the key length to zero before building another key.
 byte[] toByteArray()
          An alias for getKey().
 
Methods inherited from interface com.bigdata.btree.keys.ISortKeyBuilder
getSortKey
 
Methods inherited from interface com.bigdata.io.IManagedByteArray
capacity, ensureCapacity, ensureFree
 

Field Detail

maxlen

static final int maxlen
The maximum length of a variable length text field is 65535 (pow(2,16)-1).

Note: This restriction only applies to multi-field keys where the text field appears in a non-terminal position within the key - that is, as encoded by appendText(String, boolean, boolean). When a text field appears in such a non-terminal position, trailing pad characters are used to maintain lexicographic ordering over the multi-field key.

See Also:
Constant Field Values
Method Detail

array

byte[] array()
Returns the backing byte[]. The backing byte[] WILL be transparently replaced if the buffer capacity is extended, so this method DOES NOT guarantee that the backing array reference will remain constant. Some implementations use an extensible backing byte[] and will replace the reference when the backing buffer is extended.

Specified by:
array in interface IByteArraySlice

off

int off()
The start of the slice in the IByteArraySlice.array(). For this interface, the offset of the slice into the backing byte[] is always zero.

Specified by:
off in interface IByteArraySlice

len

int len()
The length of the slice in the IByteArraySlice.array(). For this interface, the length of the slice is the number of bytes written onto the backing byte[]. This is set to ZERO (0) by reset(). Note: IByteArraySlice.len() has different semantics for some concrete implementations. ByteArrayBuffer.len() always returns the capacity of the backing byte[] while ByteArrayBuffer.pos() returns the number of bytes written onto the backing buffer. In contrast, KeyBuilder.len() is always the number of bytes written onto the backing buffer.

Specified by:
len in interface IByteArraySlice

getKey

byte[] getKey()
Return the encoded key. Comparison of keys returned by this method MUST treat the array as an array of unsigned bytes.

Note that keys are donated to the btree, so it is important to allocate new keys when running in the same process space. When using a network API, the API provides the necessary decoupling.

Returns:
A new array containing the key.
See Also:
BytesUtil.compareBytes(byte[], byte[])

toByteArray

byte[] toByteArray()
An alias for getKey(). Return a copy of the data in the slice.

Specified by:
toByteArray in interface IByteArraySlice
Returns:
A new array containing data in the slice.

reset

IKeyBuilder reset()
Reset the key length to zero before building another key.

Specified by:
reset in interface IManagedByteArray
Returns:
this
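
The reuse pattern described for reset(), getKey() and array() can be sketched with a toy builder. TinyKeyBuilder is a hypothetical stand-in written for this example and is not the KeyBuilder implementation; it only mirrors the documented contract: reset() sets the length to zero, getKey() returns a new array, and array() may return a different reference after the buffer grows.

```java
import java.util.Arrays;

final class TinyKeyBuilder {

    private byte[] buf = new byte[4]; // deliberately small so that growth is easy to trigger
    private int len = 0;

    /** Sets the key length back to zero; the backing buffer is kept for reuse. */
    TinyKeyBuilder reset() {
        len = 0;
        return this;
    }

    /** Appends one byte, replacing the backing array with a larger one when it is full. */
    TinyKeyBuilder append(byte b) {
        if (len == buf.length) {
            buf = Arrays.copyOf(buf, buf.length * 2);
        }
        buf[len++] = b;
        return this;
    }

    /** The current backing array; the reference changes when the buffer grows. */
    byte[] array() {
        return buf;
    }

    /** The number of bytes written since the last reset(). */
    int len() {
        return len;
    }

    /** A copy of the bytes written so far, safe to hand off to an index. */
    byte[] getKey() {
        return Arrays.copyOf(buf, len);
    }
}
```

Because getKey() copies, a key obtained before a reset() is unaffected by the next key built with the same instance, whereas a reference obtained from array() can become stale.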

append

IKeyBuilder append(String s)
Encodes a Unicode string using the configured KeyBuilder.Options.COLLATOR and appends the resulting sort key to the buffer (without a trailing nul byte).

Note: The SuccessorUtil.successor(String) of a string is formed by appending a trailing nul character. However, since IDENTICAL appears to be required to differentiate between a string and its successor (with the trailing nul character), you MUST form the sort key first and then its successor (by appending a trailing nul). Failure to follow this pattern will lead to the successor of the key comparing as EQUAL to the key. For example,

            
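            IKeyBuilder keyBuilder = ...;
            
            String s = "foo";
            
            // the sort key for s.
            byte[] fromKey = keyBuilder.reset().append( s ).getKey();
            
            // right: form the sort key first, then append the nul byte.
            byte[] toKey = keyBuilder.reset().append( s ).appendNul().getKey();
            
            // wrong! this key may compare as EQUAL to fromKey.
            byte[] badToKey = keyBuilder.reset().append( s+"\0" ).getKey();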
            
 

Parameters:
s - A string.
Returns:
this
Throws:
UnsupportedOperationException - if Unicode is not supported.
See Also:
SuccessorUtil.successor(String), SuccessorUtil.successor(byte[]), FIXME update the javadoc further to speak to handling of multi-field keys.
TODO:
provide a more flexible interface for handling Unicode, including the means to encode using a specified language family (such as could be identified with an xml:lang attribute).

appendText

IKeyBuilder appendText(String text,
                       boolean unicode,
                       boolean successor)
Encodes a variable length text field into the buffer. The text is truncated to maxlen characters. The sort keys for strings that differ after truncation solely in the number of trailing pad characters will be identical (trailing pad characters are implicit out to maxlen characters).

Note: Trailing pad characters are normalized to a representation as a single pad character (1 byte) followed by the number of actual or implied trailing pad characters represented as an unsigned short integer (2 bytes). This technique serves to keep multi-field keys with embedded variable length text fields aligned such that the field following a variable length text field does not bleed into the lexicographic ordering of the variable length text field.

Note: While the ASCII encoding happens to use one byte for each character that is NOT true of the Unicode encoding. The space requirements for the Unicode encoding depend on the text, the Locale, the collator strength, and the collator decomposition mode.

Note: The successor option is designed to encapsulate some trickiness around forming the successor of a variable length text field embedded in a multi-field key. In particular, simply appending a nul byte will NOT work (it works fine when the text field is the last field in the key or when it is the only component in the key). This approach breaks encapsulation of the field boundaries such that the resulting "successor" is actually ordered before the original key. This happens because you introduce a 0x0 byte right on the boundary of the next field, effectively causing the next field to have a smaller value. Consider the following example (in hex) where "|" represents the end of the "text" field:

     ab cd | 12
 
if you compute the successor by appending a nul byte to the text field you get
     ab cd | 00 12
 
which is ordered before the original key!
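
The hex example above can be checked directly with an unsigned byte[] comparison. This is a small JDK-only sketch; NulSuccessorDemo and its members are hypothetical names used for illustration.

```java
final class NulSuccessorDemo {

    /** The original key: text field "ab cd" followed by a one-byte field 0x12. */
    static final byte[] ORIGINAL = {(byte) 0xab, (byte) 0xcd, 0x12};

    /** The broken "successor": a nul byte appended to the text field, before the next field. */
    static final byte[] WITH_NUL = {(byte) 0xab, (byte) 0xcd, 0x00, 0x12};

    /** Unsigned lexicographic comparison of two byte[] keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

The comparison is decided at the third byte, where 0x00 is compared against 0x12, so compareUnsigned(WITH_NUL, ORIGINAL) is negative.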

Parameters:
text - The text.
unicode - When true the text is interpreted as Unicode according to the KeyBuilder.Options.COLLATOR option. Otherwise it is interpreted as ASCII.
successor - When true, the successor of the text will be encoded. Otherwise the text will be encoded.
Returns:
The IKeyBuilder.
See Also:
http://www.unicode.org/reports/tr10/tr10-10.html#Interleaved_Levels

isUnicodeSupported

boolean isUnicodeSupported()
Return true iff Unicode is supported by this object (returns false if only ASCII support is configured).


appendASCII

IKeyBuilder appendASCII(String s)
Encodes a Unicode string by assuming that its contents are ASCII characters. For each character, this method simply chops off the high byte and converts the low byte to an unsigned byte.

Note: This method is potentially much faster than the Unicode-aware append(String). However, this method is NOT Unicode-aware and non-ASCII characters will not be encoded correctly. Keys encoded by this method MUST NOT be mixed with keys whose corresponding component is encoded by the Unicode-aware methods, e.g., append(String).

Parameters:
s - A String containing US-ASCII characters.
Returns:
this
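
A minimal sketch of the "chop off the high byte" step, assuming it amounts to a narrowing cast of each char. AsciiKeys is a hypothetical class name and this is not the library's code.

```java
final class AsciiKeys {

    /** Keeps only the low 8 bits of each UTF-16 code unit in the string. */
    static byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // the narrowing cast drops the high byte
        }
        return out;
    }
}
```

The loss for non-ASCII input is easy to see: U+0141 has the low byte 0x41, so under this sketch it encodes to the same single byte as "A".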

append

IKeyBuilder append(byte b)
Appends a byte - the byte is treated as an unsigned value.

Specified by:
append in interface IManagedByteArray
Parameters:
b - The byte.
Returns:
this

append

IKeyBuilder append(byte[] a)
Appends an array of bytes - the bytes are treated as unsigned values.

Specified by:
append in interface IManagedByteArray
Parameters:
a - The array of bytes.
Returns:
this

append

IKeyBuilder append(byte[] a,
                   int off,
                   int len)
Append len bytes starting at off in a to the key buffer - the bytes are treated as unsigned values.

Specified by:
append in interface IManagedByteArray
Parameters:
off - The offset.
len - The number of bytes to append.
a - The array containing the bytes to append.
Returns:
this

append

IKeyBuilder append(double d)
Appends a double precision floating point value by first converting it into a signed long integer using Double.doubleToLongBits(double), converting that value into a two's-complement number, and then appending the bytes in big-endian order into the key buffer.

Note: this converts -0d and +0d to the same key.

Parameters:
d - The double-precision floating point value.
Returns:
this
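
One way to realize the steps described above is sketched below. It is an assumption-laden illustration rather than the KeyBuilder code: DoubleKeys and its methods are hypothetical names, and the final sign-bit flip assumes that the resulting long is written the way append(long) is described.

```java
final class DoubleKeys {

    /** Maps the IEEE 754 sign-magnitude bit pattern onto a two's-complement long. */
    static long toOrderedLong(double d) {
        long bits = Double.doubleToLongBits(d);
        if (bits < 0) {
            // Negative values: sign bit plus magnitude becomes the negated magnitude.
            // For -0.0 the magnitude is zero, so -0.0 and +0.0 map to the same long.
            bits = 0x8000000000000000L - bits;
        }
        return bits;
    }

    /** Encodes the double as 8 big-endian bytes whose unsigned order follows numeric order. */
    static byte[] encode(double d) {
        // Flip the sign bit so that unsigned byte order matches signed long order.
        long v = toOrderedLong(d) ^ 0x8000000000000000L;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        return out;
    }

    /** Unsigned lexicographic comparison of two byte[] keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

Under this sketch the keys for -2.5, -1.0, 0.0 and 1.5 sort in that order, and -0.0 produces the same key as 0.0, which is consistent with the note above.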

append

IKeyBuilder append(float f)
Appends a single precision floating point value by first converting it into a signed integer using Float.floatToIntBits(float), converting that value into a two's-complement number, and then appending the bytes in big-endian order into the key buffer.

Note: this converts -0f and +0f to the same key.

Parameters:
f - The single-precision floating point value.
Returns:
this

append

IKeyBuilder append(UUID uuid)
Appends the UUID to the key using the MSB and then the LSB (this preserves the natural order imposed by UUID.compareTo(UUID)).

Parameters:
uuid - The UUID.
Returns:
this
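
The sketch below shows one encoding with the stated property. It assumes that each half of the UUID is written the way append(long) is described (sign bit flipped, big-endian); that detail is an assumption and is not confirmed by this page. UuidKeys is a hypothetical class name.

```java
import java.util.UUID;

final class UuidKeys {

    /** Encodes the most significant half and then the least significant half, 8 bytes each. */
    static byte[] encode(UUID u) {
        byte[] out = new byte[16];
        putOrderedLong(out, 0, u.getMostSignificantBits());
        putOrderedLong(out, 8, u.getLeastSignificantBits());
        return out;
    }

    /** Writes a signed long so that unsigned byte order matches signed numeric order. */
    private static void putOrderedLong(byte[] out, int off, long v) {
        v ^= 0x8000000000000000L; // flip the sign bit
        for (int i = 7; i >= 0; i--) {
            out[off + i] = (byte) v;
            v >>>= 8;
        }
    }

    /** Unsigned lexicographic comparison of two byte[] keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

UUID.compareTo(UUID) compares the two halves as signed longs, which is why the sign-bit flip is applied to each half in this sketch.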

append

IKeyBuilder append(long v)
Appends a signed long integer to the key by first converting it to a lexicographic ordering as an unsigned long integer and then appending it into the buffer as 8 bytes using a big-endian order.

Returns:
this

append

IKeyBuilder append(int v)
Appends a signed integer to the key by first converting it to a lexicographic ordering as an unsigned integer and then appending it into the buffer as 4 bytes using a big-endian order.

Returns:
this
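
A common way to obtain such an ordering is to flip the sign bit before writing the bytes. The sketch below assumes that technique; it is not taken from the KeyBuilder source, and IntKeys is a hypothetical class name. The same idea applies at 2 and 8 bytes for short and long.

```java
final class IntKeys {

    /** Encodes a signed int as 4 big-endian bytes whose unsigned order matches numeric order. */
    static byte[] encode(int v) {
        // Flipping the sign bit maps Integer.MIN_VALUE to 0x00000000 and
        // Integer.MAX_VALUE to 0xffffffff, so negative values sort first.
        int u = v ^ 0x80000000;
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
        };
    }

    /** Unsigned lexicographic comparison of two byte[] keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

Without the sign-bit flip, the big-endian bytes of -1 (ff ff ff ff) would sort after those of 1 (00 00 00 01) under an unsigned comparison.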

append

IKeyBuilder append(short v)
Appends a signed short integer to the key by first converting it to a two's-complement representation supporting unsigned byte[] comparison and then appending it into the buffer as 2 bytes using a big-endian order.

Returns:
this

appendSigned

IKeyBuilder appendSigned(byte v)
Converts the signed byte to an unsigned byte and appends it to the key.

Parameters:
v - The signed byte.
Returns:
this

appendNul

IKeyBuilder appendNul()
Append an unsigned zero byte to the key.

Returns:
this

append

IKeyBuilder append(BigInteger i)
Encode a BigInteger into an unsigned byte[] and append it into the key buffer.

The encoding is a 2 byte run length, whose leading bit is set iff the BigInteger is negative, followed by the byte[] as returned by BigInteger.toByteArray().

Parameters:
i - The BigInteger value.
Returns:
this

append

IKeyBuilder append(BigDecimal d)
Encode a BigDecimal into an unsigned byte[] and append it into the key buffer.

Parameters:
d - The BigDecimal value.
Returns:
this

append

IKeyBuilder append(Object val)
Append the value to the buffer, encoding it as appropriate based on the class of the object. This method handles all of the primitive data types plus UUID and Unicode Strings.

Parameters:
val - The value.
Returns:
this
Throws:
IllegalArgumentException - if val is null.
UnsupportedOperationException - if val is an instance of an unsupported class.


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.