com.bigdata.sparse
Class KeyDecoder

java.lang.Object
  extended by com.bigdata.sparse.KeyDecoder

public class KeyDecoder
extends Object

A utility class that decodes a key in a SparseRowStore into the KeyType for the primary key, the column name, and the timestamp. Note that the exact schema name is not recoverable since it is encoded using a non-reversible algorithm (it is a sort key generated by a Unicode collator). Likewise, the primary key can be decoded for primitive data types, but while we can identify the bytes corresponding to the primary key for a Unicode KeyType, we cannot decode them (they are also a sort key generated by a Unicode collator). The column name is NOT stored with Unicode compression so that we can decode it without loss (it is encoded into bytes using UTF-8 and those bytes are written directly into the key). This means that column names are NOT ordered according to the Unicode collator. In practice this is not a problem since we never assume order for that part of the key. The SparseRowStore relies only on {columnName,timestamp} to distinguish keys that share a given {schema,primaryKey} prefix.
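
The difference between collation order and raw UTF-8 byte order for column names can be illustrated with a minimal, self-contained sketch. It uses only the JDK (java.text.Collator and UTF-8 encoding); the class ColumnNameOrderSketch and its helper methods are hypothetical and are not part of com.bigdata.sparse.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration only; not part of the bigdata API.
final class ColumnNameOrderSketch {

    /** Encode a column name as raw UTF-8 bytes (a reversible encoding). */
    static byte[] encode(String columnName) {
        return columnName.getBytes(StandardCharsets.UTF_8);
    }

    /** Decode raw UTF-8 bytes back into the column name. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Compare two byte[]s as unsigned bytes, shorter array first on a tie. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] lower = encode("a");
        byte[] upper = encode("B");

        // The UTF-8 bytes round-trip back to the original column name.
        System.out.println("decoded: " + decode(lower));

        // Byte order: 'B' (0x42) sorts before 'a' (0x61).
        System.out.println("byte order:     " + compareUnsigned(lower, upper));

        // Collation order (English): "a" sorts before "B".
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        System.out.println("collator order: " + collator.compare("a", "B"));
    }
}
```

The two comparisons disagree in sign for this pair, which is the trade-off described above: the UTF-8 form can be decoded without loss, at the cost of not following the collator's ordering.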

The encoded schema name is followed by the KeyType.getByteCode() and then by a nul byte. By searching for the nul byte we can identify the end of the encoded schema name and also the data type of the primary key. Most kinds of primary keys have a fixed-length encoding (e.g., Long and Double).
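
The nul-byte search described above might be sketched as follows. This is an illustration under stated assumptions, not the actual KeyDecoder implementation: it assumes the encoded schema name contains no nul byte and that the key type byte code is non-zero. The class SchemaPrefixSketch, its methods, and the byte code value 0x07 are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the KeyDecoder implementation.
final class SchemaPrefixSketch {

    /** Index of the first nul (0x00) byte in the key, or -1 if there is none. */
    static int indexOfNul(byte[] key) {
        for (int i = 0; i < key.length; i++) {
            if (key[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    /** Position of the nul byte, checked so that a key type byte precedes it. */
    private static int requireNul(byte[] key) {
        int nul = indexOfNul(key);
        if (nul < 1) {
            throw new IllegalArgumentException("no key type byte before a nul byte");
        }
        return nul;
    }

    /** The bytes of the encoded schema name: everything before the key type byte. */
    static byte[] schemaBytes(byte[] key) {
        return Arrays.copyOfRange(key, 0, requireNul(key) - 1);
    }

    /** The key type byte code: the single byte immediately before the nul byte. */
    static byte keyTypeCode(byte[] key) {
        return key[requireNul(key) - 1];
    }

    /** Offset of the first byte of the encoded primary key. */
    static int primaryKeyOffset(byte[] key) {
        return requireNul(key) + 1;
    }

    public static void main(String[] args) {
        // Made-up key: 2 schema bytes, key type code 0x07 (assumed), nul, 8 primary key bytes.
        byte[] key = {0x2a, 0x41, 0x07, 0x00, 1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println("schema bytes:       " + Arrays.toString(schemaBytes(key)));
        System.out.println("key type code:      " + keyTypeCode(key));
        System.out.println("primary key offset: " + primaryKeyOffset(key));
    }
}
```

For a fixed-length primary key type, the end of the primary key would then follow from the offset plus the encoded width of that type.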

Unicode primary keys have a variable-length encoding, which makes life more complex. For Unicode primary keys, we break with the collation order and use the UTF-8 encoding of the key. This means that the primary key can be decoded and preserves hierarchical namespace clustering within the row store, but does not impose a total sort order per Unicode sort key semantics. The only reasonable approach is to append a byte sequence to the key that never occurs within the generated Unicode sort keys. Again, we use a nul byte to mark the end of the Unicode primary key since it is not emitted by most Unicode collation implementations, as it would cause grief for C-language strings. (However, see SparseRowStore.Options#PRIMARY_KEY_UNICODE_CLEAN for information on backward compatibility.)
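
A nul-terminated UTF-8 primary key of the kind described above could be written and read back roughly as follows. This is a sketch under assumptions: the primary key itself contains no U+0000 character (the only code point whose UTF-8 encoding contains a 0x00 byte), and the class Utf8PrimaryKeySketch and its methods are hypothetical names, not part of com.bigdata.sparse.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the KeyDecoder implementation.
final class Utf8PrimaryKeySketch {

    /** Encode a primary key as its UTF-8 bytes followed by one terminating nul byte. */
    static byte[] encodePrimaryKey(String primaryKey) {
        if (primaryKey.indexOf('\u0000') >= 0) {
            throw new IllegalArgumentException("primary key must not contain U+0000");
        }
        byte[] utf8 = primaryKey.getBytes(StandardCharsets.UTF_8);
        // Arrays.copyOf pads with zero, which supplies the terminating nul byte.
        return Arrays.copyOf(utf8, utf8.length + 1);
    }

    /** Index of the first nul byte at or after the given offset, or -1 if none. */
    static int indexOfNul(byte[] key, int from) {
        for (int i = from; i < key.length; i++) {
            if (key[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    /** Decode the UTF-8 primary key that starts at offset and ends at the next nul byte. */
    static String decodePrimaryKey(byte[] key, int offset) {
        int end = indexOfNul(key, offset);
        if (end < 0) {
            throw new IllegalArgumentException("unterminated primary key");
        }
        return new String(key, offset, end - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encodePrimaryKey("com.example/naïve");
        // Bytes following the terminator (column name, timestamp) do not affect decoding.
        byte[] key = Arrays.copyOf(encoded, encoded.length + 4);
        System.out.println(decodePrimaryKey(key, 0));
    }
}
```

Because the bytes are plain UTF-8, primary keys that share a leading string also share a leading byte sequence, which is what keeps a hierarchical namespace clustered in the index.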

Version:
$Id: KeyDecoder.java 3408 2010-08-04 18:53:35Z thompsonbry $
Author:
Bryan Thompson
See Also:
Schema.fromKey(IKeyBuilder, Object), KeyType.getKeyType(byte), AtomicRowWriteRead, AtomicRowRead
TODO:
The key is now 100% decodable. The package should be updated to take advantage of that.

Field Summary
 long timestamp
          The decoded timestamp on the column value.
 
Constructor Summary
KeyDecoder(byte[] key)
           
 
Method Summary
 String getColumnName()
          The decoded column name.
 byte[] getPrefix()
          Returns the head of the key corresponding to the encoded schema name, the primary key's KeyType, and the primary key (including any terminating nul byte).
 int getPrefixLength()
          Returns the length of the prefix corresponding to the encoded schema name, the primary key's KeyType, and the primary key (including any terminating nul byte).
 Object getPrimaryKey()
          The decoded primary key.
 KeyType getPrimaryKeyType()
          The decoded KeyType for the primary key.
 byte[] getSchemaBytes()
          The bytes from the key that represent the encoded name of the Schema.
 String getSchemaName()
          Return the schema name.
 long getTimestamp()
          The decoded timestamp on the column value.
 String toString()
          Shows some of the data that is extracted.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

timestamp

public final long timestamp
The decoded timestamp on the column value.

Constructor Detail

KeyDecoder

public KeyDecoder(byte[] key)
Method Detail

getSchemaBytes

public byte[] getSchemaBytes()
The bytes from the key that represent the encoded name of the Schema.


getSchemaName

public String getSchemaName()
Return the schema name.

Throws:
UnsupportedOperationException - unless SparseRowStore.schemaNameUnicodeClean is true.

getPrimaryKeyType

public final KeyType getPrimaryKeyType()
The decoded KeyType for the primary key.


getPrimaryKey

public Object getPrimaryKey()
The decoded primary key.

Throws:
UnsupportedOperationException - if the primary key cannot be decoded.

getColumnName

public final String getColumnName()
The decoded column name.


getTimestamp

public long getTimestamp()
The decoded timestamp on the column value. The semantics of the timestamp depend entirely on the application. When the application provides timestamps, they are application defined long integers. When the application requests auto-timestamps, they are generated by the data service.


getPrefix

public byte[] getPrefix()
Returns the head of the key corresponding to the encoded schema name, the primary key's KeyType, and the primary key (including any terminating nul byte).

Returns:
The head of the key (the prefix) as a byte array.

getPrefixLength

public int getPrefixLength()
Returns the length of the prefix corresponding to the encoded schema name, the primary key's KeyType, and the primary key (including any terminating nul byte).

Returns:
The length of the prefix in bytes.

toString

public String toString()
Shows some of the data that is extracted.

Overrides:
toString in class Object


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.