com.bigdata.sparse
Class SparseRowStore

java.lang.Object
  extended by com.bigdata.sparse.SparseRowStore
All Implemented Interfaces:
IRowStoreConstants

public class SparseRowStore
extends Object
implements IRowStoreConstants

A client-side class that knows how to use an IIndex to provide an efficient data model in which a logical row is stored as one or more entries in the IIndex. Operations are provided for atomic read and write of logical rows. While the scan operations are always consistent (they will never reveal data from a row that is undergoing concurrent modification), they do NOT cause concurrent atomic row writes to block. This means that rows that would be visited by a scan MAY be modified before the scan reaches those rows, in which case the client will see the updates.

The SparseRowStore requires that you declare the KeyType for the primary key so that it may impose a consistent total ordering over the generated keys in the index.

There is no intrinsic reason why column values must be strongly typed. Therefore, by default column values are loosely typed. However, column values MAY be constrained by a Schema.

This class builds keys using the sparse row store design pattern. Each logical row is modeled as an ordered set of index entries whose keys are formed as:

                                             
 [schemaName][primaryKey][columnName][timestamp]
                                             
 

and the values are the value for a given column for that primary key.
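The key layout above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: RowKey and RowKeyDemo are hypothetical names, the primary key is assumed to be a long, and a java.util.TreeMap stands in for the IIndex. It does not reproduce the byte-level key encoding used by the SparseRowStore. It shows only that ordering by schema name, primary key, column name, and timestamp keeps the entries of one logical row adjacent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of a sparse row store key; not the actual key encoding. */
final class RowKey implements Comparable<RowKey> {
    final String schemaName;
    final long primaryKey;   // assumption: a long-valued primary key
    final String columnName;
    final long timestamp;

    RowKey(String schemaName, long primaryKey, String columnName, long timestamp) {
        this.schemaName = schemaName;
        this.primaryKey = primaryKey;
        this.columnName = columnName;
        this.timestamp = timestamp;
    }

    /** Orders by schema name, then primary key, then column name, then timestamp. */
    @Override
    public int compareTo(RowKey o) {
        int c = schemaName.compareTo(o.schemaName);
        if (c != 0) return c;
        c = Long.compare(primaryKey, o.primaryKey);
        if (c != 0) return c;
        c = columnName.compareTo(o.columnName);
        if (c != 0) return c;
        return Long.compare(timestamp, o.timestamp);
    }

    @Override
    public String toString() {
        return "[" + schemaName + "][" + primaryKey + "][" + columnName + "][t" + timestamp + "]";
    }
}

final class RowKeyDemo {
    /** Builds a small index, inserting entries in arbitrary order. */
    static TreeMap<RowKey, Object> sampleIndex() {
        TreeMap<RowKey, Object> index = new TreeMap<>();
        index.put(new RowKey("employee", 12, "Name", 0), "Bryan Thompson");
        index.put(new RowKey("employee", 12, "Employer", 1), "SYSTAP");
        index.put(new RowKey("employee", 13, "Id", 0), 13);
        index.put(new RowKey("employee", 12, "Employer", 0), "SAIC");
        return index;
    }

    /** Returns the keys in index order, rendered as strings. */
    static List<String> sortedKeys(TreeMap<RowKey, Object> index) {
        List<String> out = new ArrayList<>();
        for (RowKey k : index.keySet()) out.add(k.toString());
        return out;
    }
}
```

In this model all entries for primary key 12 sort before the entry for primary key 13, and the two Employer entries are ordered by timestamp.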

Timestamps are either generated by the application, in which case they define the semantics of a write-write conflict, or on write by the index. In the latter case, write-write conflicts never arise. Regardless of how timestamps are generated, the use of the timestamp in the key requires that applications specify filters that are applied during row scans to limit the data points actually returned as part of the row. For example, a filter might return only the most recent column values no later than a given timestamp for all columns of some primary key.

For example, records with the columns Id, Name, Employer, and DateOfHire would be represented as a series of index entries as follows:

                                             
 [employee][12][DateOfHire][t0] : [4/30/02]
 [employee][12][DateOfHire][t1] : [4/30/05]
 [employee][12][Employer][t0]   : [SAIC]
 [employee][12][Employer][t1]   : [SYSTAP]
 [employee][12][Id][t0]         : [12]
 [employee][12][Name][t0]       : [Bryan Thompson]
                                             
 

In order to read the logical row whose last update was t0, the caller would specify t0 as the toTime of interest. The values read in this example would be {<DateOfHire, t0, 4/30/02>, <Employer, t0, SAIC>, <Id, t0, 12>, <Name, t0, Bryan Thompson>}.

Likewise, in order to read the logical row whose last update was t1, the caller would specify t1 as the toTime of interest. The values read in this example would be {<DateOfHire, t1, 4/30/05>, <Employer, t1, SYSTAP>, <Id, t0, 12>, <Name, t0, Bryan Thompson>}. Notice that values written at t0 and not overwritten or deleted by t1 are present in the resulting logical row.
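The two reads described above can be modeled in memory as follows. This is a minimal sketch, not the SparseRowStore implementation: LogicalRowModel is a hypothetical class, the timestamps t0 and t1 are represented by the longs 0 and 1, and the lookup uses an inclusive "as of" timestamp for simplicity, whereas the toTime parameter of the read methods is documented as an exclusive bound.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of one logical row: column -> (timestamp -> value). */
final class LogicalRowModel {
    private final TreeMap<String, TreeMap<Long, Object>> columns = new TreeMap<>();

    void put(String column, long timestamp, Object value) {
        columns.computeIfAbsent(column, k -> new TreeMap<>()).put(timestamp, value);
    }

    /** For each column, returns the most recent value written no later than asOf. */
    Map<String, Object> asOf(long asOf) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<Long, Object>> e : columns.entrySet()) {
            Map.Entry<Long, Object> latest = e.getValue().floorEntry(asOf);
            if (latest != null) row.put(e.getKey(), latest.getValue());
        }
        return row;
    }

    /** Builds the employee row from the example, with t0 = 0 and t1 = 1. */
    static LogicalRowModel employee12() {
        long t0 = 0L, t1 = 1L; // stand-ins for the timestamps t0 < t1
        LogicalRowModel row = new LogicalRowModel();
        row.put("DateOfHire", t0, "4/30/02");
        row.put("DateOfHire", t1, "4/30/05");
        row.put("Employer", t0, "SAIC");
        row.put("Employer", t1, "SYSTAP");
        row.put("Id", t0, 12);
        row.put("Name", t0, "Bryan Thompson");
        return row;
    }
}
```

Calling employee12().asOf(0L) yields the bindings written at t0, while asOf(1L) yields the t1 values for DateOfHire and Employer together with the Id and Name values carried forward from t0.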

Note: Very large objects should be stored in the BigdataFileSystem (distributed, atomic, versioned, chunked file system) and the identifier for that object can then be stored in the row store.

Version:
$Id: SparseRowStore.java 2265 2009-10-26 12:51:06Z thompsonbry $
FIXME: Write a REST service using JSON to interchange data with the SparseRowStore. A caching layer in the web app could be used to reduce any hotspots.
Author:
Bryan Thompson

Field Summary
protected  boolean DEBUG
          True iff the log level is DEBUG or less.
protected  boolean INFO
          True iff the log level is INFO or less.
protected static org.apache.log4j.Logger log
           
 
Fields inherited from interface com.bigdata.sparse.IRowStoreConstants
AUTO_TIMESTAMP, AUTO_TIMESTAMP_UNIQUE, CURRENT_ROW, MAX_TIMESTAMP, MIN_TIMESTAMP
 
Constructor Summary
SparseRowStore(IIndex ndx)
          Create a client-side abstraction that treats an IIndex as a SparseRowStore.
 
Method Summary
 ITPS delete(Schema schema, Object primaryKey)
          Atomic delete of all property values for the current logical row.
 ITPS delete(Schema schema, Object primaryKey, long fromTime, long toTime, long writeTime, INameFilter filter)
          Atomic delete of all property values for the logical row.
 Object get(Schema schema, Object primaryKey, String name)
          Return the current binding for the named property.
 IIndex getIndex()
          The backing index.
 Iterator<? extends ITPS> rangeIterator(Schema schema)
          A logical row scan.
 Iterator<? extends ITPS> rangeIterator(Schema schema, Object fromKey, Object toKey)
          A logical row scan.
 Iterator<? extends ITPS> rangeIterator(Schema schema, Object fromKey, Object toKey, INameFilter filter)
          A logical row scan.
 Iterator<? extends ITPS> rangeIterator(Schema schema, Object fromKey, Object toKey, int capacity, long fromTime, long toTime, INameFilter nameFilter)
          A logical row scan.
 Map<String,Object> read(Schema schema, Object primaryKey)
          Read the most recent logical row from the index.
 Map<String,Object> read(Schema schema, Object primaryKey, INameFilter filter)
          Read the most recent logical row from the index.
 ITPS read(Schema schema, Object primaryKey, long fromTime, long toTime, INameFilter filter)
          Read a logical row from the index.
 Map<String,Object> write(Schema schema, Map<String,Object> propertySet)
          Atomic write with atomic read-back of the post-update state of the logical row.
 Map<String,Object> write(Schema schema, Map<String,Object> propertySet, long writeTime)
          Atomic write with atomic read-back of the post-update state of the logical row.
 TPS write(Schema schema, Map<String,Object> propertySet, long writeTime, INameFilter filter, IPrecondition precondition)
          Atomic write with atomic read of the then current post-condition state of the logical row.
 TPS write(Schema schema, Map<String,Object> propertySet, long fromTime, long toTime, long writeTime, INameFilter filter, IPrecondition precondition)
          Atomic write with atomic read of the post-condition state of the logical row.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static final org.apache.log4j.Logger log

INFO

protected final boolean INFO
True iff the log level is INFO or less.


DEBUG

protected final boolean DEBUG
True iff the log level is DEBUG or less.

Constructor Detail

SparseRowStore

public SparseRowStore(IIndex ndx)
Create a client-side abstraction that treats an IIndex as a SparseRowStore.

Parameters:
ndx - The index.
Method Detail

getIndex

public IIndex getIndex()
The backing index.


get

public Object get(Schema schema,
                  Object primaryKey,
                  String name)
Return the current binding for the named property.

Parameters:
schema - The Schema governing the logical row.
primaryKey - The primary key that identifies the logical row.
name - The property name.
Returns:
The current binding -or- null iff the property is not bound.
TODO:
this can be optimized and should use its own stored procedure. See AbstractAtomicRowReadOrWrite.getCurrentValue(IIndex, Schema, Object, String)

read

public Map<String,Object> read(Schema schema,
                               Object primaryKey)
Read the most recent logical row from the index.

Parameters:
schema - The Schema governing the logical row.
primaryKey - The primary key that identifies the logical row.
Returns:
The data for the current state of that logical row -or- null IFF there are no property values for that logical row (including no deleted property values, no property values that are excluded due to their timestamps, and no property values that are excluded due to a property name filter). A null return is a strong guarantee that NO data existed in the row store at the time of the read for the given schema and primaryKey.

read

public Map<String,Object> read(Schema schema,
                               Object primaryKey,
                               INameFilter filter)
Read the most recent logical row from the index.

Parameters:
schema - The Schema governing the logical row.
primaryKey - The primary key that identifies the logical row.
filter - An optional filter.
Returns:
The data for the current state of that logical row -or- null IFF there are no property values for that logical row (including no deleted property values, no property values that are excluded due to their timestamps, and no property values that are excluded due to a property name filter). A null return is a strong guarantee that NO data existed in the row store at the time of the read for the given schema and primaryKey.

read

public ITPS read(Schema schema,
                 Object primaryKey,
                 long fromTime,
                 long toTime,
                 INameFilter filter)
Read a logical row from the index.

Parameters:
schema - The Schema governing the logical row.
primaryKey - The primary key that identifies the logical row.
fromTime - The first timestamp for which timestamped property values will be accepted.
toTime - The first timestamp for which timestamped property values will NOT be accepted -or- IRowStoreConstants.CURRENT_ROW to accept only the most current binding whose timestamp is GTE fromTime.
filter - An optional filter that may be used to select values for property names accepted by the filter.
Returns:
The data in that row -or- null IFF there are no property values for that logical row (including no deleted property values, no property values that are excluded due to their timestamps, and no property values that are excluded due to a property name filter). A null return is a strong guarantee that NO data existed in the row store at the time of the read for the given schema and primaryKey.
Throws:
IllegalArgumentException - if the schema is null.
IllegalArgumentException - if the primaryKey is null.
IllegalArgumentException - if the fromTime and/or toTime are invalid.
See Also:
ITimestampPropertySet#asMap() (returns the most current bindings), ITimestampPropertySet#asMap(long) (returns the most current bindings as of the specified timestamp), IRowStoreConstants.CURRENT_ROW, IRowStoreConstants.MIN_TIMESTAMP, IRowStoreConstants.MAX_TIMESTAMP
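The fromTime and toTime parameters describe a half-open interval: fromTime is the first accepted timestamp and toTime is the first timestamp that is NOT accepted. The sketch below models that selection over the history of a single property. TimeRangeFilter is a hypothetical helper, the validity check is an assumption about what "invalid" might mean, and the special CURRENT_ROW value is not modeled.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper illustrating the half-open [fromTime, toTime) selection. */
final class TimeRangeFilter {
    /**
     * Returns the values whose timestamps fall in [fromTime, toTime),
     * or null when no value qualifies (mirroring the documented null return).
     */
    static NavigableMap<Long, Object> select(TreeMap<Long, Object> history,
                                             long fromTime, long toTime) {
        // Assumption: an empty or inverted range is treated as invalid.
        if (toTime <= fromTime) {
            throw new IllegalArgumentException("toTime must be greater than fromTime");
        }
        NavigableMap<Long, Object> hits = history.subMap(fromTime, true, toTime, false);
        return hits.isEmpty() ? null : hits;
    }
}
```

With a history of {0: SAIC, 1: SYSTAP}, select(history, 0, 1) returns only the entry at timestamp 0, because timestamp 1 is the first excluded value.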

write

public Map<String,Object> write(Schema schema,
                                Map<String,Object> propertySet)
Atomic write with atomic read-back of the post-update state of the logical row.

Note: In order to cause a column value for a row to be deleted you MUST specify a null column value for that column.

Note: The value of the primaryKey is written each time the logical row is updated, and the timestamp associated with the value for the primaryKey property tells you the timestamp of each row revision.

Parameters:
schema - The Schema governing the logical row.
propertySet - The column names and values for that row.
Returns:
The result of an atomic read on the post-update state of the logical row. Only the most current bindings will be present for each property.
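The convention that a null column value deletes a column can be modeled as writing a null entry under a newer timestamp and then omitting that column when the row is flattened to a Map. The following is a hedged sketch of that idea only: TombstoneModel is a hypothetical class backed by in-memory maps, and it does not call the SparseRowStore.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of null-as-delete for one logical row. */
final class TombstoneModel {
    private final TreeMap<String, TreeMap<Long, Object>> columns = new TreeMap<>();

    /** Writes every property at writeTime; a null value records a delete marker. */
    void write(Map<String, Object> propertySet, long writeTime) {
        for (Map.Entry<String, Object> e : propertySet.entrySet()) {
            columns.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                   .put(writeTime, e.getValue());
        }
    }

    /** Flattens to the current bindings, omitting columns whose latest value is null. */
    Map<String, Object> currentRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<Long, Object>> e : columns.entrySet()) {
            Object latest = e.getValue().lastEntry().getValue();
            if (latest != null) row.put(e.getKey(), latest);
        }
        return row;
    }
}
```

Note that the property set passed to write must be a Map implementation that permits null values, such as java.util.HashMap.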

write

public Map<String,Object> write(Schema schema,
                                Map<String,Object> propertySet,
                                long writeTime)
Atomic write with atomic read-back of the post-update state of the logical row.

Parameters:
schema - The Schema governing the logical row.
propertySet - The column names and values for that row.
writeTime - The timestamp to use for the row -or- IRowStoreConstants.AUTO_TIMESTAMP if the timestamp will be generated by the server -or- IRowStoreConstants.AUTO_TIMESTAMP_UNIQUE if a federation-wide unique timestamp will be generated by the server.
Returns:
The result of an atomic read on the post-update state of the logical row. Only the most current bindings will be present for each property.

write

public TPS write(Schema schema,
                 Map<String,Object> propertySet,
                 long writeTime,
                 INameFilter filter,
                 IPrecondition precondition)
Atomic write with atomic read of the then current post-condition state of the logical row.

Note: In order to cause a column value for a row to be deleted you MUST specify a null column value for that column. A null will be written under the key for the column value with a new timestamp. This is interpreted as a deleted property value when the row is simplified as a Map. If you examine the ITPS you can see the ITPV with the null value and the timestamp of the delete.

Note: The value of the primaryKey is written each time the logical row is updated, and the timestamp associated with the value for the primaryKey property tells you the timestamp of each row revision.

Note: If the caller specified a timestamp, then that timestamp is used by the atomic read. If the timestamp was assigned by the server, then the server assigned timestamp is used by the atomic read.

Note: You can verify pre-conditions for the logical row on the server. Among other things this could be used to reject an update if someone has modified the logical row since you last read some value.

Parameters:
schema - The Schema governing the logical row.
propertySet - The column names and values for that row. The primaryKey as identified by the Schema MUST be present in the propertySet.
writeTime - The timestamp to use for the row -or- IRowStoreConstants.AUTO_TIMESTAMP if the timestamp will be generated by the server -or- IRowStoreConstants.AUTO_TIMESTAMP_UNIQUE if a federation-wide unique timestamp will be generated by the server.
filter - An optional filter used to select the property values that will be returned (this has no effect on the atomic write).
precondition - When present, the pre-condition state of the row will be read and offered to the IPrecondition. If the IPrecondition fails, then the atomic write will NOT be performed and the pre-condition state of the row will be returned. If the IPrecondition succeeds, then the atomic write will be performed and the post-condition state of the row will be returned. Use TPS.isPreconditionOk() to determine whether or not the write was performed.
Returns:
The result of an atomic read on the post-update state of the logical row -or- null iff there is no data for the primaryKey (per the contract for an atomic read).

If an optional IPrecondition was specified and the IPrecondition was NOT satisfied, then the write operation was NOT performed and the result is the pre-condition state of the logical row (which, again, will be null IFF there is NO data for the primaryKey).

See Also:
ITPS.getWriteTimestamp()
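The precondition contract described for this method (test the pre-condition state, write only if the test passes, and report which state was returned) can be sketched with a plain in-memory model. Everything here is an assumption for illustration: ConditionalWriteModel and its Result class are hypothetical, a java.util.function.Predicate stands in for IPrecondition, and the preconditionOk flag stands in for TPS.isPreconditionOk().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for a conditional row write; not the actual implementation. */
final class ConditionalWriteModel {
    /** The row state that was returned and whether the write was performed. */
    static final class Result {
        final Map<String, Object> row;
        final boolean preconditionOk;

        Result(Map<String, Object> row, boolean preconditionOk) {
            this.row = row;
            this.preconditionOk = preconditionOk;
        }
    }

    private final Map<String, Object> current = new HashMap<>();

    /** Tests the pre-condition against the current row and writes only if it holds. */
    synchronized Result write(Map<String, Object> propertySet,
                              Predicate<Map<String, Object>> precondition) {
        if (precondition != null && !precondition.test(new HashMap<>(current))) {
            // Write rejected: return the pre-condition state.
            return new Result(new HashMap<>(current), false);
        }
        current.putAll(propertySet);
        // Write applied: return the post-condition state.
        return new Result(new HashMap<>(current), true);
    }
}
```

A caller could use such a check to reject an update when a property no longer holds the value it last read, which is the optimistic-concurrency use described in the notes above.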

write

public TPS write(Schema schema,
                 Map<String,Object> propertySet,
                 long fromTime,
                 long toTime,
                 long writeTime,
                 INameFilter filter,
                 IPrecondition precondition)
Atomic write with atomic read of the post-condition state of the logical row.

Note: In order to cause a column value for a row to be deleted you MUST specify a null column value for that column. A null will be written under the key for the column value with a new timestamp. This is interpreted as a deleted property value when the row is simplified as a Map. If you examine the ITPS you can see the ITPV with the null value and the timestamp of the delete.

Note: The value of the primaryKey is written each time the logical row is updated, and the timestamp associated with the value for the primaryKey property tells you the timestamp of each row revision.

Note: If the caller specified a timestamp, then that timestamp is used by the atomic read. If the timestamp was assigned by the server, then the server assigned timestamp is used by the atomic read.

Note: You can verify pre-conditions for the logical row on the server. Among other things this could be used to reject an update if someone has modified the logical row since you last read some value.

Parameters:
schema - The Schema governing the logical row.
propertySet - The column names and values for that row. The primaryKey as identified by the Schema MUST be present in the propertySet.
fromTime - During pre-condition and post-condition reads, the first timestamp for which timestamped property values will be accepted.
toTime - During pre-condition and post-condition reads, the first timestamp for which timestamped property values will NOT be accepted -or- IRowStoreConstants.CURRENT_ROW to accept only the most current binding whose timestamp is GTE fromTime.
writeTime - The timestamp to use for the row -or- IRowStoreConstants.AUTO_TIMESTAMP if the timestamp will be generated by the server -or- IRowStoreConstants.AUTO_TIMESTAMP_UNIQUE if a federation-wide unique timestamp will be generated by the server.
filter - An optional filter used to select the property values that will be returned (this has no effect on the atomic write).
precondition - When present, the pre-condition state of the row will be read and offered to the IPrecondition. If the IPrecondition fails, then the atomic write will NOT be performed and the pre-condition state of the row will be returned. If the IPrecondition succeeds, then the atomic write will be performed and the post-condition state of the row will be returned. Use TPS.isPreconditionOk() to determine whether or not the write was performed.
Returns:
The result of an atomic read on the post-update state of the logical row, which will be null IFF there is NO data for the primaryKey.

If an optional IPrecondition was specified and the IPrecondition was NOT satisfied, then the write operation was NOT performed and the result is the pre-condition state of the logical row (which, again, will be null IFF there is NO data for the primaryKey).

Throws:
UnsupportedOperationException - if a property has an auto-increment type and the ValueType of the property does not support auto-increment.
UnsupportedOperationException - if a property has an auto-increment type but there is no successor in the value space of that property.
See Also:
ITPS.getWriteTimestamp()
TODO:
The atomic read-back may be overkill. When you need the data, it means that you only do one RPC rather than two. When you do not need the data, it is just more network traffic and more complexity in this method signature. You can get pretty much the same result by doing an atomic read after the fact using the timestamp assigned by the server to the row (pretty much in the sense that it is possible for another write to explicitly specify the same timestamp and hence overwrite your data).

The timestamp could be an ITimestampService with an implementation that always returns a caller-given constant, another that uses the local system clock, another that uses the system clock but ensures that it never hands off the same timestamp twice in a row, and another that resolves the global timestamp service.

It is also possible that the timestamp behavior should be defined by the Schema and therefore factored out of this method signature.
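Three of the timestamp sources contemplated in this TODO can be sketched with standard library types. This is purely illustrative: TimestampSources is a hypothetical class, java.util.function.LongSupplier stands in for the proposed ITimestampService, and the global timestamp service variant is not modeled.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical timestamp sources; LongSupplier stands in for a timestamp service. */
final class TimestampSources {
    /** Always returns the caller-given constant. */
    static LongSupplier constant(long timestamp) {
        return () -> timestamp;
    }

    /** Returns the local system clock in milliseconds; values may repeat. */
    static LongSupplier localClock() {
        return System::currentTimeMillis;
    }

    /** Uses the local clock but always returns a value greater than the previous one. */
    static LongSupplier uniqueLocalClock() {
        AtomicLong last = new AtomicLong(Long.MIN_VALUE);
        return () -> last.updateAndGet(
                prev -> Math.max(prev + 1, System.currentTimeMillis()));
    }
}
```

The third variant trades strict wall-clock accuracy for uniqueness: when two requests arrive within the same millisecond, the second receives the previous value plus one.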


delete

public ITPS delete(Schema schema,
                   Object primaryKey)
Atomic delete of all property values for the current logical row.

Parameters:
schema - The schema.
primaryKey - The primary key for the logical row.
Returns:
The deleted property values.

delete

public ITPS delete(Schema schema,
                   Object primaryKey,
                   long fromTime,
                   long toTime,
                   long writeTime,
                   INameFilter filter)
Atomic delete of all property values for the logical row. The property values are read atomically, each property value that is read is then overwritten with a null, and the read property values are returned.

Parameters:
schema - The schema.
primaryKey - The primary key for the logical row.
fromTime - During pre-condition and post-condition reads, the first timestamp for which timestamped property values will be accepted.
toTime - During pre-condition and post-condition reads, the first timestamp for which timestamped property values will NOT be accepted -or- IRowStoreConstants.CURRENT_ROW to accept only the most current binding whose timestamp is GTE fromTime.
writeTime - The timestamp that will be written into the "deleted" entries -or- IRowStoreConstants.AUTO_TIMESTAMP if the timestamp will be generated by the server -or- IRowStoreConstants.AUTO_TIMESTAMP_UNIQUE if a federation-wide unique timestamp will be generated by the server.
filter - An optional filter used to select the property values that will be deleted.
Returns:
The property values that were read from the store before they were deleted. The ITPS.getWriteTimestamp() will report the timestamp assigned to the deleted entries used to overwrite these property values in the store.
TODO:
Add an optional IPrecondition. Add unit tests.
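The delete behavior described for this method (read the property values, overwrite each one with a null, and return what was read) can be modeled in memory as shown below. This is a sketch under assumptions: DeleteModel is a hypothetical class, no name filter or time range is applied, and writeTime is assumed to be greater than every existing timestamp in the row.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of delete-by-overwrite for one logical row. */
final class DeleteModel {
    private final TreeMap<String, TreeMap<Long, Object>> columns = new TreeMap<>();

    void put(String column, long timestamp, Object value) {
        columns.computeIfAbsent(column, k -> new TreeMap<>()).put(timestamp, value);
    }

    /**
     * Reads the current non-null bindings, overwrites each with a null at
     * writeTime, and returns the values that were read.
     */
    synchronized Map<String, Object> delete(long writeTime) {
        Map<String, Object> read = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<Long, Object>> e : columns.entrySet()) {
            Object latest = e.getValue().lastEntry().getValue();
            if (latest != null) {
                read.put(e.getKey(), latest);
                e.getValue().put(writeTime, null); // delete marker
            }
        }
        return read;
    }

    /** Returns the latest value for a column; null once the column is deleted. */
    Object current(String column) {
        TreeMap<Long, Object> history = columns.get(column);
        return history == null ? null : history.lastEntry().getValue();
    }
}
```

Because the earlier entries are overwritten logically rather than removed, the history of each column remains available in this model, with the null entry recording when the delete happened.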

rangeIterator

public Iterator<? extends ITPS> rangeIterator(Schema schema)
A logical row scan. Each logical row will be read atomically. Only the current bindings for property values will be returned.

Parameters:
schema - The Schema governing the logical row.
Returns:
An iterator visiting each logical row for the schema.

rangeIterator

public Iterator<? extends ITPS> rangeIterator(Schema schema,
                                              Object fromKey,
                                              Object toKey)
A logical row scan. Each logical row will be read atomically. Only the current bindings for property values will be returned.

Parameters:
schema - The Schema governing the logical row.
fromKey - The value of the primary key for lower bound (inclusive) of the key range -or- null iff there is no lower bound.
toKey - The value of the primary key for upper bound (exclusive) of the key range -or- null iff there is no upper bound.
Returns:
An iterator visiting each logical row in the specified key range.

rangeIterator

public Iterator<? extends ITPS> rangeIterator(Schema schema,
                                              Object fromKey,
                                              Object toKey,
                                              INameFilter filter)
A logical row scan. Each logical row will be read atomically. Only the current bindings for property values will be returned.

Parameters:
schema - The Schema governing the logical row.
fromKey - The value of the primary key for lower bound (inclusive) of the key range -or- null iff there is no lower bound.
toKey - The value of the primary key for upper bound (exclusive) of the key range -or- null iff there is no upper bound.
filter - An optional filter.
Returns:
An iterator visiting each logical row in the specified key range.

rangeIterator

public Iterator<? extends ITPS> rangeIterator(Schema schema,
                                              Object fromKey,
                                              Object toKey,
                                              int capacity,
                                              long fromTime,
                                              long toTime,
                                              INameFilter nameFilter)
A logical row scan. Each logical row will be read atomically.

Parameters:
schema - The Schema governing the logical row.
fromKey - The value of the primary key for lower bound (inclusive) of the key range -or- null iff there is no lower bound.
toKey - The value of the primary key for upper bound (exclusive) of the key range -or- null iff there is no upper bound.
capacity - When non-zero, this is the maximum #of logical rows that will be read atomically. This is only an upper bound. The actual #of logical rows in an atomic read depends on a variety of factors.
fromTime - The first timestamp for which timestamped property values will be accepted.
toTime - The first timestamp for which timestamped property values will NOT be accepted -or- IRowStoreConstants.CURRENT_ROW to accept only the most current binding whose timestamp is GTE fromTime.
nameFilter - An optional filter used to select the property(s) of interest.
Returns:
An iterator visiting each logical row in the specified key range.
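The key range semantics of the scan (inclusive fromKey, exclusive toKey, null meaning unbounded) can be sketched over a sorted map. RangeScanModel is a hypothetical helper, the primary keys are assumed to be longs, and each map entry stands in for one logical row. Capacity, time range, and name filtering are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper illustrating the [fromKey, toKey) scan bounds. */
final class RangeScanModel {
    /**
     * Returns the primary keys in [fromKey, toKey); a null bound is unbounded.
     * Each key stands in for one logical row in this simplified model.
     */
    static <V> List<Long> scan(TreeMap<Long, V> rows, Long fromKey, Long toKey) {
        NavigableMap<Long, V> view = rows;
        if (fromKey != null) view = view.tailMap(fromKey, true);  // inclusive lower bound
        if (toKey != null) view = view.headMap(toKey, false);     // exclusive upper bound
        return new ArrayList<>(view.keySet());
    }
}
```

For rows keyed 10, 12, and 15, scan(rows, 10L, 15L) visits 10 and 12 only, while scan(rows, null, null) visits all three.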


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.