com.bigdata.btree
Class UnisolatedReadWriteIndex

java.lang.Object
  extended by com.bigdata.btree.UnisolatedReadWriteIndex
All Implemented Interfaces:
IAutoboxBTree, IIndex, IIndexLocalCounter, IRangeQuery, ISimpleBTree, ICounterSetAccess

public class UnisolatedReadWriteIndex
extends Object
implements IIndex

A view onto an unisolated index partition which enforces the constraint that either concurrent writers -or- a single writer may have access to the unisolated index at any given time. This provides the maximum possible concurrency for an unisolated index using an internal ReadWriteLock to coordinate threads.

The possible concurrency with this approach is higher than that provided by the IConcurrencyManager since the latter only allows a single process access to the unisolated index while this class can allow multiple readers concurrent access to the same unisolated index. The use of this class is NOT compatible with the IConcurrencyManager (the IConcurrencyManager does not respect the locks managed by this class).

This class does NOT handle deadlock detection. However, it does not expose the underlying lock and the scope of the acquired lock should always be restricted to a single operation as defined by IIndex. If you circumvent this by writing and submitting an IIndexProcedure that attempts an operation on another UnisolatedReadWriteIndex then a deadlock MAY occur.

The point test methods on this class (get, contains, lookup, remove) have relatively high overhead since they need to acquire and release the lock per point test. If you need to do a bunch of point tests, then submit an IIndexProcedure that will run against the underlying index once it has acquired the appropriate lock -- point tests from within the IIndexProcedure will be very efficient.

Design notes

This class was developed to squeeze the maximum possible performance out of a local database. The use of this class can provide correct interleaving of readers and writers without the use of the ConcurrencyManager and the group commit protocol which it imposes on writers. It also facilitates the reuse of the buffers backing the unisolated index, which can reduce IO associated with index operations when compared to reading on a read-committed view of the index with concurrent writes and interleaved commits on the corresponding unisolated index.

Reading on the read-committed index view has greater possible concurrency, but requires that writes are committed before they become visible and must read the data from the disk since it does not have access to the buffers for the unisolated index. Requiring a commit in order for the writes to become visible imposes significant latency, especially when computing the fix point of a rule set which may take multiple rounds. Reading on the unisolated index should do better in terms of buffer reuse and does NOT require commits or checkpoints of the index for writes to become visible to readers but does require a means to correctly interleave access to the unisolated index, which is the purpose of this class.

While the lock manager could be modified to support Share vs Exclusive locks and to use Share locks for readers and Exclusive locks for writers, writers would still block until the next commit so the throughput (e.g., when computing the fix point of a rule set) is significantly lower.

Version:
$Id: UnisolatedReadWriteIndex.java 4054 2011-01-05 13:51:25Z thompsonbry $
Author:
Bryan Thompson

Field Summary
protected static long LOCK_TIMEOUT_MILLIS
          The #of milliseconds that the class will wait for a read or write lock.
 
Fields inherited from interface com.bigdata.btree.IRangeQuery
ALL, CURSOR, DEFAULT, DELETED, FIXED_LENGTH_SUCCESSOR, KEYS, NONE, PARALLEL, READONLY, REMOVEALL, REVERSE, VALS
 
Constructor Summary
UnisolatedReadWriteIndex(BTree ndx)
          Creates a view of an unisolated index that will enforce the concurrency constraints of the BTree class, but only among other instances of this class for the same underlying index.
UnisolatedReadWriteIndex(BTree ndx, int defaultCapacity)
          Creates a view of an unisolated index that will enforce the concurrency constraints of the BTree class, but only among other instances of this class for the same underlying index.
 
Method Summary
 boolean contains(byte[] key)
          Return true iff there is a (non-deleted) index entry for the key.
 boolean contains(Object key)
          Return true iff there is an entry for the key.
 ScanCostReport estimateCost(DiskCostModel diskCostModel, long rangeCount)
          Estimate the cost of a range scan.
 ICounter getCounter()
          This throws an exception.
 CounterSet getCounters()
          Return performance counters.
 IndexMetadata getIndexMetadata()
          The metadata for the index.
static ReadWriteLock getReadWriteLock(ICommitter btree)
          Canonicalizing factory for the ReadWriteLock for an ICommitter.
 IResourceMetadata[] getResourceMetadata()
          The description of the resources comprising the index view.
 byte[] insert(byte[] key, byte[] value)
          Insert or update a value under the key.
 Object insert(Object key, Object value)
          Insert with auto-magic handling of keys and value objects.
 byte[] lookup(byte[] key)
          Lookup a value for a key.
 Object lookup(Object key)
          Lookup a value for a key.
 long rangeCount()
          Return the #of tuples in the index.
 long rangeCount(byte[] fromKey, byte[] toKey)
          Return the #of tuples in a half-open key range.
 long rangeCountExact(byte[] fromKey, byte[] toKey)
          Return the exact #of tuples in a half-open key range.
 long rangeCountExactWithDeleted(byte[] fromKey, byte[] toKey)
          Return the exact #of tuples in a half-open key range, including any deleted tuples.
 ITupleIterator rangeIterator()
          Visits all tuples in key order.
 ITupleIterator rangeIterator(byte[] fromKey, byte[] toKey)
          Return an iterator that visits the entries in a half-open key range.
 ITupleIterator rangeIterator(byte[] fromKey, byte[] toKey, int capacity, int flags, IFilter filter)
          The iterator will read on the underlying index in chunks, buffering tuples as it goes.
protected  Lock readLock()
          A shared read lock used (in the absence of other concurrency control mechanisms) to permit concurrent readers on an unisolated index while serializing access to that index when a writer must run.
 byte[] remove(byte[] key)
          Remove the key and its associated value.
 Object remove(Object key)
          Remove the key and its associated value.
 void submit(byte[] fromKey, byte[] toKey, IKeyRangeIndexProcedure proc, IResultHandler handler)
          The procedure will be transparently applied against each index partition spanned by the given key range.
 Object submit(byte[] key, ISimpleIndexProcedure proc)
          Submits an index procedure that operations on a single key to the appropriate index partition returning the result of that procedure.
 void submit(int fromIndex, int toIndex, byte[][] keys, byte[][] vals, AbstractKeyArrayIndexProcedureConstructor ctor, IResultHandler aggregator)
          Runs a procedure against an index.
 String toString()
           
 Lock writeLock()
          An exclusive write lock used (in the absence of other concurrency control mechanisms) to serialize all processes accessing an unisolated index when a writer must run.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

LOCK_TIMEOUT_MILLIS

protected static final long LOCK_TIMEOUT_MILLIS
The #of milliseconds that the class will wait for a read or write lock. A (wrapped) InterruptedException will be thrown if this timeout is exceeded. The default is 9223372036854775807L milliseconds. Use Long.MAX_VALUE for no timeout.

See Also:
Constant Field Values
TODO:
There may be no reason to have a timeout when waiting for a lock in which case we can get rid of this field. Also, there is no means available to configure the timeout (in a similar fashion you can not configure the fairness policy for the ReentrantReadWriteLock).
Constructor Detail

UnisolatedReadWriteIndex

public UnisolatedReadWriteIndex(BTree ndx)
Creates a view of an unisolated index that will enforce the concurrency constraints of the BTree class, but only among other instances of this class for the same underlying index.

Parameters:
ndx - The underlying unisolated index.
Throws:
IllegalArgumentException - if the index is null.

UnisolatedReadWriteIndex

public UnisolatedReadWriteIndex(BTree ndx,
                                int defaultCapacity)
Creates a view of an unisolated index that will enforce the concurrency constraints of the BTree class, but only among other instances of this class for the same underlying index.

Parameters:
ndx - The underlying unisolated index.
defaultCapacity - The capacity for iterator reads against the underlying index. The main purpose of the capacity is to reduce the contention for the ReadWriteLock. Relatively small values should therefore be fine. See DEFAULT_CAPACITY.
Throws:
IllegalArgumentException - if the index is null.
TODO:
This is using an internal canonicalizing map for the ReadWriteLocks. It would be as easy to provide a canonicalizing factory for these objects on the Journal and TemporaryStore., fairness is NOT required for the locks. I believe that this is supposed to provide better throughput, but that has not been tested. Also, this has not been tested with a simple mutex lock vs a read-write lock. The use case for which this class was originally developed was computing the fix point of a set of rules. In that use case, we do a lot of concurrent reading and periodically flush the computed solutions onto the relations. It is likely that a read-write lock will do well for this situation.
Method Detail

writeLock

public Lock writeLock()
An exclusive write lock used (in the absence of other concurrency control mechanisms) to serialize all processes accessing an unisolated index when a writer must run. This is automatically obtained by methods on this class which will write on the underlying IIndex. It is exposed for processes which need to obtain the write lock to coordinate external operations.

Returns:
The acquired lock.

readLock

protected Lock readLock()
A shared read lock used (in the absence of other concurrency control mechanisms) to permit concurrent readers on an unisolated index while serializing access to that index when a writer must run. This is automatically obtained by methods on this class which will write on the underlying IIndex. It is exposed for processes which need to obtain the write lock to coordinate external operations.

Returns:
The acquired lock.

getReadWriteLock

public static ReadWriteLock getReadWriteLock(ICommitter btree)
Canonicalizing factory for the ReadWriteLock for an ICommitter.

Parameters:
btree - The btree.
Returns:
The lock.
Throws:
IllegalArgumentException - if the argument is null.

toString

public String toString()
Overrides:
toString in class Object

getIndexMetadata

public IndexMetadata getIndexMetadata()
Description copied from interface: IIndex
The metadata for the index. This is full of good stuff about the index.

Specified by:
getIndexMetadata in interface IIndex

getResourceMetadata

public IResourceMetadata[] getResourceMetadata()
Description copied from interface: IIndex
The description of the resources comprising the index view.

Specified by:
getResourceMetadata in interface IIndex

getCounters

public CounterSet getCounters()
Description copied from interface: IIndex
Return performance counters.

Interesting performance counters and other statistics about the index.

Specified by:
getCounters in interface IIndex
Specified by:
getCounters in interface ICounterSetAccess

getCounter

public ICounter getCounter()
This throws an exception. If you need access to the ICounter for the index partition, then write and submit an IIndexProcedure.

Specified by:
getCounter in interface IIndexLocalCounter
Throws:
UnsupportedOperationException

contains

public boolean contains(Object key)
Description copied from interface: IAutoboxBTree
Return true iff there is an entry for the key.

Specified by:
contains in interface IAutoboxBTree
Parameters:
key - The key is implicitly converted to an unsigned byte[].
Returns:
True if the btree contains an entry for that key.

insert

public Object insert(Object key,
                     Object value)
Description copied from interface: IAutoboxBTree
Insert with auto-magic handling of keys and value objects.

Specified by:
insert in interface IAutoboxBTree
Parameters:
key - The key is implicitly converted to an unsigned byte[].
value - The value is implicitly converted to a byte[].
Returns:
The de-serialized old value -or- null if there was no value stored under that key.

lookup

public Object lookup(Object key)
Description copied from interface: IAutoboxBTree
Lookup a value for a key.

Specified by:
lookup in interface IAutoboxBTree
Parameters:
key - The key is implicitly converted to an unsigned byte[].
Returns:
The de-serialized value or null if there is no entry for that key.

remove

public Object remove(Object key)
Description copied from interface: IAutoboxBTree
Remove the key and its associated value.

Specified by:
remove in interface IAutoboxBTree
Parameters:
key - The key is implicitly converted to an unsigned byte[].
Returns:
The de-serialized value stored under that key or null if the key was not found.

contains

public boolean contains(byte[] key)
Description copied from interface: ISimpleBTree
Return true iff there is a (non-deleted) index entry for the key. An index entry with a null value will cause this method to return true. A deleted index entry will cause this method to return false.

Specified by:
contains in interface ISimpleBTree
Parameters:
key - The key.
Returns:
true if the index contains an (un-deleted) entry for that key.

lookup

public byte[] lookup(byte[] key)
Description copied from interface: ISimpleBTree
Lookup a value for a key.

Specified by:
lookup in interface ISimpleBTree
Returns:
The value stored under that key or null if there is no entry for that key or if the entry under that key is marked as deleted.

insert

public byte[] insert(byte[] key,
                     byte[] value)
Description copied from interface: ISimpleBTree
Insert or update a value under the key.

Specified by:
insert in interface ISimpleBTree
Parameters:
key - The key.
value - The value (may be null).
Returns:
The previous value under that key or null if the key was not found or if the previous entry for that key was marked as deleted.

remove

public byte[] remove(byte[] key)
Description copied from interface: ISimpleBTree
Remove the key and its associated value.

Specified by:
remove in interface ISimpleBTree
Parameters:
key - The key.
Returns:
The value stored under that key or null if the key was not found or if the previous entry under that key was marked as deleted.

rangeCount

public long rangeCount()
Description copied from interface: IRangeQuery
Return the #of tuples in the index.

Note: If the index supports deletion markers then the range count will be an upper bound and may double count tuples which have been overwritten, including the special case where the overwrite is a delete.

Specified by:
rangeCount in interface IRangeQuery
Returns:
The #of tuples in the index.

rangeCount

public long rangeCount(byte[] fromKey,
                       byte[] toKey)
Description copied from interface: IRangeQuery
Return the #of tuples in a half-open key range. The fromKey and toKey need not exist in the B+Tree.

Note: If the index supports deletion markers then the range count will be an upper bound and may double count tuples which have been overwritten, including the special case where the overwrite is a delete.

Specified by:
rangeCount in interface IRangeQuery
Parameters:
fromKey - The lowest key that will be counted (inclusive). When null there is no lower bound.
toKey - The first key that will not be counted (exclusive). When null there is no upper bound.
Returns:
The #of tuples in the half-open key range.

rangeCountExact

public long rangeCountExact(byte[] fromKey,
                            byte[] toKey)
Description copied from interface: IRangeQuery
Return the exact #of tuples in a half-open key range. The fromKey and toKey need not exist in the B+Tree.

Note: If the index supports deletion markers then this operation will require a key-range scan.

Specified by:
rangeCountExact in interface IRangeQuery
Parameters:
fromKey - The lowest key that will be counted (inclusive). When null there is no lower bound.
toKey - The first key that will not be counted (exclusive). When null there is no upper bound.
Returns:
The exact #of tuples in the half-open key range.

rangeCountExactWithDeleted

public long rangeCountExactWithDeleted(byte[] fromKey,
                                       byte[] toKey)
Description copied from interface: IRangeQuery
Return the exact #of tuples in a half-open key range, including any deleted tuples. The fromKey and toKey need not exist in the B+Tree.

When the view is just an AbstractBTree the result is the same as for IRangeQuery.rangeCount(byte[], byte[]), which already reports all tuples regardless of whether or not they are deleted.

When the index is a view with multiple sources, this operation requires a key-range scan where both deleted and undeleted tuples are visited.

Specified by:
rangeCountExactWithDeleted in interface IRangeQuery
Parameters:
fromKey - The lowest key that will be counted (inclusive). When null there is no lower bound.
toKey - The first key that will not be counted (exclusive). When null there is no upper bound.
Returns:
The exact #of deleted and undeleted tuples in the half-open key range.
See Also:
IRangeQuery.rangeCountExact(byte[], byte[])

rangeIterator

public final ITupleIterator rangeIterator()
Description copied from interface: IRangeQuery
Visits all tuples in key order. This is identical to
 rangeIterator(null, null)
 

Specified by:
rangeIterator in interface IRangeQuery
Returns:
An iterator that will visit all entries in key order.

rangeIterator

public ITupleIterator rangeIterator(byte[] fromKey,
                                    byte[] toKey)
Description copied from interface: IRangeQuery
Return an iterator that visits the entries in a half-open key range. When toKey EQ fromKey nothing will be visited. It is an error if toKey LT fromKey.

Specified by:
rangeIterator in interface IRangeQuery
Parameters:
fromKey - The first key that will be visited (inclusive lower bound). When null there is no lower bound.
toKey - The first key that will NOT be visited (exclusive upper bound). When null there is no upper bound.
See Also:
SuccessorUtil, which may be used to compute the successor of a value before encoding it as a component of a key., BytesUtil#successor(byte[]), which may be used to compute the successor of an encoded key., EntryFilter, which may be used to filter the entries visited by the iterator.

rangeIterator

public ITupleIterator rangeIterator(byte[] fromKey,
                                    byte[] toKey,
                                    int capacity,
                                    int flags,
                                    IFilter filter)
The iterator will read on the underlying index in chunks, buffering tuples as it goes. The buffer capacity is as specified by the caller and will default to the capacity specified to the ctor. The iterator acquires and releases the appropriate lock (either the shared read lock or the exclusive write lock) before it fetches reads the next chunk of tuples from the underlying index. Likewise, the mutation methods on the iterator will acquire the exclusive write lock.

Specified by:
rangeIterator in interface IRangeQuery
Parameters:
fromKey - The first key that will be visited (inclusive lower bound). When null there is no lower bound.
toKey - The first key that will NOT be visited (exclusive upper bound). When null there is no upper bound.
capacity - The #of entries to buffer at a time. This is a hint and MAY be zero (0) to use an implementation specific default capacity. A non-zero value may be used if you know that you want at most N results or if you want to override the default #of results to be buffered before sending them across a network interface. (Note that you can control the default value using IBigdataClient.Options#DEFAULT_CLIENT_RANGE_QUERY_CAPACITY).
flags - A bitwise OR of IRangeQuery.KEYS, IRangeQuery.VALS, etc.
filter - An optional object used to construct a stacked iterator. When IRangeQuery.CURSOR is specified in flags, the base iterator will implement ITupleCursor and the first filter in the stack can safely cast the source iterator to an ITupleCursor. If the outermost filter in the stack does not implement ITupleIterator, then it will be wrapped an ITupleIterator.
See Also:
SuccessorUtil, which may be used to compute the successor of a value before encoding it as a component of a key., BytesUtil#successor(byte[]), which may be used to compute the successor of an encoded key., IFilterConstructor, which may be used to construct an iterator stack performing filtering or other operations.

submit

public Object submit(byte[] key,
                     ISimpleIndexProcedure proc)
Description copied from interface: IIndex
Submits an index procedure that operations on a single key to the appropriate index partition returning the result of that procedure.

Specified by:
submit in interface IIndex
Parameters:
key - The key.
proc - The procedure.
Returns:
The value returned by IIndexProcedure.apply(IIndex)

submit

public void submit(byte[] fromKey,
                   byte[] toKey,
                   IKeyRangeIndexProcedure proc,
                   IResultHandler handler)
Description copied from interface: IIndex
The procedure will be transparently applied against each index partition spanned by the given key range.

Note: Since this variant of submit() does not split keys the fromIndex and toIndex in the Splits reported to the IResultHandler will be zero (0).

Specified by:
submit in interface IIndex
Parameters:
fromKey - The lower bound (inclusive) -or- null if there is no lower bound.
toKey - The upper bound (exclusive) -or- null if there is no upper bound.
proc - The procedure. If the procedure implements the IParallelizableIndexProcedure marker interface then it MAY be executed in parallel against the relevant index partition(s).

submit

public void submit(int fromIndex,
                   int toIndex,
                   byte[][] keys,
                   byte[][] vals,
                   AbstractKeyArrayIndexProcedureConstructor ctor,
                   IResultHandler aggregator)
Description copied from interface: IIndex
Runs a procedure against an index.

Note: This may be used to send custom logic together with the data to a remote index or index partition. When the index is remote both the procedure and the return value MUST be Serializable.

Note: The scale-out indices add support for auto-split of the procedure such that it runs locally against each relevant index partition.

Specified by:
submit in interface IIndex
Parameters:
fromIndex - The index of the first key to be used (inclusive).
toIndex - The index of the last key to be used (exclusive).
keys - The keys (required).
vals - The values (optional depending on the procedure).
ctor - An object that can create instances of the procedure.
aggregator - When defined, results from each procedure application will be reported to this object.

estimateCost

public ScanCostReport estimateCost(DiskCostModel diskCostModel,
                                   long rangeCount)
Estimate the cost of a range scan.

Parameters:
diskCostModel - The disk cost model.
rangeCount - The #of tuples to be visited.
Returns:
The estimated cost.


Copyright © 2006-2011 SYSTAP, LLC. All Rights Reserved.