public class UnisolatedReadWriteIndex extends Object implements IIndex
A view onto an unisolated index partition which enforces the constraint that
either concurrent readers -or- a single writer may have access to the
unisolated index at any given time. This provides the maximum possible
concurrency for an unisolated index using an internal ReadWriteLock
to coordinate threads.
The possible concurrency with this approach is higher than that provided by
the IConcurrencyManager since the latter only allows a single process
access to the unisolated index while this class can allow multiple readers
concurrent access to the same unisolated index. The use of this class
is NOT compatible with the IConcurrencyManager (the
IConcurrencyManager does not respect the locks managed by this
class).
This class does NOT handle deadlock detection. However, it does not expose
the underlying lock and the scope of the acquired lock should always be
restricted to a single operation as defined by IIndex. If you
circumvent this by writing and submitting an IIndexProcedure that
attempts an operation on another UnisolatedReadWriteIndex then a
deadlock MAY occur.
The point test methods on this class (get, contains, lookup, remove) have
relatively high overhead since they need to acquire and release the lock per
point test. If you need to do a bunch of point tests, then submit an
IIndexProcedure that will run against the underlying index once it
has acquired the appropriate lock -- point tests from within the
IIndexProcedure will be very efficient.
ConcurrencyManager and the
group commit protocol which it imposes on writers. It also facilitates the
reuse of the buffers backing the unisolated index, which can reduce IO
associated with index operations when compared to reading on a read-committed
view of the index with concurrent writes and interleaved commits on the
corresponding unisolated index.
Reading on the read-committed index view has greater possible concurrency, but requires that writes are committed before they become visible and must read the data from the disk since it does not have access to the buffers for the unisolated index. Requiring a commit in order for the writes to become visible imposes significant latency, especially when computing the fix point of a rule set which may take multiple rounds. Reading on the unisolated index should do better in terms of buffer reuse and does NOT require commits or checkpoints of the index for writes to become visible to readers but does require a means to correctly interleave access to the unisolated index, which is the purpose of this class.
While the lock manager could be modified to support Share vs Exclusive locks and to use Share locks for readers and Exclusive locks for writers, writers would still block until the next commit so the throughput (e.g., when computing the fix point of a rule set) is significantly lower.
| Modifier and Type | Field and Description |
|---|---|
protected static long |
LOCK_TIMEOUT_MILLIS
The #of milliseconds that the class will wait for a read or write lock.
|
| Constructor and Description |
|---|
UnisolatedReadWriteIndex(BTree ndx)
Creates a view of an unisolated index that will enforce the concurrency
constraints of the
BTree class, but only among other instances of
this class for the same underlying index. |
UnisolatedReadWriteIndex(BTree ndx,
int defaultCapacity)
Creates a view of an unisolated index that will enforce the concurrency
constraints of the
BTree class, but only among other instances of
this class for the same underlying index. |
| Modifier and Type | Method and Description |
|---|---|
boolean |
contains(byte[] key)
Return
true iff there is a (non-deleted) index entry for
the key. |
boolean |
contains(Object key)
Return true iff there is an entry for the key.
|
ScanCostReport |
estimateCost(DiskCostModel diskCostModel,
long rangeCount)
Estimate the cost of a range scan.
|
ICounter |
getCounter()
This throws an exception.
|
CounterSet |
getCounters()
Return performance counters.
|
IndexMetadata |
getIndexMetadata()
The metadata for the index.
|
static ReadWriteLock |
getReadWriteLock(ICommitter btree)
Canonicalizing factory for the
ReadWriteLock for an ICommitter. |
IResourceMetadata[] |
getResourceMetadata()
The description of the resources comprising the index view.
|
byte[] |
insert(byte[] key,
byte[] value)
Insert or update a value under the key.
|
Object |
insert(Object key,
Object value)
Insert with auto-magic handling of keys and value objects.
|
byte[] |
lookup(byte[] key)
Lookup a value for a key.
|
Object |
lookup(Object key)
Lookup a value for a key.
|
long |
rangeCount()
Return the #of tuples in the index.
|
long |
rangeCount(byte[] fromKey,
byte[] toKey)
Return the #of tuples in a half-open key range.
|
long |
rangeCountExact(byte[] fromKey,
byte[] toKey)
Return the exact #of tuples in a half-open key range.
|
long |
rangeCountExactWithDeleted(byte[] fromKey,
byte[] toKey)
Return the exact #of tuples in a half-open key range, including any
deleted tuples.
|
ITupleIterator |
rangeIterator()
Visits all tuples in key order.
|
ITupleIterator |
rangeIterator(byte[] fromKey,
byte[] toKey)
Return an iterator that visits the entries in a half-open key range.
|
ITupleIterator |
rangeIterator(byte[] fromKey,
byte[] toKey,
int capacity,
int flags,
IFilter filter)
The iterator will read on the underlying index in chunks, buffering
tuples as it goes.
|
protected Lock |
readLock()
A shared read lock used (in the absence of other concurrency control
mechanisms) to permit concurrent readers on an unisolated index while
serializing access to that index when a writer must run.
|
byte[] |
remove(byte[] key)
Remove the key and its associated value.
|
Object |
remove(Object key)
Remove the key and its associated value.
|
void |
submit(byte[] fromKey,
byte[] toKey,
IKeyRangeIndexProcedure proc,
IResultHandler handler)
The procedure will be transparently applied against each index partition
spanned by the given key range.
|
Object |
submit(byte[] key,
ISimpleIndexProcedure proc)
Submits an index procedure that operations on a single key to the
appropriate index partition returning the result of that procedure.
|
void |
submit(int fromIndex,
int toIndex,
byte[][] keys,
byte[][] vals,
AbstractKeyArrayIndexProcedureConstructor ctor,
IResultHandler aggregator)
Runs a procedure against an index.
|
String |
toString() |
Lock |
writeLock()
An exclusive write lock used (in the absence of other concurrency control
mechanisms) to serialize all processes accessing an unisolated index when
a writer must run.
|
protected static final long LOCK_TIMEOUT_MILLIS
InterruptedException will be thrown if this timeout is
exceeded. The default is 9223372036854775807L milliseconds. Use
Long.MAX_VALUE for no timeout.ReentrantReadWriteLock).public UnisolatedReadWriteIndex(BTree ndx)
BTree class, but only among other instances of
this class for the same underlying index.ndx - The underlying unisolated index.IllegalArgumentException - if the index is null.public UnisolatedReadWriteIndex(BTree ndx, int defaultCapacity)
BTree class, but only among other instances of
this class for the same underlying index.ndx - The underlying unisolated index.defaultCapacity - The capacity for iterator reads against the underlying index.
The main purpose of the capacity is to reduce the contention
for the ReadWriteLock. Relatively small values should
therefore be fine. See DEFAULT_CAPACITY.IllegalArgumentException - if the index is null.ReadWriteLocks. It would be as easy to provide a
canonicalizing factory for these objects on the Journal and
TemporaryStore., fairness is NOT required for the locks. I believe that this is
supposed to provide better throughput, but that has not been
tested. Also, this has not been tested with a simple mutex lock vs
a read-write lock. The use case for which this class was originally
developed was computing the fix point of a set of rules. In that
use case, we do a lot of concurrent reading and periodically flush
the computed solutions onto the relations. It is likely that a
read-write lock will do well for this situation.public Lock writeLock()
IIndex. It is exposed
for processes which need to obtain the write lock to coordinate external
operations.protected Lock readLock()
IIndex. It is exposed for processes which need to
obtain the write lock to coordinate external operations.public static ReadWriteLock getReadWriteLock(ICommitter btree)
ReadWriteLock for an ICommitter.btree - The btree.IllegalArgumentException - if the argument is null.public IndexMetadata getIndexMetadata()
IIndex
Note: The same method is exposed by ICheckpointProtocol. It is
also exposed here in order to provide access to the IndexMetadata
to remote clients in the scale-out architecture.
getIndexMetadata in interface IIndexICheckpointProtocol.getIndexMetadata()public IResourceMetadata[] getResourceMetadata()
IIndexgetResourceMetadata in interface IIndexpublic CounterSet getCounters()
IIndexInteresting performance counters and other statistics about the index.
getCounters in interface IIndexgetCounters in interface ICounterSetAccesspublic ICounter getCounter()
ICounter for
the index partition, then write and submit an IIndexProcedure.getCounter in interface IIndexLocalCounterUnsupportedOperationExceptionpublic boolean contains(Object key)
IAutoboxBTreecontains in interface IAutoboxBTreekey - The key is implicitly converted to an unsigned
byte[].public Object insert(Object key, Object value)
IAutoboxBTreeinsert in interface IAutoboxBTreekey - The key is implicitly converted to an unsigned
byte[].value - The value is implicitly converted to a byte[].null if there was
no value stored under that key.public Object lookup(Object key)
IAutoboxBTreelookup in interface IAutoboxBTreekey - The key is implicitly converted to an unsigned
byte[].null if there is no
entry for that key.public Object remove(Object key)
IAutoboxBTreeremove in interface IAutoboxBTreekey - The key is implicitly converted to an unsigned
byte[].null if the key was not found.public boolean contains(byte[] key)
ISimpleBTreetrue iff there is a (non-deleted) index entry for
the key. An index entry with a null value will cause this
method to return true. A deleted index entry will cause
this method to return false.contains in interface ISimpleBTreekey - The key.true if the index contains an (un-deleted) entry
for that key.public byte[] lookup(byte[] key)
ISimpleBTreelookup in interface ISimpleBTreenull if there
is no entry for that key or if the entry under that key is marked
as deleted.public byte[] insert(byte[] key,
byte[] value)
ISimpleBTreeinsert in interface ISimpleBTreekey - The key.value - The value (may be null).null if the
key was not found or if the previous entry for that key was
marked as deleted.public byte[] remove(byte[] key)
ISimpleBTreeremove in interface ISimpleBTreekey - The key.null if the key
was not found or if the previous entry under that key was marked
as deleted.public long rangeCount()
IRangeQueryNote: If the index supports deletion markers then the range count will be an upper bound and may double count tuples which have been overwritten, including the special case where the overwrite is a delete.
rangeCount in interface IRangeQueryISimpleIndexAccess.rangeCount()public long rangeCount(byte[] fromKey,
byte[] toKey)
IRangeQueryNote: If the index supports deletion markers then the range count will be an upper bound and may double count tuples which have been overwritten, including the special case where the overwrite is a delete.
rangeCount in interface IRangeQueryfromKey - The lowest key that will be counted (inclusive). When
null there is no lower bound.toKey - The first key that will not be counted (exclusive). When
null there is no upper bound.public long rangeCountExact(byte[] fromKey,
byte[] toKey)
IRangeQueryNote: If the index supports deletion markers then this operation will require a key-range scan.
rangeCountExact in interface IRangeQueryfromKey - The lowest key that will be counted (inclusive). When
null there is no lower bound.toKey - The first key that will not be counted (exclusive). When
null there is no upper bound.public long rangeCountExactWithDeleted(byte[] fromKey,
byte[] toKey)
IRangeQuery
When the view is just an AbstractBTree the result is the same as
for IRangeQuery.rangeCount(byte[], byte[]), which already
reports all tuples regardless of whether or not they are deleted.
When the index is a view with multiple sources, this operation requires a key-range scan where both deleted and undeleted tuples are visited.
rangeCountExactWithDeleted in interface IRangeQueryfromKey - The lowest key that will be counted (inclusive). When
null there is no lower bound.toKey - The first key that will not be counted (exclusive). When
null there is no upper bound.IRangeQuery.rangeCountExact(byte[], byte[])public final ITupleIterator rangeIterator()
IRangeQueryrangeIterator(null, null)
rangeIterator in interface IRangeQuerypublic ITupleIterator rangeIterator(byte[] fromKey, byte[] toKey)
IRangeQueryrangeIterator in interface IRangeQueryfromKey - The first key that will be visited (inclusive lower bound).
When null there is no lower bound.toKey - The first key that will NOT be visited (exclusive upper
bound). When null there is no upper bound.SuccessorUtil, which may be used to compute the successor of a value
before encoding it as a component of a key.,
BytesUtil#successor(byte[]), which may be used to compute the
successor of an encoded key.,
EntryFilter, which may be used to filter the entries visited by the
iterator.public ITupleIterator rangeIterator(byte[] fromKey, byte[] toKey, int capacity, int flags, IFilter filter)
rangeIterator in interface IRangeQueryfromKey - The first key that will be visited (inclusive lower bound).
When null there is no lower bound.toKey - The first key that will NOT be visited (exclusive upper
bound). When null there is no upper bound.capacity - The #of entries to buffer at a time. This is a hint and MAY be
zero (0) to use an implementation specific default
capacity. A non-zero value may be used if you know that you
want at most N results or if you want to override the default
#of results to be buffered before sending them across a
network interface. (Note that you can control the default
value using
IBigdataClient.Options#DEFAULT_CLIENT_RANGE_QUERY_CAPACITY).flags - A bitwise OR of IRangeQuery.KEYS, IRangeQuery.VALS, etc.filter - An optional object used to construct a stacked iterator. When
IRangeQuery.CURSOR is specified in flags, the base
iterator will implement ITupleCursor and the first
filter in the stack can safely cast the source iterator to an
ITupleCursor. If the outermost filter in the stack
does not implement ITupleIterator, then it will be
wrapped an ITupleIterator.SuccessorUtil, which may be used to compute the successor of a value
before encoding it as a component of a key.,
BytesUtil#successor(byte[]), which may be used to compute the
successor of an encoded key.,
IFilterConstructor, which may be used to construct an iterator stack
performing filtering or other operations.public Object submit(byte[] key, ISimpleIndexProcedure proc)
IIndexsubmit in interface IIndexkey - The key.proc - The procedure.IIndexProcedure.apply(IIndex)public void submit(byte[] fromKey,
byte[] toKey,
IKeyRangeIndexProcedure proc,
IResultHandler handler)
IIndex
Note: Since this variant of submit() does not split keys the
fromIndex and toIndex in the Splits reported to
the IResultHandler will be zero (0).
submit in interface IIndexfromKey - The lower bound (inclusive) -or- null if there
is no lower bound.toKey - The upper bound (exclusive) -or- null if there
is no upper bound.proc - The procedure. If the procedure implements the
IParallelizableIndexProcedure marker interface then it
MAY be executed in parallel against the relevant index
partition(s).public void submit(int fromIndex,
int toIndex,
byte[][] keys,
byte[][] vals,
AbstractKeyArrayIndexProcedureConstructor ctor,
IResultHandler aggregator)
IIndex
Note: This may be used to send custom logic together with the data to a
remote index or index partition. When the index is remote both the
procedure and the return value MUST be Serializable.
Note: The scale-out indices add support for auto-split of the procedure such that it runs locally against each relevant index partition.
submit in interface IIndexfromIndex - The index of the first key to be used (inclusive).toIndex - The index of the last key to be used (exclusive).keys - The keys (required).vals - The values (optional depending on the procedure).ctor - An object that can create instances of the procedure.aggregator - When defined, results from each procedure application will be
reported to this object.public ScanCostReport estimateCost(DiskCostModel diskCostModel, long rangeCount)
diskCostModel - The disk cost model.rangeCount - The #of tuples to be visited.Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.