com.bigdata.btree.isolation
Class IsolatedFusedView

java.lang.Object
  extended by com.bigdata.btree.view.FusedView
      extended by com.bigdata.btree.isolation.IsolatedFusedView
All Implemented Interfaces:
IAutoboxBTree, IIndex, ILocalBTreeView, IRangeQuery, ISimpleBTree

public class IsolatedFusedView
extends FusedView

An index (or index partition) that has been isolated by a transaction. Isolation is achieved by the following mechanisms:

  1. The writeSet of the transaction on the index is isolated on a BTree visible only to that transaction.
  2. Version timestamps are maintained for index entries in both the isolated write set and groundState from which the transaction is reading.
  3. The groundState is defined as the view of the index (partition) as of the abs(startTime) of the transaction.
  4. Reads are performed against an ordered view defined by the writeSet followed by the ordered set of indices defining the groundState of the index.
  5. Writes first read through the ordered view to locate the most recent version for an index entry. If the index entry is located in the isolated writeSet then it is overwritten and its timestamp is unchanged. If the index entry is located in the groundState then the timestamp is copied from that index entry and written on the new version in the isolated writeSet.
  6. During validation, version timestamps in the isolated writeSet are compared against the then current view of the corresponding unisolated index. If the timestamp in the unisolated view differs from that in the writeSet then there is a write-write conflict. Write-write conflicts MAY be validated if the index has a registered IConflictResolver.
  7. If the writeSet is validated then it is merged down (copied) onto the then current unisolated index view. During the mergeDown phase the revision timestamp of the transaction is applied to all index entries copied from the write set. Transactions that later try to commit will recognize write-write conflicts based on those updated timestamps. Note that revision timestamps ARE NOT commit timestamps. Revision timestamps are assigned at the start of the validation phase. All tuples modified by a transaction are annotated with the same revision timestamp during the validation phase of the transaction. Write-write conflicts are detected on the basis of the per-tuple revision timestamps. Commit timestamps are assigned once the write set has been validated and checkpointed and all shards participating in the commit protocol signal that they are prepared and ready to commit.
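
The mechanisms above can be illustrated with a small in-memory model. This is a sketch only, not the bigdata implementation: the class IsolationSketch, its Entry type, the use of String keys and values, and the convention that a timestamp of 0 means "no prior version" are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the isolation steps above; NOT the bigdata implementation. */
final class IsolationSketch {

    /** A value plus the version timestamp used for conflict detection. */
    static final class Entry {
        final String value;   // null models a deleted entry (assumption)
        final long timestamp; // version (revision) timestamp
        Entry(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Snapshot of the index as of the transaction start (step 3). */
    private final Map<String, Entry> groundState;
    /** Writes visible only to this transaction (step 1). */
    private final TreeMap<String, Entry> writeSet = new TreeMap<>();

    IsolationSketch(Map<String, Entry> unisolatedAtStart) {
        this.groundState = new TreeMap<>(unisolatedAtStart);
    }

    /** Step 4: read the write set first, then the ground state. */
    Entry read(String key) {
        Entry e = writeSet.get(key);
        return e != null ? e : groundState.get(key);
    }

    /** Step 5: the new version keeps the timestamp of the version it replaces. */
    void write(String key, String value) {
        Entry prior = read(key);
        long ts = (prior == null) ? 0L : prior.timestamp; // 0 = no prior version (assumption)
        writeSet.put(key, new Entry(value, ts));
    }

    /** Step 6: a differing timestamp in the current unisolated view is a conflict. */
    boolean validate(Map<String, Entry> unisolated) {
        for (Map.Entry<String, Entry> w : writeSet.entrySet()) {
            Entry current = unisolated.get(w.getKey());
            long currentTs = (current == null) ? 0L : current.timestamp;
            if (currentTs != w.getValue().timestamp) {
                return false;
            }
        }
        return true;
    }

    /** Step 7: copy the write set down, stamping the revision timestamp. */
    void mergeDown(long revisionTime, Map<String, Entry> unisolated) {
        for (Map.Entry<String, Entry> w : writeSet.entrySet()) {
            unisolated.put(w.getKey(), new Entry(w.getValue().value, revisionTime));
        }
    }

    public static void main(String[] args) {
        Map<String, Entry> unisolated = new TreeMap<>();
        unisolated.put("k", new Entry("v0", 10L));
        IsolationSketch tx1 = new IsolationSketch(unisolated);
        IsolationSketch tx2 = new IsolationSketch(unisolated);
        tx1.write("k", "v1");
        tx2.write("k", "v2");
        System.out.println("tx1 validates: " + tx1.validate(unisolated));
        tx1.mergeDown(20L, unisolated);
        System.out.println("tx2 validates: " + tx2.validate(unisolated));
    }
}
```

In this model two transactions that start from the same ground state and write the same key cannot both validate: once the first has merged down, the timestamp in the unisolated view no longer matches the one copied into the second write set.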

Note: The timestamp from which the post-commit state of the transaction may be read IS NOT defined for an IBigdataFederation. It is not possible to define this timestamp without requiring concurrent commit processing to be paused on all data services on which the transaction has written, which is viewed as too high a cost. Instead, the commit timestamp identifies the state from which you can read the data written by the transaction. Tuples NOT updated by the transaction MAY have been changed by concurrent transactions.

Note: The process of validating, merging down changes, and committing those changes MUST be atomic. Therefore no other operations may be permitted access to the unisolated indices corresponding to the isolated indices on which the transaction has written during this process. This constraint is generally achieved by holding a write lock on the unisolated indices corresponding to the indices isolated by the transaction, e.g., by declaring those indices to an ITx.UNISOLATED AbstractTask which handles this process.
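
The atomicity requirement can be sketched with an ordinary lock. This example does not use AbstractTask or any bigdata locking facility: the class AtomicCommitSketch, its method name, and the use of java.util.concurrent.locks.ReentrantLock to stand in for the write lock on the unisolated indices are assumptions for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

/** Sketch: validate, mergeDown and commit run as one unit under a write lock. */
final class AtomicCommitSketch {

    /** Stands in for the write lock on the unisolated indices (assumption). */
    private final ReentrantLock unisolatedIndexLock = new ReentrantLock();

    /**
     * Runs the three phases while holding the lock.
     *
     * @return true iff validation succeeded and the changes were committed.
     */
    boolean validateMergeCommit(BooleanSupplier validate, Runnable mergeDown, Runnable commit) {
        unisolatedIndexLock.lock();
        try {
            if (!validate.getAsBoolean()) {
                return false; // validation failed: the transaction must abort
            }
            mergeDown.run();
            commit.run();
            return true;
        } finally {
            unisolatedIndexLock.unlock(); // always release, even on failure
        }
    }

    public static void main(String[] args) {
        AtomicCommitSketch sketch = new AtomicCommitSketch();
        boolean committed = sketch.validateMergeCommit(
                () -> true,
                () -> System.out.println("mergeDown"),
                () -> System.out.println("commit"));
        System.out.println("committed: " + committed);
    }
}
```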

Version:
$Id: IsolatedFusedView.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson

Nested Class Summary
 
Nested classes/interfaces inherited from class com.bigdata.btree.view.FusedView
FusedView.FusedBloomFilter
 
Field Summary
 
Fields inherited from class com.bigdata.btree.view.FusedView
ERR_RANGE_COUNT_EXCEEDS_MAX_LONG, log
 
Fields inherited from interface com.bigdata.btree.IRangeQuery
ALL, CURSOR, DEFAULT, DELETED, FIXED_LENGTH_SUCCESSOR, KEYS, NONE, PARALLEL, READONLY, REMOVEALL, REVERSE, VALS
 
Constructor Summary
IsolatedFusedView(long timestamp, AbstractBTree[] sources)
          Constructor may be used either for a fully isolated transaction or an unisolated operation.
 
Method Summary
 ICounter getCounter()
          Counters are disallowed for an isolated view.
 BTree getWriteSet()
          The isolated write set (the place where we record the intention of the transaction).
 byte[] insert(byte[] key, byte[] val)
          Write an entry for the key on the write set.
 boolean isEmptyWriteSet()
          True iff there are no writes on this isolated index.
 void mergeDown(long revisionTime, AbstractBTree[] groundStateSources)
           Merge the transaction scope index onto the then current unisolated index.
 byte[] remove(byte[] key)
          Write a deleted entry for the key on the write set.
 boolean validate(AbstractBTree[] groundStateSources)
           Validate changes made to the index within a transaction against the last committed state of the index in the global scope.
 
Methods inherited from class com.bigdata.btree.view.FusedView
assertNotReadOnly, contains, contains, getBloomFilter, getCounters, getIndexMetadata, getMutableBTree, getResourceMetadata, getSourceCount, getSources, insert, lookup, lookup, lookup, lookup, rangeCount, rangeCount, rangeCountExact, rangeCountExactWithDeleted, rangeIterator, rangeIterator, rangeIterator, remove, submit, submit, submit, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

IsolatedFusedView

public IsolatedFusedView(long timestamp,
                         AbstractBTree[] sources)
Constructor may be used either for a fully isolated transaction or an unisolated operation. In each case the groundState is the ordered set of read-only resources corresponding to the timestamp.

Reads will read through the writeSet and then the resource(s) in the groundState in the order in which they are given. A read is satisfied by the first resource containing an index entry for the search key.

Writes will first read through looking for a @todo javadoc

Parameters:
timestamp - The timestamp associated with the groundState.
sources - An ordered array of sources comprised of the BTree that will absorb writes and the historical ground state.
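
The read-through order described for this constructor can be sketched as follows. The class ReadThroughSketch and its lookup method are hypothetical; plain maps stand in for the write set and the ground state resources, and a null value is assumed to model a deleted entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of an ordered read over [writeSet, groundState...]. */
final class ReadThroughSketch {

    /**
     * Returns the value from the first source that contains an entry for the
     * key. A null value (deleted entry, by assumption) in an earlier source
     * hides versions in later sources. Returns null if no source has the key.
     */
    static String lookup(List<Map<String, String>> sources, String key) {
        for (Map<String, String> source : sources) {
            if (source.containsKey(key)) {
                return source.get(key);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> writeSet = new HashMap<>();
        Map<String, String> groundState = new HashMap<>();
        groundState.put("a", "old");
        groundState.put("b", "kept");
        writeSet.put("a", "new");

        List<Map<String, String>> sources = new ArrayList<>();
        sources.add(writeSet);    // index 0: absorbs writes
        sources.add(groundState); // historical ground state
        System.out.println(lookup(sources, "a"));
        System.out.println(lookup(sources, "b"));
    }
}
```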
Method Detail

getWriteSet

public BTree getWriteSet()
The isolated write set (the place where we record the intention of the transaction). This is just a reference to the mutable BTree at index zero(0) of sources in the view.

See Also:
FusedView.getMutableBTree()

isEmptyWriteSet

public boolean isEmptyWriteSet()
True iff there are no writes on this isolated index.


getCounter

public final ICounter getCounter()
Counters are disallowed for an isolated view. The reason is that counters are typically used to create one-up distinct values assigned to keys. If the counter is stored on the write set then different transactions could easily assign the same counter value under different keys, leading to an undetectable write conflict.

Specified by:
getCounter in interface IIndex
Overrides:
getCounter in class FusedView
Throws:
UnsupportedOperationException - always
TODO:
counters could probably be enabled within transactions if we used the counter from the then current mutable btree. This would have to be passed into the constructor. In addition, the counter logic would have to be carefully checked to make sure that counter assignments remain consistent. The counter itself is an AtomicInteger. However additional care needs to be taken to ensure that the counter value is persisted if it is changed (by updating the BTree Checkpoint record). The cases where the tx bumps the counter need to be carefully examined since it could force the write of the unisolated btree when we actually do not want to commit the btree - some kind of locking may be required. So, for now, this is disabled.
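
The hazard that motivates disabling counters can be shown with a toy example. The class CounterHazardSketch and its methods are hypothetical, and the counter is modeled as a per-transaction copy, which is the situation the description above warns about.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: a counter kept on the write set yields duplicate values. */
final class CounterHazardSketch {

    /**
     * Simulates one transaction that starts with its own copy of the counter
     * and assigns the next value to the given key in its write set.
     */
    static Map<String, Long> assign(long counterAtStart, String key) {
        AtomicLong counter = new AtomicLong(counterAtStart); // private to this transaction
        Map<String, Long> writeSet = new TreeMap<>();
        writeSet.put(key, counter.incrementAndGet());
        return writeSet;
    }

    /** Write-write conflicts are detected per key, so disjoint keys never conflict. */
    static boolean hasKeyConflict(Map<String, Long> a, Map<String, Long> b) {
        return !Collections.disjoint(a.keySet(), b.keySet());
    }

    public static void main(String[] args) {
        Map<String, Long> tx1 = assign(5L, "alice");
        Map<String, Long> tx2 = assign(5L, "bob");
        System.out.println("tx1: " + tx1 + ", tx2: " + tx2);
        System.out.println("conflict detected: " + hasKeyConflict(tx1, tx2));
    }
}
```

Both transactions hand out the same value under different keys, and per-key validation has nothing to compare, so the duplicate goes unnoticed.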

insert

public byte[] insert(byte[] key,
                     byte[] val)
Write an entry for the key on the write set.

Specified by:
insert in interface ISimpleBTree
Overrides:
insert in class FusedView
Parameters:
key - The key.
val - The value (may be null).
Returns:
The previous value under that key or null if the key was not found or if the previous entry for that key was marked as deleted.

remove

public byte[] remove(byte[] key)
Write a deleted entry for the key on the write set.

Specified by:
remove in interface ISimpleBTree
Overrides:
remove in class FusedView
Parameters:
key - The key.
Returns:
The value stored under that key or null if the key was not found or if the previous entry under that key was marked as deleted.
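
The return values of insert and remove can be sketched with a toy model that uses delete markers. The class DeleteMarkerSketch, its Tuple type and its String keys are hypothetical, and writing a delete marker even when the key was not found is an assumption of this sketch rather than a documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of insert/remove on a write set layered over a ground state. */
final class DeleteMarkerSketch {

    /** A value or a delete marker. */
    static final class Tuple {
        final byte[] value;
        final boolean deleted;
        Tuple(byte[] value, boolean deleted) {
            this.value = value;
            this.deleted = deleted;
        }
    }

    private final Map<String, Tuple> writeSet = new HashMap<>();
    private final Map<String, Tuple> groundState;

    DeleteMarkerSketch(Map<String, Tuple> groundState) {
        this.groundState = groundState;
    }

    /** Visible value: null if the key is absent or its latest entry is deleted. */
    private byte[] visibleValue(String key) {
        Tuple t = writeSet.containsKey(key) ? writeSet.get(key) : groundState.get(key);
        return (t == null || t.deleted) ? null : t.value;
    }

    /** Writes an entry on the write set and returns the previously visible value. */
    byte[] insert(String key, byte[] val) {
        byte[] old = visibleValue(key);
        writeSet.put(key, new Tuple(val, false));
        return old;
    }

    /** Writes a delete marker on the write set; the ground state is untouched. */
    byte[] remove(String key) {
        byte[] old = visibleValue(key);
        writeSet.put(key, new Tuple(null, true)); // assumption: marker written even if key absent
        return old;
    }

    public static void main(String[] args) {
        Map<String, Tuple> ground = new HashMap<>();
        ground.put("k", new Tuple(new byte[] {1}, false));
        DeleteMarkerSketch view = new DeleteMarkerSketch(ground);
        System.out.println("remove returned a value: " + (view.remove("k") != null));
        System.out.println("insert after remove returned null: " + (view.insert("k", new byte[] {2}) == null));
    }
}
```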

validate

public boolean validate(AbstractBTree[] groundStateSources)

Validate changes made to the index within a transaction against the last committed state of the index in the global scope. In general there are two kinds of conflicts: read-write conflicts and write-write conflicts. Read-write conflicts are handled by NEVER overwriting an existing version (an MVCC style strategy). Write-write conflicts are detected by backward validation against the last committed state of the journal. A write-write conflict exists IFF the version counter on the transaction index entry differs from the version counter in the global index scope. Once detected, the resolution of a write-write conflict is delegated to a conflict resolver. If a write-write conflict cannot be validated, then validation will fail and the transaction must abort.

Validation occurs as part of the prepare/commit protocol. Concurrent transactions MAY continue to run without limitation. A concurrent commit (if permitted) would force re-validation since the transaction MUST now be validated against the new baseline. (It is possible that this validation could be optimized.)

The version counters used to detect write-write conflicts are incremented during the commit as part of the #mergeDown() of the IsolatedFusedView onto the corresponding unisolated indices in the global scope.

Parameters:
groundStateSources - The ordered view of the unisolated index. This MUST be the current view of the ground state as of when the transaction is validated (NOT when it was created). This view WILL NOT be the same as the groundState specified to the constructor if intervening transactions have committed on the index.
Returns:
True iff validation succeeds.
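
Delegation of a write-write conflict to a resolver can be sketched as below. The Resolver interface here is hypothetical and is NOT the signature of IConflictResolver; the class ResolverSketch, the Versioned type, and the convention that a timestamp of 0 means "no committed version" are likewise assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of validation with an optional conflict resolver. */
final class ResolverSketch {

    /** Hypothetical resolver; returns the merged value, or null if unresolvable. */
    interface Resolver {
        String resolve(String key, String committed, String proposed);
    }

    static final class Versioned {
        final String value;
        final long timestamp;
        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * Returns true iff every entry in the write set validates. A resolved
     * entry replaces the proposed one and adopts the committed timestamp.
     */
    static boolean validate(Map<String, Versioned> writeSet,
                            Map<String, Versioned> unisolated,
                            Resolver resolver) {
        for (Map.Entry<String, Versioned> w : writeSet.entrySet()) {
            Versioned committed = unisolated.get(w.getKey());
            long committedTs = (committed == null) ? 0L : committed.timestamp;
            if (committedTs == w.getValue().timestamp) {
                continue; // no conflict for this key
            }
            if (resolver == null) {
                return false; // conflict and nothing registered to resolve it
            }
            String merged = resolver.resolve(w.getKey(),
                    committed == null ? null : committed.value, w.getValue().value);
            if (merged == null) {
                return false; // resolver could not reconcile the versions
            }
            w.setValue(new Versioned(merged, committedTs));
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Versioned> unisolated = new TreeMap<>();
        unisolated.put("k", new Versioned("a", 20L));
        Map<String, Versioned> writeSet = new TreeMap<>();
        writeSet.put("k", new Versioned("b", 10L));
        System.out.println("without resolver: " + validate(writeSet, unisolated, null));
        System.out.println("with resolver: "
                + validate(writeSet, unisolated, (key, c, p) -> c + "+" + p));
    }
}
```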

mergeDown

public void mergeDown(long revisionTime,
                      AbstractBTree[] groundStateSources)

Merge the transaction scope index onto the then current unisolated index.

Note: This method is invoked by a transaction during commit processing to merge the write set of an IsolatedFusedView into the global scope. This operation does NOT check for conflicts. The pre-condition is that the transaction has already been validated (hence, there will be no conflicts).

Note: This method is also responsible for updating the version timestamps that are used to detect write-write conflicts during validation - they are set to the revisionTime.

Parameters:
revisionTime - The revision timestamp assigned to the transaction at the start of its validation phase (this is NOT the commit timestamp).
groundStateSources - The ordered view of the unisolated index. This MUST be the current view of the ground state as of when the transaction is validated (NOT when it was created). This view WILL NOT be the same as the groundState specified to the constructor if intervening transactions have committed on the index.


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.