Package com.bigdata.btree.isolation


See: Description

Package com.bigdata.btree.isolation Description

Support for transactional isolation builds on the basic features of the com.bigdata.btree package, including copy-on-write semantics, per-tuple delete markers, and per-tuple revision timestamps, and on the characteristics of the AbstractJournal, which guarantees that valid data are never overwritten (it is an immortal database aka temporal store aka WORM store). In order to support transactional isolation, the B+Tree MUST be provisioned to maintain both per-tuple delete markers and per-tuple revision timestamps. The per-tuple delete marker is used to mark keys for tuples which have been deleted. Delete markers remain in place unless the tuple is updated to a non-deleted tuple. (In the scale-out architecture, delete markers are purged by a compacting merge.) The per-tuple revision timestamp is used to detect write-write conflicts between transactions as described below. This is an MVCC (Multiple Version Concurrency Control) design.

The basic design for isolation requires that reads are performed against a historical committed state of the store (the ground state, which is typically the last committed state of the store at the time that the transaction begins) while writes are isolated (they are not visible outside of the transaction). The basic mechanism for isolation is an com.bigdata.btree.isolated.IsolatedFusedView which reads through to a read-only btree loaded from a historical metadata record while writes go into the isolated BTree (aka the transaction write set).

In order to commit, the transaction must validate the write set on the isolated btree against the then current committed state of the unisolated btree. If there have been no intervening commits then validation is a NOP since the read-only btree that the isolated btree reads through to is the current committed state. If there have been intervening commits, then validation may identify write-write conflicts (read-write conflicts can not occur in MVCC). A write-write conflict exists when a concurrent transaction wrote data for the same key as the transaction that is being validated and has already committed (conflicts are not visible until a writer has committed).

Once a transaction has validated it is merged down onto the globally visible state of the btree. This process consists simply of applying the changes to the globally visible btree, including both inserts of key-value pairs and removal of keys that were deleted during the transaction. A revision timestamp is assigned when the transaction begins to validate its write set. The revision timestamp is used to annotate each tuple updated by the transaction. This is the basis for detecting write-write conflicts.

If a transaction is reading from or writing on more than one btree, then it must validate the write set for each btree during its validation stage and merge down the write set for each btree during its merge stage. Once this merge process is complete, the btree is flushed to the backing store which results in a new checkpoint record. The mapping from the btree name to checkpoint record is then updated on the backing store. Finally, an atomic commit is then performed on the backing store. At this point the transaction has successfully completed.

Conflict Resolution

Write-write conflicts may be reconciled using application specific logic. To do this you must register an IConflictResolver when the B+-Tree is provisioned. This interface will offer you an opportunity to inspect each tuple for which a write-write conflict was detected. If the conflicts are reconciled, then the transaction will commit. Otherwise it will abort.

Extended commit processing

Support for this has not been specified yet.

Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.