An interface invoked during backward validation when a write-write conflict has been detected.
An index (or index partition) that has been isolated by a transaction.
Does not resolve any conflicts.
Support for transactional isolation builds on the basic features of the
com.bigdata.btree package, including copy-on-write semantics, per-tuple
delete markers, and per-tuple revision timestamps, and on the characteristics
AbstractJournal, which guarantees that valid
data are never overwritten (it is an immortal database aka temporal store aka
WORM store). In order to support transactional isolation, the B+Tree MUST be
provisioned to maintain both per-tuple delete markers and per-tuple revision
timestamps. The per-tuple delete marker is used to mark keys for tuples which
have been deleted. Delete markers remain in place unless the tuple is updated
to a non-deleted tuple. (In the scale-out architecture, delete markers are purged
by a compacting merge.) The per-tuple revision timestamp is used to detect
write-write conflicts between transactions as described below. This is an
MVCC (Multiple Version Concurrency Control) design.
The basic design for isolation requires that reads are performed
against a historical committed state of the store (the ground state,
which is typically the last committed state of the store at the time
that the transaction begins) while writes are isolated (they are not
visible outside of the transaction). The basic mechanism for isolation
com.bigdata.btree.isolated.IsolatedFusedView which reads through
to a read-only btree loaded from a historical metadata record while writes
go into the isolated
BTree (aka the transaction
In order to commit, the transaction must validate the write set on the isolated btree against the then current committed state of the unisolated btree. If there have been no intervening commits then validation is a NOP since the read-only btree that the isolated btree reads through to is the current committed state. If there have been intervening commits, then validation may identify write-write conflicts (read-write conflicts can not occur in MVCC). A write-write conflict exists when a concurrent transaction wrote data for the same key as the transaction that is being validated and has already committed (conflicts are not visible until a writer has committed).
Once a transaction has validated it is merged down onto the globally visible state of the btree. This process consists simply of applying the changes to the globally visible btree, including both inserts of key-value pairs and removal of keys that were deleted during the transaction. A revision timestamp is assigned when the transaction begins to validate its write set. The revision timestamp is used to annotate each tuple updated by the transaction. This is the basis for detecting write-write conflicts.
If a transaction is reading from or writing on more than one btree, then it must validate the write set for each btree during its validation stage and merge down the write set for each btree during its merge stage. Once this merge process is complete, the btree is flushed to the backing store which results in a new checkpoint record. The mapping from the btree name to checkpoint record is then updated on the backing store. Finally, an atomic commit is then performed on the backing store. At this point the transaction has successfully completed.
Write-write conflicts may be reconciled using application specific
logic. To do this you must register an
when the B+-Tree is provisioned. This interface will offer you an
opportunity to inspect each tuple for which a write-write conflict
was detected. If the conflicts are reconciled, then the transaction
will commit. Otherwise it will abort.
Support for this has not been specified yet.
Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.