com.bigdata.service
Class DistributedTransactionService

java.lang.Object
  extended by com.bigdata.service.AbstractService
      extended by com.bigdata.service.AbstractTransactionService
          extended by com.bigdata.service.DistributedTransactionService
All Implemented Interfaces:
ITimestampService, ITransactionService, IService, IServiceShutdown, Remote
Direct Known Subclasses:
AbstractEmbeddedTransactionService, TransactionServer.AdministrableTransactionService

public abstract class DistributedTransactionService
extends AbstractTransactionService

Implementation for an IBigdataFederation supporting both single-phase commits (for transactions that execute on a single IDataService) and distributed commits.

Version:
$Id: DistributedTransactionService.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson

Nested Class Summary
protected  class DistributedTransactionService.NotifyReleaseTimeTask
          Task periodically notifies the discovered IDataServices of the new release time.
static interface DistributedTransactionService.Options
          Options understood by this service.
static class DistributedTransactionService.SnapshotHelper
          A helper class for reading and writing snapshots of the commit time index.
 
Nested classes/interfaces inherited from class com.bigdata.service.AbstractTransactionService
AbstractTransactionService.TxState
 
Field Summary
protected static String BASENAME
          Basename for the files written in the dataDir containing images of the commitTimeIndex.
protected  CommitTimeIndex commitTimeIndex
          A BTree containing a log of the historical commit points.
protected  File dataDir
          The data directory -or- null iff the service is transient.
protected static String EXT
          Extension for the files written in the dataDir containing snapshots of the commitTimeIndex.
 
Fields inherited from class com.bigdata.service.AbstractTransactionService
countersRoot, DEBUG, ERR_NO_SUCH, ERR_NOT_ACTIVE, ERR_READ_ONLY, ERR_SERVICE_NOT_AVAIL, INFO, lock, log, startTimeIndex, txDeactivate
 
Constructor Summary
DistributedTransactionService(Properties properties)
           
 
Method Summary
protected  void abortImpl(AbstractTransactionService.TxState state)
          Implementation must abort the tx on the journal (standalone) or on each data service (federation) on which it has written.
protected  void addScheduledTasks()
          Adds the scheduled tasks.
protected  long commitImpl(AbstractTransactionService.TxState state)
          There are two distinct commit protocols depending on whether the transaction write set is distributed across more than one IDataService.
 boolean committed(long tx, UUID dataService)
          Wait at "committed" barrier.
 void destroy()
          Immediate/fast shutdown of the service and then destroys any persistent state associated with the service.
protected  long findCommitTime(long timestamp)
          Find the commit time from which the tx will read (largest commitTime LTE timestamp).
protected  long findNextCommitTime(long commitTime)
          Return the commit time for the successor of the commit point having the specified timestamp (a commit time strictly GT the given value).
 CounterSet getCounters()
          Adds counters for the LockManager.
protected  ITxCommitProtocol[] getDataServices(UUID[] uuids)
          Return the proxies for the services participating in a distributed transaction commit or abort.
 long getLastCommitTime()
          Note: Declared abstract so that we can hide the IOException.
 void notifyCommit(long commitTime)
          The basic implementation advances the release time periodically as commits occur even when there are no transactions in use.
 long prepared(long tx, UUID dataService)
          Waits at "prepared" barrier.
protected  void setReleaseTime(long releaseTime)
          Extended to truncate the head of the commitTimeIndex such that only the commit times required for reading on timestamps GTE the new releaseTime are retained.
 void shutdown()
          Polite shutdown.
 void shutdownNow()
          Fast shutdown (not immediate since it must abort active transactions).
protected  long singlePhaseCommit(AbstractTransactionService.TxState state)
          Prepare and commit a read-write transaction that has written on a single data service.
 void snapshot()
          Runs the SnapshotTask once.
 DistributedTransactionService start()
          Verifies that AbstractTransactionService.nextTimestamp() will not report a time before AbstractTransactionService.getLastCommitTime() and then changes the TxServiceRunState to TxServiceRunState.Running.
 
Methods inherited from class com.bigdata.service.AbstractTransactionService
abort, activateTx, assertOpen, assignTransactionIdentifier, commit, deactivateTx, declareResources, findUnusedTimestamp, getAbortCount, getActiveCount, getCommitCount, getMinReleaseAge, getProperties, getReadOnlyActiveCount, getReadWriteActiveCount, getReleaseTime, getRunState, getServiceIface, getStartCount, getStartTime, isOpen, newTx, nextTimestamp, setRunState, updateReleaseTime
 
Methods inherited from class com.bigdata.service.AbstractService
clearLoggingContext, getFederation, getHostname, getServiceName, getServiceUUID, setServiceUUID, setupLoggingContext
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.bigdata.service.IService
getHostname, getServiceName, getServiceUUID
 

Field Detail

commitTimeIndex

protected final CommitTimeIndex commitTimeIndex
A BTree containing a log of the historical commit points.

The main things that it gives us are (a) the half-open ranges within which we can allocate read-historical transactions; and (b) the last commit time on record. It seems that creating an image of the log every N seconds should be sufficient.

Note: Read and write operations on this index MUST be synchronized on the index object.
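
The synchronization rule above can be illustrated with a minimal sketch. It does not use the real CommitTimeIndex class: a java.util.TreeSet stands in for the BTree, and the class name, the method names, and the 0L result for an empty log are all assumptions made for illustration.

```java
import java.util.TreeSet;

/** Hypothetical stand-in for the commit time index: a sorted set of commit times. */
final class SynchronizedCommitLogSketch {
    // Every read and write synchronizes on this index object.
    private final TreeSet<Long> index = new TreeSet<>();

    /** Record a commit time while holding the index monitor. */
    void add(long commitTime) {
        synchronized (index) {
            index.add(commitTime);
        }
    }

    /** Last commit time on record; returning 0L when empty is an assumption of this sketch. */
    long lastCommitTime() {
        synchronized (index) {
            return index.isEmpty() ? 0L : index.last();
        }
    }

    /** Consistent image of the log, copied while holding the index monitor. */
    long[] image() {
        synchronized (index) {
            long[] copy = new long[index.size()];
            int i = 0;
            for (long t : index) {
                copy[i++] = t;
            }
            return copy;
        }
    }
}
```

Copying the image under the same monitor means that a periodic snapshot never observes a partially applied update.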


dataDir

protected final File dataDir
The data directory -or- null iff the service is transient.


BASENAME

protected static final String BASENAME
Basename for the files written in the dataDir containing images of the commitTimeIndex.

See Also:
Constant Field Values

EXT

protected static final String EXT
Extension for the files written in the dataDir containing snapshots of the commitTimeIndex.

See Also:
Constant Field Values
Constructor Detail

DistributedTransactionService

public DistributedTransactionService(Properties properties)
Parameters:
properties -
Method Detail

snapshot

public void snapshot()
Runs the SnapshotTask once.


start

public DistributedTransactionService start()
Description copied from class: AbstractTransactionService
Verifies that AbstractTransactionService.nextTimestamp() will not report a time before AbstractTransactionService.getLastCommitTime() and then changes the TxServiceRunState to TxServiceRunState.Running.

Overrides:
start in class AbstractTransactionService
Returns:
this (the return type should be strengthened by the concrete implementation to return the actual type).

addScheduledTasks

protected void addScheduledTasks()
Adds the scheduled tasks.


shutdown

public void shutdown()
Description copied from class: AbstractTransactionService
Polite shutdown. New transactions will not start. This method will block until existing transactions (both read-write and read-only) are complete (either aborted or committed).

Specified by:
shutdown in interface IServiceShutdown
Overrides:
shutdown in class AbstractTransactionService

shutdownNow

public void shutdownNow()
Description copied from class: AbstractTransactionService
Fast shutdown (not immediate since it must abort active transactions).

New transactions will not start and active transactions will be aborted. Transactions which are concurrently committing MAY fail (throwing exceptions from various methods, including AbstractTransactionService.nextTimestamp()) when the service halts.

Specified by:
shutdownNow in interface IServiceShutdown
Overrides:
shutdownNow in class AbstractTransactionService

destroy

public void destroy()
Description copied from class: AbstractTransactionService
Immediate/fast shutdown of the service and then destroys any persistent state associated with the service.

Specified by:
destroy in interface IService
Overrides:
destroy in class AbstractTransactionService

setReleaseTime

protected void setReleaseTime(long releaseTime)
Extended to truncate the head of the commitTimeIndex such that only the commit times required for reading on timestamps GTE the new releaseTime are retained.

Overrides:
setReleaseTime in class AbstractTransactionService
Parameters:
releaseTime - The new value.
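
One way to read the truncation rule is sketched below. A read on a timestamp GTE the releaseTime resolves to the largest commit time LTE that timestamp, so the sketch keeps the largest commit time LTE the releaseTime and everything after it. This is an interpretation, not the actual implementation: a java.util.TreeSet stands in for the commitTimeIndex and the class and method names are hypothetical.

```java
import java.util.TreeSet;

/** Hypothetical sketch of truncating the head of a sorted commit time log. */
final class ReleaseTimeSketch {
    static void truncateHead(TreeSet<Long> commitTimes, long releaseTime) {
        synchronized (commitTimes) {
            // Largest commit time LTE the release time; reads at the release time use it.
            Long keepFrom = commitTimes.floor(releaseTime);
            if (keepFrom == null) {
                return; // nothing on record is older than the release time
            }
            // Drop every commit time strictly before keepFrom.
            commitTimes.headSet(keepFrom, false).clear();
        }
    }
}
```

For a log of 10, 20, 30, 40 and a release time of 25, the sketch retains 20, 30 and 40, because a read at timestamp 25 still resolves to commit time 20.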

getDataServices

protected ITxCommitProtocol[] getDataServices(UUID[] uuids)
Return the proxies for the services participating in a distributed transaction commit or abort.

Note: This method is here so that it may be readily overridden for unit tests.

Parameters:
uuids - The UUIDs of the participating services.
Returns:
The corresponding service proxies.

abortImpl

protected void abortImpl(AbstractTransactionService.TxState state)
                  throws Exception
Description copied from class: AbstractTransactionService
Implementation must abort the tx on the journal (standalone) or on each data service (federation) on which it has written.

Pre-conditions:

  1. The transaction is RunState.Active; and
  2. The caller holds the AbstractTransactionService.TxState.lock.

Post-conditions:

  1. The transaction is RunState.Aborted; and
  2. The transaction write set has been discarded by each Journal or IDataService on which it has written (applicable for read-write transactions only).

Specified by:
abortImpl in class AbstractTransactionService
Parameters:
state - The transaction state as maintained by the transaction server.
Throws:
Exception

commitImpl

protected long commitImpl(AbstractTransactionService.TxState state)
                   throws Exception
There are two distinct commit protocols depending on whether the transaction write set is distributed across more than one IDataService. When the write set of the transaction lies entirely on a single IDataService, an optimized commit protocol is used. When the write set of the transaction is distributed, a 3-phase commit is used with most of the work occurring during the "prepare" phase and a very rapid "commit" phase. If a distributed commit fails, even during the "commit", then the transaction will be rolled back on all participating IDataServices.
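
The choice between the two protocols can be sketched as a small decision function. The enum, the class name, and the representation of the write set as a set of data service UUIDs are assumptions for illustration. The NONE case reflects the documented return of ZERO (0L) for a read-only transaction or an empty write set.

```java
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch of choosing a commit protocol from the write set. */
final class CommitDispatchSketch {
    enum Protocol { NONE, SINGLE_PHASE, DISTRIBUTED }

    static Protocol choose(boolean readOnly, Set<UUID> writtenOn) {
        if (readOnly || writtenOn.isEmpty()) {
            return Protocol.NONE; // nothing to commit
        }
        // One participating data service: optimized single-phase commit.
        // More than one: distributed commit with prepare and commit phases.
        return writtenOn.size() == 1 ? Protocol.SINGLE_PHASE : Protocol.DISTRIBUTED;
    }
}
```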

Single phase commits

A simple commit protocol is used when the write set of the transaction resides entirely on a single IDataService. Such commits DO NOT contend for named resource locks (either on the index names or on the IDataService UUIDs). Since such transactions DO NOT have dependencies outside of the specific IDataService, a necessary and sufficient partial order will be imposed on the executing tasks locally by the IDataService on which they are executing based solely on the named resources which they declare. Without dependencies on distributed resources, this can not deadlock.

Distributed commits

Transaction commits for a distributed database MUST be prepared in a partial order so that they do not deadlock when acquiring the necessary locks on the named indices on the local data services. That partial order is imposed using the indexLockManager. The named index locks are pre-declared at the start of the distributed commit protocol and are held through both the prepare and commit phases until the end of the commit protocol. The distributed commit must obtain a lock on all of the necessary named index resources before proceeding. If there is an existing commit using some of those resources, then any concurrent commit requiring any of those resources will block. The LockManager is configured to require pre-declaration of locks. Deadlocks are NOT possible when the locks are pre-declared.

A secondary partial ordering is established based on the IDataService UUIDs during the commit phase. This partial order is necessary to avoid deadlocks for concurrently executing commit phases of distributed transactions that DO NOT share named index locks. Without a partial order over the participating IDataServices, deadlocks could arise because each transaction will grab an exclusive lock on the write service for each participating IDataService. By ordering those lock requests, we again ensure that deadlocks can not occur.
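
The effect of ordering lock requests by UUID can be sketched with plain java.util.concurrent locks. This is an illustration of the general technique, not the service's LockManager: the class, the per-service ReentrantLock map, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: acquire per-service locks in a single global order. */
final class OrderedLockSketch {
    private final Map<UUID, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Every transaction sorts its participants the same way (UUID is Comparable). */
    static UUID[] lockOrder(UUID[] participants) {
        UUID[] sorted = participants.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    /** Acquire the locks in sorted order, run the commit phase, release in reverse. */
    void runWithLocks(UUID[] participants, Runnable commitPhase) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (UUID id : lockOrder(participants)) {
                ReentrantLock lock = locks.computeIfAbsent(id, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            commitPhase.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```

Because two transactions that share services always request the shared locks in the same order, neither can hold one lock while waiting on a lock the other already holds.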

Note: The prepare phase for distributed commits allows the maximum possible concurrency. This is especially important as validation and merging down onto the unisolated indices can take a significant amount of time for large transactions.

The commit phase should be very fast, with syncing the disk providing the primary source of latency. All participating indices on the participating data services have already been checkpointed. Once the commitTime is assigned by the DistributedTransactionService, the group commit need only update the root block on the live journal and sync to disk.

Specified by:
commitImpl in class AbstractTransactionService
Returns:
The commit time for the transaction -or- ZERO (0L) if the transaction was read-only or had an empty write set.
Throws:
Exception - if something else goes wrong. This will be (or will wrap) a ValidationError if validation fails.
TODO:
Place a timeout on the commit phase so that the tx will abort unless all participants join at the "committed" barrier within ~ 250ms. That should be a generous timeout, but track aborts for this reason specifically since they may indicate interesting problems (heavy swapping, network issues, etc.).
Make sure that we checkpoint the commit record index and Name2Addr before requesting the commitTime to remove even more latency.
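
The timeout proposed in this TODO is not described as implemented. A minimal sketch of the idea, assuming the barrier is a java.util.concurrent.CyclicBarrier and using a hypothetical helper name, could look like this:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a bounded wait at a "committed" barrier. */
final class CommittedTimeoutSketch {
    /**
     * Returns true if all parties arrived within the timeout, false if the wait
     * timed out or the barrier was broken by another party.
     */
    static boolean awaitCommitted(CyclicBarrier barrier, long timeoutMillis)
            throws InterruptedException {
        try {
            barrier.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | BrokenBarrierException e) {
            // A caller would abort the tx here and count the abort by cause.
            return false;
        }
    }
}
```

A timed-out await also breaks the barrier for the other waiting parties, so all participants observe the failure rather than only the one that timed out.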

singlePhaseCommit

protected long singlePhaseCommit(AbstractTransactionService.TxState state)
                          throws Exception
Prepare and commit a read-write transaction that has written on a single data service.

Throws:
Exception

prepared

public long prepared(long tx,
                     UUID dataService)
              throws IOException,
                     InterruptedException,
                     BrokenBarrierException
Waits at "prepared" barrier. When the barrier breaks, examine the TxState. If the transaction is aborted, then throw an InterruptedException. Otherwise return the commitTime assigned to the transaction.

Parameters:
tx - The transaction identifier.
dataService - The UUID of the IDataService which sent the message.
Returns:
The assigned commit time.
Throws:
InterruptedException - if the barrier is reset while the caller is waiting.
IOException - if there is an RMI problem.
BrokenBarrierException
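
The barrier behavior described for this method can be sketched with a java.util.concurrent.CyclicBarrier whose barrier action assigns the commit time once the last participant arrives. The class, its fields, and the way the commit time is supplied are assumptions for illustration and do not mirror the real TxState.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/** Hypothetical sketch of a "prepared" barrier shared by all participants of one tx. */
final class PreparedBarrierSketch {
    private final CyclicBarrier barrier;
    private volatile long commitTime = 0L;
    private volatile boolean aborted = false;

    PreparedBarrierSketch(int participants, long commitTimeToAssign) {
        // The barrier action runs exactly once, when the last participant arrives.
        this.barrier = new CyclicBarrier(participants, () -> {
            if (!aborted) {
                commitTime = commitTimeToAssign;
            }
        });
    }

    /** Mark the transaction as aborted (for example, because a prepare failed). */
    void markAborted() {
        aborted = true;
    }

    /** Wait for all participants, then report the commit time or signal the abort. */
    long prepared() throws InterruptedException, BrokenBarrierException {
        barrier.await();
        if (aborted) {
            throw new InterruptedException("transaction aborted");
        }
        return commitTime;
    }
}
```

Each participating data service would call prepared() after finishing its own prepare work, and all of them receive the same commit time.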

committed

public boolean committed(long tx,
                         UUID dataService)
                  throws IOException,
                         InterruptedException,
                         BrokenBarrierException
Wait at "committed" barrier. When the barrier breaks, examine the TxState. If the transaction is aborted, then return false. Otherwise return true.

Note: The TxState will be aborted if any of the committers throws an exception from their ITxCommitProtocol.prepare(long, long) method.

Parameters:
tx - The transaction identifier.
dataService - The UUID of the IDataService which sent the message.
Returns:
true if the distributed commit was successful and false if there was a problem.
Throws:
IOException
InterruptedException
BrokenBarrierException

findCommitTime

protected long findCommitTime(long timestamp)
Description copied from class: AbstractTransactionService
Find the commit time from which the tx will read (largest commitTime LTE timestamp).

Specified by:
findCommitTime in class AbstractTransactionService
Parameters:
timestamp - The timestamp.
Returns:
The commit time or -1L if there is no such commit time.

findNextCommitTime

protected long findNextCommitTime(long commitTime)
Description copied from class: AbstractTransactionService
Return the commit time for the successor of the commit point having the specified timestamp (a commit time strictly GT the given value).

Specified by:
findNextCommitTime in class AbstractTransactionService
Parameters:
commitTime - The probe.
Returns:
The successor or -1L iff there is no successor for that commit time.
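
The LTE and strictly GT lookups described for findCommitTime and findNextCommitTime map onto the floor and higher operations of a sorted set. The sketch below uses a java.util.TreeSet in place of the commitTimeIndex. The class name is hypothetical and the real index API may differ.

```java
import java.util.TreeSet;

/** Hypothetical sketch of the two commit time lookups over a sorted set. */
final class CommitTimeLookupSketch {
    /** Largest commit time LTE the timestamp, or -1L if there is none. */
    static long findCommitTime(TreeSet<Long> index, long timestamp) {
        synchronized (index) {
            Long t = index.floor(timestamp);
            return t == null ? -1L : t;
        }
    }

    /** Smallest commit time strictly GT the probe, or -1L if there is none. */
    static long findNextCommitTime(TreeSet<Long> index, long commitTime) {
        synchronized (index) {
            Long t = index.higher(commitTime);
            return t == null ? -1L : t;
        }
    }
}
```

Taken together, the two results bound the half-open range of timestamps that all read from the same commit point.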

notifyCommit

public final void notifyCommit(long commitTime)
Description copied from class: AbstractTransactionService
The basic implementation advances the release time periodically as commits occur even when there are no transactions in use.

Note: This needs to be a fairly low-latency operation since this method is invoked for all commits on all data services and will otherwise be a global hotspot.

Specified by:
notifyCommit in interface ITransactionService
Overrides:
notifyCommit in class AbstractTransactionService
Parameters:
commitTime - The commit time.
TODO:
Is it a problem if the commit notices do not arrive in sequence? Because they will not. Unisolated operations will participate in group commits using timestamps obtained from the transaction service, but those commit operations will not be serialized and their reporting of the timestamps for the commits will likewise not be serialized.

The danger is that we could assign a read-historical transaction start time based on the commitTimeIndex and then have commit timestamps arrive that are within the interval in which we made the assignment. Essentially, our interval was too large and the assigned start time may have been on either side of a concurrent commit. However, this can only occur for unisolated operations (non-transactional commits). The selected timestamp will always be coherent with respect to transaction commits since those are coordinated and use a shared commit time.

This issue can only arise when requesting historical reads for timestamps that are "close" to the most recent commit point since the latency involved would otherwise not affect the assignment of transaction start times. However, it can occur either when specifying the symbolic constant ITx.READ_COMMITTED to AbstractTransactionService.newTx(long) or when specifying the exact commitTime reported by a transaction commit.

Simply stated, there is NO protection against unisolated operations committing concurrently. If such operations are used on the same indices as transactions, then it IS possible that the application will be unable to read from exactly the post-commit state of the transaction for a brief period (tens of milliseconds) until the unisolated commit notices have been propagated to the DistributedTransactionService. This issue will only occur when there is also a lot of contention for reading on the desired timestamp since otherwise the commitTime itself may be used as a transaction start time.

Depending on the latency involved and the issue described immediately above, it might be possible to simply queue these notices and consume them in an async thread. Some operations (such as a distributed commit) might require that we catch up on the commit time notices in the queue. Just thinking out loud here.
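
The queueing idea floated in this TODO is speculative in the source. A minimal sketch of how it might look is given below. Every name is hypothetical, a java.util.TreeSet stands in for the commitTimeIndex, and the 0L result for an empty log is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: queue commit notices and apply them in batches. */
final class CommitNoticeQueueSketch {
    private final BlockingQueue<Long> pending = new LinkedBlockingQueue<>();
    private final TreeSet<Long> commitTimes = new TreeSet<>();

    /** Low-latency path: only enqueue the notice. */
    void notifyCommit(long commitTime) {
        pending.add(commitTime);
    }

    /**
     * Apply every queued notice. An async thread could call this periodically, and an
     * operation such as a distributed commit could call it before reading the index.
     */
    int catchUp() {
        List<Long> batch = new ArrayList<>();
        pending.drainTo(batch);
        synchronized (commitTimes) {
            commitTimes.addAll(batch); // the sorted set absorbs out-of-order arrival
        }
        return batch.size();
    }

    /** Last applied commit time; returning 0L when empty is an assumption of this sketch. */
    long lastCommitTime() {
        synchronized (commitTimes) {
            return commitTimes.isEmpty() ? 0L : commitTimes.last();
        }
    }
}
```

The trade-off is that lookups can lag behind the queued notices until catchUp() runs, which is the same window of incoherence the TODO describes.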


getLastCommitTime

public final long getLastCommitTime()
Description copied from class: AbstractTransactionService
Note: Declared abstract so that we can hide the IOException.

Specified by:
getLastCommitTime in interface ITransactionService
Specified by:
getLastCommitTime in class AbstractTransactionService
Returns:
The last known commit time.

getCounters

public CounterSet getCounters()
Adds counters for the LockManager.

Overrides:
getCounters in class AbstractTransactionService


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.