com.bigdata.journal
Interface IRootBlockView

All Known Implementing Classes:
RootBlockView

public interface IRootBlockView

Interface for a root block on the journal. The root block provides metadata about the journal. The journal has two root blocks, which are written in alternating order according to the Challis algorithm. Each root block includes a field at its head and at its tail whose value is strictly increasing from one write to the next. This field is often referred to as the root block "timestamp", but in practice we use the commit counter. On restart, the root block chosen is the one (a) whose head and tail fields agree; and (b) whose value for those fields is greater. This protects against both crashes and partial writes of the root block itself.

The commit counter is a store-local, strictly increasing, non-negative long integer (commit counters are distinct for each store regardless of whether the stores are part of the same distributed database). The commit counters MUST be strictly increasing (a) so that they place the commit records into a total ordering; (b) so that the more current root block may be chosen by comparing the value of the field in each of the two root blocks; and (c) so that a partial write of a root block may be detected by the presence of different values for the field at the head and tail of a given root block. The commit counter is also used as the field written at the head and tail of each root block according to the Challis algorithm. If those fields are the same then the root block is assumed to have been completely written.

Note that random data may still result in an identical value during a partial write. This possibility is guarded against by storing the checksum of the root block.
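The selection rule described above can be sketched as follows. This is a minimal illustration, not the journal's actual code: the class RootBlockChooser, its nested Block type, and the checksumOk flag are hypothetical stand-ins for whatever the real implementation reads from disk.

```java
// Hypothetical sketch: none of these names come from com.bigdata.journal.
final class RootBlockChooser {

    /** Minimal stand-in for the fields that matter when choosing a root block. */
    static final class Block {
        final long headCounter;   // commit counter written at the head of the block
        final long tailCounter;   // commit counter written at the tail of the block
        final boolean checksumOk; // outcome of verifying the stored checksum

        Block(long headCounter, long tailCounter, boolean checksumOk) {
            this.headCounter = headCounter;
            this.tailCounter = tailCounter;
            this.checksumOk = checksumOk;
        }

        /** Usable only if the head and tail fields agree and the checksum verifies. */
        boolean isValid() {
            return headCounter == tailCounter && checksumOk;
        }
    }

    /** Returns the valid block with the larger commit counter, or null if neither is valid. */
    static Block choose(Block rb0, Block rb1) {
        boolean ok0 = rb0 != null && rb0.isValid();
        boolean ok1 = rb1 != null && rb1.isValid();
        if (ok0 && ok1) {
            return rb0.headCounter > rb1.headCounter ? rb0 : rb1;
        }
        if (ok0) return rb0;
        if (ok1) return rb1;
        return null;
    }

    public static void main(String[] args) {
        Block older = new Block(41L, 41L, true);
        Block torn = new Block(42L, 41L, true); // head and tail disagree: partial write
        System.out.println(choose(older, torn) == older);
    }
}
```

In this sketch a torn write of the newer root block falls back to the older one, which is the property that writing the two blocks in alternating order is meant to provide.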

The first and last commit times are persisted in each root block in order to support both unisolated commits and transactions, whether in a local or a distributed database. These "times" are generated by the appropriate ITransactionManagerService service, which is responsible for assigning both transaction start times (which are in fact the transaction identifiers) and transaction commit times. The commit times are stored in the root blocks of the various stores that participate in a given database and are reported via getFirstCommitTime() and getLastCommitTime(). While these do not, strictly speaking, have to be "times", they do have to be assigned using the same measure as the transaction identifiers, that is, either a coordinated time server or a strictly increasing counter. Regardless, we need to know "when" a transaction commits as well as "when" it starts, whether we measure "when" using a counter or a clock. Also note that we need to assign "commit times" even when the operation is unisolated. This means that we have to coordinate an unisolated commit on a store that is part of a distributed database with the centralized transaction manager. This should be done as part of the group commit, since we are waiting at that point anyway to optimize IO by minimizing syncs to disk.

Note that some file systems or disks can re-order the writes issued by the application and write the data in a more efficient order. This can cause the root blocks to be written before the application data is stable on disk. The Options.DOUBLE_SYNC option exists to defeat this behavior and ensure restart-safety for such systems.
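The ordering concern can be illustrated with a small sketch that forces the application data to disk before writing the root block, and then forces again. This only illustrates the general idea of syncing twice using the standard java.nio API. The class DoubleSyncSketch, its method names, and the file positions are hypothetical, and whether Options.DOUBLE_SYNC is implemented this way is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: not the journal's commit protocol.
final class DoubleSyncSketch {

    /** Writes the data, syncs, then writes the root block and syncs again. */
    static void commit(FileChannel ch, ByteBuffer data, long dataPos,
                       ByteBuffer rootBlock, long rootBlockPos) throws IOException {
        writeFully(ch, data, dataPos);
        ch.force(true); // first sync: application data is stable before the root block is touched
        writeFully(ch, rootBlock, rootBlockPos);
        ch.force(true); // second sync: the root block itself is stable
    }

    /** Loops until the whole buffer has been written at the given position. */
    static void writeFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos);
        }
    }

    /** Runs one commit against a temporary file and returns the resulting file size. */
    static long demo() {
        try {
            Path p = Files.createTempFile("journal", ".tmp");
            try (FileChannel ch = FileChannel.open(p,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                ByteBuffer data = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5});
                ByteBuffer root = ByteBuffer.allocate(8).putLong(0, 1L);
                commit(ch, data, 16L, root, 0L);
                return ch.size();
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The extra sync costs one more disk flush per commit, which is presumably why the behavior is exposed as an option rather than always enabled.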

Version:
$Id: IRootBlockView.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson

Method Summary
 ByteBuffer asReadOnlyBuffer()
          A read-only buffer whose contents are the root block.
 long getCloseTime()
          The timestamp assigned as the time at which writes were disallowed for the journal.
 long getCommitCounter()
          The commit counter is a positive long integer that is strictly local to the store.
 long getCommitRecordAddr()
          Return the address at which the ICommitRecord for this root block is stored.
 long getCommitRecordIndexAddr()
          The address of the root of the CommitRecordIndex.
 long getCreateTime()
          The timestamp assigned as the creation time for the journal.
 long getFirstCommitTime()
          The database wide timestamp of the first commit on the store or 0L iff there have been no commits.
 long getLastCommitTime()
          The database wide timestamp of the most recent commit on the store or 0L iff there have been no commits.
 long getNextOffset()
          The next offset at which a data item would be written on the store.
 int getOffsetBits()
          The #of bits in a 64-bit long integer address that are dedicated to the byte offset into the store.
 UUID getUUID()
          The unique journal identifier.
 int getVersion()
          The root block version number.
 boolean isRootBlock0()
          There are two root blocks and they are written in an alternating order.
 void valid()
          Asserts that the root block is valid, throwing an exception otherwise.
 

Method Detail

valid

void valid()
           throws RootBlockException
Asserts that the root block is valid, throwing an exception otherwise. Conditions tested include the root block MAGIC and the root block timestamps (there are two and they must agree).

Throws:
RootBlockException

isRootBlock0

boolean isRootBlock0()
There are two root blocks and they are written in an alternating order. For the sake of distinction, the first one is referred to as "rootBlock0" while the second one is referred to as "rootBlock1". This method indicates which root block is represented by this view based on metadata supplied to the constructor (the distinction is not persistent on disk).

Returns:
True iff the root block view was constructed from "rootBlock0".

getVersion

int getVersion()
The root block version number.


getNextOffset

long getNextOffset()
The next offset at which a data item would be written on the store.


getFirstCommitTime

long getFirstCommitTime()
The database wide timestamp of the first commit on the store or 0L iff there have been no commits. In a local database, this timestamp is generated by a local timestamp service. In a distributed database, this timestamp is generated by a shared timestamp service. The timestamps returned by this method are strictly increasing for a given store and for a given database.

Returns:
The timestamp of the first commit on the store or 0L iff there have been no commits.

getLastCommitTime

long getLastCommitTime()
The database wide timestamp of the most recent commit on the store or 0L iff there have been no commits. In a local database, this timestamp is generated by a local timestamp service. In a distributed database, this timestamp is generated by a shared timestamp service. The timestamps returned by this method are strictly increasing for a given store and for a given database.

Returns:
The timestamp of the most recent commit on the store or 0L iff there have been no commits.

getCommitCounter

long getCommitCounter()
The commit counter is a positive long integer that is strictly local to the store. The commit counter is used to avoid problems with timestamps generated by different machines, with clocks that go backwards, and with similar clock anomalies. The correct root block is chosen by selecting the valid root block with the larger commit counter (the value of the commit counter is reused by the Challis field).

Returns:
The commit counter.

getCommitRecordAddr

long getCommitRecordAddr()
Return the address at which the ICommitRecord for this root block is stored. The ICommitRecords are stored separately from the root block so that they may be indexed by the commit timestamps. This is necessary in order to be able to quickly recover the root addresses for a given commit timestamp, which is a feature used to support transactional isolation.

Note: When a logical journal may overflow onto more than one physical journal then the address of the ICommitRecord MAY refer to a historical physical journal and care MUST be exercised to resolve the address against the appropriate journal file.

Returns:
The address at which the ICommitRecord for this root block is stored.

getCommitRecordIndexAddr

long getCommitRecordIndexAddr()
The address of the root of the CommitRecordIndex. The CommitRecordIndex contains the ordered addresses of the historical ICommitRecords on the Journal. The address of the CommitRecordIndex is stored directly in the root block rather than in the ICommitRecord since we cannot obtain this address until after we have formatted and written the ICommitRecord.


getUUID

UUID getUUID()
The unique journal identifier.


getOffsetBits

int getOffsetBits()
The #of bits in a 64-bit long integer address that are dedicated to the byte offset into the store.

See Also:
WormAddressManager
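
To illustrate what it means to dedicate some of the bits of a 64-bit address to the byte offset, the sketch below packs an offset and a byte count into a single long. It assumes that the offset occupies the high-order bits and the byte count the remaining low-order bits. That layout, the class OffsetBitsSketch, and its method names are assumptions made for illustration, and WormAddressManager remains the authority on the actual encoding.

```java
// Hypothetical sketch: the bit layout is an assumption, not taken from WormAddressManager.
final class OffsetBitsSketch {

    /** Packs an offset (high-order bits) and a byte count (low-order bits) into one long. */
    static long toAddr(int offsetBits, long offset, int nbytes) {
        int byteCountBits = 64 - offsetBits;
        return (offset << byteCountBits) | nbytes;
    }

    /** Extracts the byte offset from the high-order bits of the address. */
    static long getOffset(int offsetBits, long addr) {
        return addr >>> (64 - offsetBits);
    }

    /** Extracts the byte count from the low-order bits of the address. */
    static int getByteCount(int offsetBits, long addr) {
        long mask = (1L << (64 - offsetBits)) - 1L;
        return (int) (addr & mask);
    }

    public static void main(String[] args) {
        int offsetBits = 42; // leaves 22 bits for the byte count
        long addr = toAddr(offsetBits, 1000L, 512);
        System.out.println(getOffset(offsetBits, addr) + " " + getByteCount(offsetBits, addr));
    }
}
```

Under this layout, more offset bits allow a larger store at the cost of a smaller maximum record size, and fewer offset bits do the opposite.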

getCreateTime

long getCreateTime()
The timestamp assigned as the creation time for the journal.


getCloseTime

long getCloseTime()
The timestamp assigned as the time at which writes were disallowed for the journal.


asReadOnlyBuffer

ByteBuffer asReadOnlyBuffer()
A read-only buffer whose contents are the root block. The position, limit, and mark will be independent for each ByteBuffer that is returned by this method.
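
The independence of position, limit, and mark matches the behavior of the standard ByteBuffer.asReadOnlyBuffer() method, which the following sketch demonstrates on a plain buffer. The class ReadOnlyViewSketch and the backing buffer are hypothetical, and how an implementation such as RootBlockView produces its views is not confirmed by this page.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

// Hypothetical sketch: uses only the standard java.nio API, not the journal classes.
final class ReadOnlyViewSketch {

    /** Returns true if two read-only views have independent positions and reject writes. */
    static boolean demo() {
        ByteBuffer backing = ByteBuffer.allocate(16).putLong(0, 7L);
        ByteBuffer v1 = backing.asReadOnlyBuffer();
        ByteBuffer v2 = backing.asReadOnlyBuffer();

        v1.getLong(); // relative read: advances only v1's position
        boolean independent = v1.position() == 8 && v2.position() == 0;

        boolean readOnly;
        try {
            v2.put((byte) 1); // any write through a read-only view must fail
            readOnly = false;
        } catch (ReadOnlyBufferException e) {
            readOnly = true;
        }
        return independent && readOnly;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because each view carries its own cursor state, callers can read from separate views without coordinating with one another.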



Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.