Package com.bigdata.journal

The journal is an append-only persistence capable data structure supporting atomic commit, named indices, and transactions.


Interface Summary
BufferedDiskStrategy.Options Options for the BufferedDiskStrategy.
ConcurrencyManager.IConcurrencyManagerCounters Interface defines and documents the counters and counter namespaces for the ConcurrencyManager.
ConcurrencyManager.Options Options for the ConcurrencyManager.
IAtomicStore Interface for low-level operations on a store supporting an atomic commit.
IBTreeManager Extended to allow direct registration of a named BTree.
IBufferStrategy Interface for implementations of a buffer strategy as identified by a BufferMode.
ICommitRecord An interface providing a read-only view of a commit record.
ICommitter An interface implemented by a persistence capable data structure such as a btree so that it can participate in the commit protocol for the store.
IConcurrencyManager Interface for managing concurrent access to resources (indices).
IDiskBasedStrategy An interface for implementations backed by a file on disk.
IIndexManager Interface for managing named indices.
IIndexStore Interface for accessing named indices.
IJournal An append-only persistence capable data structure supporting atomic commit, scalable named indices, and transactions.
ILocalTransactionManager Interface for managing local transaction state (the client side of the ITransactionService).
IResourceLock A lock granted by an IResourceLockService.
IResourceLockService Interface for named synchronous distributed locks without deadlock detection.
IResourceManager Interface for managing the resources on which indices are stored.
IRootBlockView Interface for a root block on the journal.
ITask<T> Interface available to tasks running under the ConcurrencyManager.
ITimestampService A service for unique timestamps.
ITransactionService An interface for managing transaction life cycles.
ITx Interface for transaction state on the client.
Journal.Options Options understood by the Journal.
Options Options for the Journal.
TemporaryStoreFactory.Options Configuration options for the TemporaryStoreFactory.
 

Class Summary
AbstractBufferStrategy Abstract base class for IBufferStrategy implementations.
AbstractJournal The journal is an append-only persistence capable data structure supporting atomic commit, named indices, and transactions.
AbstractLocalTransactionManager Manages the client side of a transaction either for a standalone Journal or for an IDataService in an IBigdataFederation.
AbstractTask<T> Abstract base class for tasks that may be submitted to the ConcurrencyManager.
AbstractTask.DelegateTask<T> Delegates various behaviors visible to the application code using the ITask interface to the AbstractTask object.
AbstractTask.InnerReadWriteTxServiceCallable Inner class used to wrap up the call to AbstractTask.doTask() for read-write transactions.
AbstractTask.InnerWriteServiceCallable<T> An instance of this class is used as the delegate for a LockManagerTask in order to coordinate the acquisition of locks with the LockManager before the task can execute and to release locks after the task has completed (whether it succeeds or fails).
BasicBufferStrategy Implements logic to read from and write on a buffer.
BufferedDiskStrategy A disk-based strategy where a large buffer is used to minimize the chance that a read will read through to the disk (under normal circumstances the on-disk file will be fully buffered).
CommitRecord A read-only view of an ICommitRecord.
CommitRecordIndex BTree mapping commit times to ICommitRecords.
CommitRecordIndex.CommitRecordIndexTupleSerializer Encapsulates key and value formation for the CommitRecordIndex.
CommitRecordIndex.Entry An entry in the persistent index.
CommitRecordIndex.Entry.EntrySerializer Used to (de-)serialize CommitRecordIndex.Entrys (NOT thread-safe).
CommitRecordSerializer A helper class for (de-)serializing the root addresses.
CompactJournalUtility Utility class to compact a Journal.
CompactTask Task compacts the journal state onto a caller specified file.
ConcurrencyManager Supports concurrent operations against named indices.
DirectBufferStrategy Direct buffer strategy uses a direct ByteBuffer as a write through cache and writes through to disk for persistence.
DiskBackedBufferStrategy Abstract base class for implementations that use a direct buffer as a write through cache to an image on the disk.
DiskOnlyStrategy Disk-based journal strategy.
DiskOnlyStrategy.StoreCounters Counters for IRawStore access, including operations that read or write through to the underlying media.
DropIndexTask Drop a named index (unisolated write operation).
DumpJournal A utility class that opens the journal in a read-only mode and dumps the root blocks and metadata about the indices on a journal file.
FileMetadata Helper object used when opening or creating journal file in any of the file-based modes.
IndexProcedureTask Class provides an adaptor allowing an IIndexProcedure to be executed on an IConcurrencyManager.
Journal Concrete implementation suitable for a local and unpartitioned database.
JournalTransactionService Implementation for a standalone journal using single-phase commits.
JournalTransactionService.SinglePhaseCommit This task is an UNISOLATED operation that validates and commits a transaction known to have non-empty write sets.
MappedBufferStrategy Memory-mapped journal strategy (this mode is NOT recommended).
Name2Addr Name2Addr is a BTree mapping index names to a Name2Addr.Entry containing the last Checkpoint record committed for the named index and the timestamp of that commit.
Name2Addr.Entry An entry in the persistent index.
Name2Addr.EntrySerializer The values are Name2Addr.Entrys.
Name2Addr.Name2AddrTupleSerializer Encapsulates key and value formation for Name2Addr.
RegisterIndexTask Register a named index (unisolated write operation).
ResourceLockService An implementation using NamedLocks suitable for within JVM locking.
RootBlockView A view onto a root block of the Journal.
TemporaryRawStore A non-restart-safe store for temporary data that buffers data in memory until the write cache overflows (or is flushed to the disk) and then converts to a disk-based store.
TemporaryStore A temporary store that supports named indices but no concurrency controls.
TemporaryStoreFactory Helper class for IIndexStore.getTempStore().
TimestampServiceUtil Robust request for a timestamp from an ITimestampService.
TimestampUtility Some static helper methods for timestamps.
TransientBufferStrategy Transient buffer strategy uses a direct buffer but never writes on disk.
Tx A read-write transaction.
WriteExecutorService A custom ThreadPoolExecutor used by the ConcurrencyManager to execute concurrent unisolated write tasks and perform group commits.
 

Enum Summary
BufferMode The buffer mode in which the journal is opened.
ForceEnum Type safe enumeration of options governing whether and how a file is forced to stable storage.
RunState Enum of transaction run states.
 

Exception Summary
AbstractTask.ResubmitException This is thrown if you attempt to reuse (re-submit) the same AbstractTask instance.
IndexExistsException  
NoSuchIndexException  
OverflowException An instance of this class is thrown if an AbstractBufferStrategy.overflow(long) request is denied.
RootBlockException An instance of this class is thrown if there is a problem with a root block (bad magic, unknown version, Challis fields do not agree, checksum error, etc).
ValidationError An instance of this class is thrown when a transaction prepares (see ITx#prepare(long)) if there is a write-write conflict that cannot be resolved.
 

Package com.bigdata.journal Description

The journal is an append-only persistence capable data structure supporting atomic commit, named indices, and transactions. Writes are logically appended to the journal to minimize disk head movement. The addressing scheme of the journal is configurable. The scale-up default allows an individual journal to address up to 4 terabytes and allows records of up to 4 megabytes in length. The scale-out default allows records of up to 64 megabytes in length, but the maximum file size is smaller. See the WormAddressManager for details. The journal supports the concept of "overflow", which is triggered when the journal exceeds a threshold extent. An implementation that handles overflow will expunge B+Trees from the journal onto read-optimized index segments, thereby creating a database of range-partitioned indices. See the ResourceManager for overflow handling, which is part of the basic DataService.
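The trade-off between maximum file size and maximum record length can be illustrated with a small sketch. It assumes that an address is a single 64-bit long split into a byte offset (high bits) and a byte count (low bits), which is consistent with the limits stated above (42 offset bits leave 22 bits for the record length). The class name AddressSketch and its methods are hypothetical and are not the WormAddressManager API.

```java
/**
 * Hypothetical sketch (not the WormAddressManager API): packs a byte offset
 * and a byte count into one 64-bit address. Assumes the high bits hold the
 * offset and the low bits hold the byte count.
 */
final class AddressSketch {

    private final int byteCountBits;
    private final long maxOffset;
    private final long maxByteCount;

    /** @param offsetBits number of bits reserved for the byte offset (1..63). */
    AddressSketch(int offsetBits) {
        if (offsetBits < 1 || offsetBits > 63) {
            throw new IllegalArgumentException("offsetBits must be in [1,63]");
        }
        this.byteCountBits = 64 - offsetBits;
        this.maxOffset = (1L << offsetBits) - 1;
        this.maxByteCount = (1L << byteCountBits) - 1;
    }

    /** Combine an offset and a record length into a single address. */
    long toAddr(long offset, long byteCount) {
        if (offset < 0 || offset > maxOffset) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        if (byteCount < 0 || byteCount > maxByteCount) {
            throw new IllegalArgumentException("byteCount out of range: " + byteCount);
        }
        return (offset << byteCountBits) | byteCount;
    }

    /** Extract the byte offset (unsigned shift, since the top bit may be set). */
    long getOffset(long addr) {
        return addr >>> byteCountBits;
    }

    /** Extract the record length in bytes. */
    long getByteCount(long addr) {
        return addr & maxByteCount;
    }

    long maxOffset() {
        return maxOffset;
    }

    long maxByteCount() {
        return maxByteCount;
    }
}
```

Under this assumption, new AddressSketch(42) addresses offsets just under 4 terabytes with records just under 4 megabytes, while new AddressSketch(38) allows records just under 64 megabytes at the cost of a smaller addressable file.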

The journal may be used as a Write Once Read Many (WORM) store. An index is maintained over the historical commit points for the store, so a journal may also be used as an immortal store in which all historical consistent states may be accessed. A read-write architecture may be realized by limiting the journal to 100s of megabytes in length. Both incrementally and when the journal overflows, key ranges of indices are expunged from the journal onto read-only index segments. When used in this manner, a consistent read on an index partition requires a fused view of the data on the journal and the data in the active index segments. Again, see ResourceManager for details.
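The fused view described above can be sketched as a lookup over an ordered list of sources, newest first: the journal, then the index segments. This is a minimal illustration, not the actual implementation. The class FusedViewSketch is hypothetical, plain maps stand in for the B+Tree and the index segments, and a key mapped to null stands in for a delete marker (an assumption made for this example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a fused view: a read consults the sources in order
 * (journal first, then index segments from newest to oldest) and the first
 * source that contains the key determines the result.
 */
final class FusedViewSketch {

    private final List<Map<String, String>> sources;

    /** @param sources ordered from most recent (the journal) to oldest segment. */
    FusedViewSketch(List<Map<String, String>> sources) {
        this.sources = new ArrayList<>(sources);
    }

    /**
     * Returns the visible value for the key, or null if the key is absent
     * from every source or was deleted in the most recent source holding it.
     */
    String lookup(String key) {
        for (Map<String, String> source : sources) {
            if (source.containsKey(key)) {
                // A null value is this sketch's stand-in for a delete marker;
                // it hides any older value in later sources.
                return source.get(key);
            }
        }
        return null;
    }
}
```

A write that lands on the journal therefore shadows the older entry in an index segment without modifying the read-only segment itself.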

The journal can be either wired into memory or accessed in place on disk. The advantage of wiring the journal into memory is that the disk head does not move for reads from the journal. Wiring the journal into memory can offer performance benefits if writes are to be absorbed in a persistent buffer and migrated to read-optimized index segments. However, the B+Tree implementation already provides caching for recently used nodes and leaves, which reduces the benefit for reads. In addition, wiring a journal limits the maximum size of the journal to less than 2 gigabytes (it must fit into the JVM memory pool) and can cause contention for system RAM.

The journal relies on the BTree to provide a persistent mapping from keys to values. When used as an object store, the B+Tree naturally clusters serialized objects within the leaves of the B+Tree based on their object identifier and provides IO efficiencies. (From the perspective of the journal, an object is a byte[] or byte stream.) If the database generalizes the concept of an object identifier to a variable length byte[], then the application can take control over the clustering behavior by simply choosing how to code a primary key for their objects.
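As an illustration of controlling clustering through key design, the sketch below encodes a composite primary key so that all orders for one customer sort next to each other. It assumes that keys are compared as unsigned byte[] values in lexicographic order. The class KeySketch and the (customerId, orderId) key are hypothetical and are not part of this package.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: encodes (customerId, orderId) as a 16-byte key whose
 * unsigned byte-wise order matches the numeric order of the two fields.
 */
final class KeySketch {

    private KeySketch() {
    }

    /**
     * Big-endian encoding with the sign bit flipped, so that negative values
     * sort before positive values under unsigned byte comparison.
     */
    static byte[] encode(long customerId, long orderId) {
        return ByteBuffer.allocate(16)           // ByteBuffer is big-endian by default
                .putLong(customerId ^ Long.MIN_VALUE)
                .putLong(orderId ^ Long.MIN_VALUE)
                .array();
    }

    /** Lexicographic comparison treating each byte as an unsigned value. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return a.length - b.length;
    }
}
```

Because customerId is the key prefix, every order for a given customer falls into a contiguous key range and so tends to land in the same or adjacent leaves. Putting orderId first would instead cluster by order number.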

When using journals with an overflow limit of a few hundred megabytes, a few large records would cause the journal to overflow. In such cases the journal may be configured to have a larger maximum extent and therefore defer overflow. The scale-out file system is an example of such an application and allows records of up to 64 megabytes by default. The index for the file system only stores a reference to the raw record on the journal. During overflow processing, the record is replicated onto an index segment and the index entry in the index segment is modified during the build so that it has a reference to the replicated record.
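The effect of the maximum extent on overflow can be shown with simple arithmetic. The sketch below assumes that overflow is signalled once the number of bytes written reaches a configured maximum extent. The class OverflowSketch is hypothetical and does not model the actual overflow protocol of the journal or the ResourceManager.

```java
/**
 * Hypothetical sketch: tracks the bytes appended to a journal and reports
 * when the configured maximum extent has been reached.
 */
final class OverflowSketch {

    private final long maximumExtent;
    private long nextOffset = 0L;

    OverflowSketch(long maximumExtent) {
        if (maximumExtent <= 0) {
            throw new IllegalArgumentException("maximumExtent must be positive");
        }
        this.maximumExtent = maximumExtent;
    }

    /** Appends a record of the given length and returns its byte offset. */
    long append(long byteCount) {
        if (byteCount <= 0) {
            throw new IllegalArgumentException("byteCount must be positive");
        }
        long offset = nextOffset;
        nextOffset += byteCount;
        return offset;
    }

    /** True once the bytes written have reached the maximum extent. */
    boolean shouldOverflow() {
        return nextOffset >= maximumExtent;
    }
}
```

With a 200 megabyte extent, the fourth 64 megabyte record reaches the threshold. Raising the extent to 1 gigabyte defers overflow until the sixteenth such record.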



Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.