com.bigdata.service
Interface IDataService

All Superinterfaces:
IRemoteExecutor, IService, ITxCommitProtocol, Remote
All Known Subinterfaces:
IMetadataService
All Known Implementing Classes:
AbstractEmbeddedDataService, DataServer.AdministrableDataService, DataService, EmbeddedFederation.EmbeddedDataServiceImpl, EmbeddedMetadataService, LocalDataServiceFederation.LocalDataServiceImpl, MetadataServer.AdministrableMetadataService, MetadataService

public interface IDataService
extends ITxCommitProtocol, IService, IRemoteExecutor

The data service interface provides remote access to named indices, provides for both unisolated and isolated operations on those indices, and exposes the ITxCommitProtocol interface to the ITransactionManagerService service for the coordination of distributed transactions. Clients normally write to the IIndex interface. The ClientIndexView provides an implementation of that interface supporting range partitioned scale-out indices which transparently handles lookup of data services in the metadata index and mapping of operations across the appropriate data services.

Indices are identified by name. Scale-out indices are broken into index partitions, each of which is a named index hosted on a data service. The name of an index partition is given by DataService.getIndexPartitionName(String, int). Clients are strongly encouraged to use the ClientIndexView which encapsulates lookup and distribution of operations on range partitioned scale-out indices.

The data service exposes fully isolated read-write transactions, read-only transactions, lightweight read-historical operations, and unisolated operations on named indices. These choices are captured by the timestamp associated with the operation. When it is a transaction, this is also known as the transaction identifier or tx. The following distinctions are available:

Unisolated

Unisolated operations specify ITx.UNISOLATED as their transaction identifier. Unisolated operations are ACID, but their scope is limited to the commit group on the data service where the operation is executed. Unisolated operations correspond more or less to read-committed semantics except that writes are immediately visible to other operations in the same commit group.

Unisolated operations that allow writes obtain an exclusive lock on the live version of the named index for the duration of the operation. Unisolated operations that are declared as read-only read from the last committed state of the named index and therefore do not compete with read-write unisolated operations. This allows unisolated read operations to achieve higher concurrency. The effect is as if the unisolated read operation runs before the unisolated writes in a given commit group, since the impact of those writes is not visible to unisolated readers until the next commit point.
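
The split between writers and read-only operations described above can be sketched roughly as follows. This is only an illustration: SketchIndex and its methods are hypothetical names, not part of the bigdata API, and String keys stand in for the byte[] keys of a real index.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a named index; not the bigdata implementation. */
final class SketchIndex {
    // Exclusive lock on the live view, held for the duration of each write.
    private final ReentrantLock writeLock = new ReentrantLock();
    private final TreeMap<String, String> live = new TreeMap<>();
    // Immutable snapshot published at the last commit point.
    private volatile Map<String, String> committed = Collections.emptyMap();

    /** Unisolated write: serialized on the live view, invisible to readers until commit(). */
    void write(String key, String value) {
        writeLock.lock();
        try {
            live.put(key, value);
        } finally {
            writeLock.unlock();
        }
    }

    /** Unisolated read-only operation: reads the last committed state without taking the lock. */
    String readCommitted(String key) {
        return committed.get(key);
    }

    /** Commit point: publish a snapshot of the live view for subsequent readers. */
    void commit() {
        writeLock.lock();
        try {
            committed = Collections.unmodifiableMap(new TreeMap<>(live));
        } finally {
            writeLock.unlock();
        }
    }
}
```

In this sketch a reader never blocks on a writer, which is the concurrency benefit the paragraph above describes.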

Unisolated write operations MAY be used to achieve "auto-commit" semantics when distributed transactions are not required. Fully isolated transactions are useful when multiple operations must be composed into an ACID unit.

While unisolated operations on a single data service are ACID, clients generally operate against scale-out indices having multiple index partitions hosted on multiple data services. Therefore clients MUST NOT assume that an unisolated operation described by the client against a scale-out index will be ACID when that operation is distributed across the various index partitions relevant to the client's request. In practice, this means that the contract for ACID unisolated operations is limited to either: (a) operations where the data is located on a single data service instance; or (b) unisolated operations that are inherently designed to achieve a consistent result. Sometimes it is sufficient to configure a scale-out index such that index partitions never split some logical unit (for example, the {schema + primaryKey} for a SparseRowStore), thereby obtaining an ACID guarantee since operations on a logical row will always occur within the same index partition.

Lightweight historical reads
Historical reads are indicated using tx, where tx is a timestamp and is associated with the closest commit point less than or equal to (LTE) the timestamp. A historical read is fully isolated but has very low overhead and does NOT require the caller to open the transaction. The read will have a consistent view of the data as of the most recent commit point not greater than tx. Unlike a distributed read-only transaction, a historical read does NOT impose a distributed read lock. While the operation will have access to the necessary resources on the local data service, it is possible that resources for the same timestamp will be concurrently released on other data services. If you need to map a read operation across the distributed database, then you must use a read-only transaction, which will assert the necessary read lock.
Distributed transactions
Distributed transactions are coordinated using an ITransactionManagerService service and incur more overhead than both unisolated and historical read operations. Transactions are assigned a start time (the transaction identifier) when they begin and must be explicitly closed by either an abort or a commit. Both read-only and read-write transactions assert read locks which force the retention of resources required for a consistent view as of the transaction start time until the transaction is closed.
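
For historical reads, the rule "closest commit point LTE the timestamp" can be illustrated with a sorted set of commit times. This is a minimal sketch under stated assumptions: CommitPointSketch is a hypothetical helper, and how a data service actually records its commit points is not described here.

```java
import java.util.OptionalLong;
import java.util.TreeSet;

/** Hypothetical helper illustrating how a read timestamp maps onto a commit point. */
final class CommitPointSketch {
    private final TreeSet<Long> commitTimes = new TreeSet<>();

    /** Records a commit point (the times used by callers here are made-up values). */
    void recordCommit(long commitTime) {
        commitTimes.add(commitTime);
    }

    /**
     * Returns the most recent commit time that is less than or equal to the
     * requested timestamp, or an empty result if no commit point is that old.
     */
    OptionalLong resolve(long timestamp) {
        Long floor = commitTimes.floor(timestamp);
        return floor == null ? OptionalLong.empty() : OptionalLong.of(floor);
    }
}
```

With commit points at 100 and 200, a read at timestamp 150 would see the state as of 100, and a read at 200 would see the state as of 200.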

Implementations of this interface MUST be thread-safe. Methods declared by this interface MUST block for each operation. Client operations SHOULD be buffered by a thread pool with a FIFO policy so that client requests may be decoupled from data service operations and clients may achieve greater parallelism.
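
One way a client might buffer blocking requests behind a FIFO thread pool is sketched below. FifoClientPoolSketch is a hypothetical name, and each Callable stands in for a blocking call on the data service; only standard java.util.concurrent classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class FifoClientPoolSketch {
    /**
     * Runs each blocking request on a fixed-size pool fed by a FIFO queue and
     * returns the results in submission order.
     */
    static <T> List<T> runAll(List<Callable<T>> requests, int nthreads) {
        // LinkedBlockingQueue hands queued tasks to worker threads in FIFO order.
        ExecutorService pool = new ThreadPoolExecutor(nthreads, nthreads,
                0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> request : requests) {
                futures.add(pool.submit(request));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // blocks until that request completes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller's thread is decoupled from the individual blocking calls, so several requests can be in flight at once while each data service method still blocks per operation.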

Index Partitions: Split, Join, and Move

Scale-out indices are transparently broken down into index partitions. When a scale-out index is initially registered, one or more index partitions are created and registered on one or more data services.

Note that each index partition is just an IIndex registered under the name assigned by DataService.getIndexPartitionName(String, int) and whose IndexMetadata.getPartitionMetadata() returns a description of the resources required to compose a view of that index partition from the resources located on a DataService. The IDataService will respond for that index partition IFF there is an index under that name registered on the IDataService as of the timestamp associated with the request. If the index is not registered then a NoSuchIndexException will be thrown. If the index was registered and has since been split, joined or moved then a StaleLocatorException will be thrown (this will occur only for index partitions of scale-out indices). All methods on this and derived interfaces which are defined for an index name and timestamp MUST conform to these semantics.

As index partitions grow in size they may be split into two or more index partitions covering the same key range as the original index partition. When this happens a new index partition identifier is assigned by the metadata service to each of the new index partitions and the old index partition is retired in an atomic operation. A similar operation can move an index partition to a different IDataService in order to load balance a federation. Finally, when two index partitions shrink in size, they may be moved to the same IDataService and an atomic join operation may re-combine them into a single index partition spanning the same key range.
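
The key-range bookkeeping of a split can be sketched as below. This is an illustration only: PartitionSketch is a hypothetical type, String keys stand in for byte[] keys, and the counter stands in for identifiers that the metadata service would assign. The atomic retirement of the old partition is not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical description of an index partition covering keys in [fromKey, toKey). */
final class PartitionSketch {
    final int partitionId;
    final String fromKey; // inclusive lower bound
    final String toKey;   // exclusive upper bound, or null when there is no upper bound

    PartitionSketch(int partitionId, String fromKey, String toKey) {
        this.partitionId = partitionId;
        this.fromKey = fromKey;
        this.toKey = toKey;
    }

    /**
     * Splits this partition at separatorKey into two partitions, each with a
     * newly assigned identifier, that together cover the original key range.
     */
    List<PartitionSketch> split(String separatorKey, AtomicInteger nextPartitionId) {
        boolean tooLow = separatorKey.compareTo(fromKey) <= 0;
        boolean tooHigh = toKey != null && separatorKey.compareTo(toKey) >= 0;
        if (tooLow || tooHigh) {
            throw new IllegalArgumentException("separator must fall strictly inside the key range");
        }
        return Arrays.asList(
                new PartitionSketch(nextPartitionId.getAndIncrement(), fromKey, separatorKey),
                new PartitionSketch(nextPartitionId.getAndIncrement(), separatorKey, toKey));
    }
}
```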

Split, join, and move operations all result in the old index partition being dropped on the IDataService. Clients having a stale PartitionLocator record will attempt to reach the now defunct index partition after it has been dropped and will receive a StaleLocatorException.

StaleLocatorException

IDataService clients MUST handle this exception by refreshing their cached PartitionLocator for the key range associated with the index partition which they wish to query and then re-issuing their request. By following this simple rule the client will automatically handle index partition splits, joins, and moves without error and in a manner which is completely transparent to the application. Note that splits, joins, and moves DO NOT alter the PartitionLocator for historical reads, only for ongoing writes. This exception is generally (but not always) wrapped. Applications typically DO NOT write directly to the IDataService interface and therefore DO NOT need to worry about this. See ClientIndexView, which automatically handles this exception.
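
The refresh-and-retry rule might look roughly like the following. This is a sketch, not how ClientIndexView is implemented: StaleLocatorSketchException is a hypothetical stand-in for the real exception, the locator type is left generic, and the retry limit is an added assumption.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for the real StaleLocatorException. */
final class StaleLocatorSketchException extends RuntimeException {
    StaleLocatorSketchException(String message) {
        super(message);
    }
}

final class StaleLocatorRetrySketch {
    /**
     * Issues the request against the cached locator. If the locator is stale,
     * refreshes it and re-issues the request, giving up after maxRetries refreshes.
     */
    static <L, R> R call(L cachedLocator, Supplier<L> refresh,
                         Function<L, R> request, int maxRetries) {
        L locator = cachedLocator;
        for (int attempt = 0; ; attempt++) {
            try {
                return request.apply(locator);
            } catch (StaleLocatorSketchException ex) {
                if (attempt >= maxRetries) {
                    throw ex;
                }
                // Look up the current locator for the key range, then retry.
                locator = refresh.get();
            }
        }
    }
}
```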

IOException

All methods on this and derived interfaces can throw an IOException. In all cases an unwrapped exception that is an instance of IOException indicates an error in the Remote Method Invocation (RMI) layer.

ExecutionException and InterruptedException

An unwrapped ExecutionException or InterruptedException indicates a problem when running the request as a task in the IConcurrencyManager on the IDataService. The exception always wraps a root cause which may indicate the underlying problem. Methods which do not declare these exceptions are not run under the IConcurrencyManager.
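
Because the interesting exception is often wrapped, a caller may want to search the cause chain for a specific type. The helper below is a hypothetical sketch (InnerCauseSketch and findCause are illustrative names) built only on Throwable.getCause().

```java
final class InnerCauseSketch {
    /**
     * Walks the cause chain starting at t and returns the first throwable that
     * is an instance of the given type, or null if the chain contains none.
     */
    static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (type.isInstance(c)) {
                return type.cast(c);
            }
        }
        return null;
    }
}
```

A caller that catches an ExecutionException could, for example, use such a helper to decide whether the root cause was a stale locator or an RMI-level IOException before choosing how to react.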

Version:
$Id: IDataService.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
TODO:
add support for triggers? unisolated triggers must be asynchronous if they will take actions with high latency (such as writing on a different index partition, which could be remote). Low latency actions might include emitting asynchronous messages. transactional triggers can have more flexibility since they are under less of a latency constraint.

Method Summary
 void dropIndex(String name)
          Drops the named index.
 void forceOverflow(boolean immediate, boolean compactingMerge)
          Method sets a flag that will force overflow processing during the next group commit and optionally forces a group commit.
 long getAsynchronousOverflowCounter()
          The #of asynchronous overflows that have taken place on this data service (the counter is not restart safe).
 IndexMetadata getIndexMetadata(String name, long timestamp)
          Return the metadata for the named index.
 boolean isOverflowActive()
          Return true iff the data service is currently engaged in overflow processing.
 boolean purgeOldResources(long timeout, boolean truncateJournal)
          This attempts to pause the service accepting ITx.UNISOLATED writes and then purges any resources that are no longer required based on the StoreManager.Options#MIN_RELEASE_AGE.
 ResultSet rangeIterator(long tx, String name, byte[] fromKey, byte[] toKey, int capacity, int flags, IFilterConstructor filter)
           Streaming traversal of keys and/or values in a key range.
 IBlock readBlock(IResourceMetadata resource, long addr)
          Read a low-level record from the IRawStore described by the IResourceMetadata.
 void registerIndex(String name, IndexMetadata metadata)
          Register a named mutable index on the DataService.
 Future<? extends Object> submit(Callable<? extends Object> proc)
          Submit a Callable and return its Future.
 Future submit(long tx, String name, IIndexProcedure proc)
           Submit a procedure.
 
Methods inherited from interface com.bigdata.service.ITxCommitProtocol
abort, prepare, setReleaseTime, singlePhaseCommit
 
Methods inherited from interface com.bigdata.service.IService
destroy, getHostname, getServiceIface, getServiceName, getServiceUUID
 

Method Detail

registerIndex

void registerIndex(String name,
                   IndexMetadata metadata)
                   throws IOException,
                          InterruptedException,
                          ExecutionException
Register a named mutable index on the DataService.

Note: In order to register an index partition the partition metadata property MUST be set. The resources property will then be overridden when the index is actually registered so as to reflect the IResourceMetadata description of the journal on which the index actually resides.

Parameters:
name - The name that can be used to recover the index. In order to create a partition of an index you must form the name of the index partition using DataService.getIndexPartitionName(String, int) (this operation is generally performed by the IMetadataService which manages scale-out indices).
metadata - The metadata describing the index.

The LocalPartitionMetadata.getResources() property on the IndexMetadata.getPartitionMetadata() SHOULD NOT be set. The correct IResourceMetadata[] will be assigned when the index is registered on the IDataService.

Throws:
IOException
InterruptedException
ExecutionException
TODO:
exception if index exists? or modify to validate consistent decl and exception iff not consistent. right now it just silently succeeds if the index already exists.

getIndexMetadata

IndexMetadata getIndexMetadata(String name,
                               long timestamp)
                               throws IOException,
                                      InterruptedException,
                                      ExecutionException
Return the metadata for the named index.

Parameters:
name - The index name.
timestamp - A transaction identifier, ITx.UNISOLATED for the unisolated index view, ITx.READ_COMMITTED, or timestamp for a historical view no later than the specified timestamp.
Returns:
The metadata for the named index.
Throws:
IOException
InterruptedException
ExecutionException

dropIndex

void dropIndex(String name)
               throws IOException,
                      InterruptedException,
                      ExecutionException
Drops the named index.

Note: In order to drop a partition of an index you must form the name of the index partition using DataService.getIndexPartitionName(String, int) (this operation is generally performed by the IMetadataService which manages scale-out indices).

Parameters:
name - The index name.
Throws:
IllegalArgumentException - if name does not identify a registered index.
IOException
InterruptedException
ExecutionException

rangeIterator

ResultSet rangeIterator(long tx,
                        String name,
                        byte[] fromKey,
                        byte[] toKey,
                        int capacity,
                        int flags,
                        IFilterConstructor filter)
                        throws InterruptedException,
                               ExecutionException,
                               IOException

Streaming traversal of keys and/or values in a key range.

Note: In order to visit all keys in a range, clients are expected to issue repeated calls in which the fromKey is incremented to the successor of the last key visited until either an empty ResultSet is returned or the ResultSet#isLast() flag is set, indicating that all keys up to (but not including) the toKey have been visited. See ClientIndexView (scale-out indices) and DataServiceTupleIterator (unpartitioned indices), both of which encapsulate this method.

Note: If the iterator can be determined to be read-only and it is submitted as ITx.UNISOLATED then it will be run as ITx.READ_COMMITTED to improve concurrency.
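
The repeated-call protocol for visiting a whole key range can be sketched against an in-memory sorted map. Everything here is an assumption made for illustration: RangeScanSketch is a hypothetical class, chunk() stands in for one remote call, keys are compared as unsigned bytes, and the loop relies only on the empty-result termination rather than an isLast flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;

final class RangeScanSketch {
    /** Orders keys as unsigned byte strings (assumed ordering for this sketch). */
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    };

    /** Returns the smallest key that sorts after key: the key with a zero byte appended. */
    static byte[] successor(byte[] key) {
        return Arrays.copyOf(key, key.length + 1); // the new trailing byte is 0x00
    }

    /** Stand-in for one call: up to capacity keys from [fromKey, toKey); 0 means no limit. */
    static List<byte[]> chunk(NavigableMap<byte[], String> index,
                              byte[] fromKey, byte[] toKey, int capacity) {
        NavigableMap<byte[], String> view = index;
        if (fromKey != null) view = view.tailMap(fromKey, true);
        if (toKey != null) view = view.headMap(toKey, false);
        List<byte[]> keys = new ArrayList<>();
        for (byte[] k : view.keySet()) {
            if (capacity > 0 && keys.size() == capacity) break;
            keys.add(k);
        }
        return keys;
    }

    /** Visits every key in [fromKey, toKey) by issuing repeated chunked calls. */
    static List<byte[]> scanAll(NavigableMap<byte[], String> index,
                                byte[] fromKey, byte[] toKey, int capacity) {
        List<byte[]> all = new ArrayList<>();
        byte[] from = fromKey;
        while (true) {
            List<byte[]> keys = chunk(index, from, toKey, capacity);
            if (keys.isEmpty()) break; // an empty result ends the scan
            all.addAll(keys);
            // Restart just after the last key visited.
            from = successor(keys.get(keys.size() - 1));
        }
        return all;
    }
}
```

The index passed in must be ordered by the same comparator, for example a TreeMap constructed with RangeScanSketch.UNSIGNED.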

Parameters:
tx - The transaction identifier -or- ITx.UNISOLATED IFF the operation is NOT isolated by a transaction -or- -tx to read from the most recent commit point not later than the absolute value of tx (a fully isolated read-only transaction using a historical start time).
name - The index name (required).
fromKey - The starting key for the scan (or null iff there is no lower bound).
toKey - The first key that will not be visited (or null iff there is no upper bound).
capacity - When non-zero, this is the maximum #of entries to process.
flags - One or more flags formed by bitwise OR of zero or more of the constants defined by IRangeQuery.
filter - An optional object that may be used to layer additional semantics onto the iterator. The filter will be constructed on the server and in the execution context for the iterator, so it will execute directly against the index for the maximum efficiency.
Throws:
InterruptedException - if the operation was interrupted.
ExecutionException - If the operation caused an error. See Throwable.getCause() for the underlying error.
IOException

submit

Future submit(long tx,
              String name,
              IIndexProcedure proc)
              throws IOException

Submit a procedure.

Unisolated operations SHOULD be used to achieve "auto-commit" semantics. Fully isolated transactions are useful IFF multiple operations must be composed into an ACID unit.

While unisolated batch operations on a single data service are ACID, clients are required to locate all index partitions for the logical operation and distribute their operation across the distinct data service instances holding the affected index partitions. In practice, this means that the contract for ACID unisolated operations is limited to operations where the data is located on a single data service instance. For ACID operations that cross multiple data service instances the client MUST use a fully isolated transaction. While read-committed transactions impose low system overhead, clients interested in the highest possible total throughput SHOULD choose unisolated read operations in preference to a read-committed transaction.

Parameters:
tx - The transaction identifier, ITx.UNISOLATED for an ACID operation NOT isolated by a transaction, ITx.READ_COMMITTED for a read-committed operation not protected by a transaction (no global read lock), or any valid commit time for a read-historical operation not protected by a transaction (no global read lock).
name - The name of the index partition.
proc - The procedure to be executed.
Returns:
The Future from which the outcome of the procedure may be obtained.
Throws:
RejectedExecutionException - if the task can not be accepted for execution.
IOException - if there is an RMI problem.
TODO:
change API to Future submit(tx,name,IIndexProcedure). Existing code will need to be recompiled after this API change.

submit

Future<? extends Object> submit(Callable<? extends Object> proc)
                                throws RemoteException
Submit a Callable and return its Future. The Callable will execute on the IBigdataFederation.getExecutorService().

Note: This interface is specialized by the IDataService for tasks which need to gain access to the IDataService in order to gain local access to index partitions, etc. Such tasks declare the IDataServiceCallable. For example, scale-out joins use this mechanism.
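
The idea of handing the local service to a submitted task before it runs can be sketched as follows. All names here (ServiceAwareSketch, LocalExecutorSketch, WhoAmITask) are hypothetical and only loosely modeled on the IDataServiceCallable pattern; a String stands in for the data service reference.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical marker for tasks that need access to the local service. */
interface ServiceAwareSketch<S> {
    void setService(S service);
}

/** Hypothetical executor that injects the local service before running a task. */
final class LocalExecutorSketch<S> {
    private final S localService;
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    LocalExecutorSketch(S localService) {
        this.localService = localService;
    }

    <T> Future<T> submit(Callable<T> task) {
        if (task instanceof ServiceAwareSketch) {
            @SuppressWarnings("unchecked")
            ServiceAwareSketch<S> aware = (ServiceAwareSketch<S>) task;
            aware.setService(localService); // give the task its local service reference
        }
        return pool.submit(task);
    }

    void shutdown() {
        pool.shutdown();
    }
}

/** Example task that reports which service it was given. */
final class WhoAmITask implements Callable<String>, ServiceAwareSketch<String> {
    private String service;

    @Override
    public void setService(String service) {
        this.service = service;
    }

    @Override
    public String call() {
        return "ran on " + service;
    }
}
```

Tasks that do not implement the marker interface are simply submitted unchanged.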

Specified by:
submit in interface IRemoteExecutor
Returns:
The Future for that task.
Throws:
RemoteException
See Also:
IDataServiceCallable

readBlock

IBlock readBlock(IResourceMetadata resource,
                 long addr)
                 throws IOException
Read a low-level record from the IRawStore described by the IResourceMetadata.

Parameters:
resource - The description of the resource containing that block.
addr - The address of the block in that resource.
Returns:
An object that may be used to read the block from the data service.
Throws:
IllegalArgumentException - if the resource is null
IllegalArgumentException - if the addr is 0L
IllegalStateException - if the resource is not available.
IllegalArgumentException - if the record identified by addr can not be read from the resource.
IOException
TODO:
This is a first try at adding support for reading low-level records from a journal or index segment in support of the BigdataFileSystem.

The API should provide a means to obtain a socket from which record data may be streamed. The client sends the resource identifier (UUID of the journal or index segment) and the address of the record and the data service sends the record data. This is designed for streaming reads of up to 64M or more (a record recorded on the store as identified by the address).


forceOverflow

void forceOverflow(boolean immediate,
                   boolean compactingMerge)
                   throws IOException,
                          InterruptedException,
                          ExecutionException
Method sets a flag that will force overflow processing during the next group commit and optionally forces a group commit. Normally there is no reason to invoke this method directly. Overflow processing is triggered automatically on a bottom-up basis when the extent of the live journal nears the Options.MAXIMUM_EXTENT.
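
The flag-then-commit behavior can be pictured with the small model below. It is a sketch under assumptions: OverflowFlagSketch is hypothetical, groupCommit() stands in for the real group commit, a counter stands in for overflow processing, and the compactingMerge option is not modeled.

```java
/** Hypothetical model of "set a flag, overflow on the next group commit". */
final class OverflowFlagSketch {
    private boolean forceOverflow = false;
    private int overflowCount = 0;

    /** Sets the flag; when immediate is true, also provokes a group commit. */
    synchronized void forceOverflow(boolean immediate) {
        forceOverflow = true;
        if (immediate) {
            // Stands in for writing a token record that provokes a group commit.
            groupCommit();
        }
    }

    /** Stand-in for a group commit: overflow processing runs only if the flag was set. */
    synchronized void groupCommit() {
        if (forceOverflow) {
            forceOverflow = false;
            overflowCount++;
        }
    }

    synchronized int overflowCount() {
        return overflowCount;
    }
}
```

With immediate set to false nothing happens until some later write triggers a commit, which is why the immediate option exists for services that are otherwise idle.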

Parameters:
immediate - The purpose of this argument is to permit the caller to trigger an overflow event even though there are no writes being made against the data service. When true the method will write a token record on the live journal in order to provoke a group commit. In this case synchronous overflow processing will have occurred by the time the method returns. When false a flag is set and overflow processing will occur on the next commit.
compactingMerge - The purpose of this flag is to permit the caller to indicate that a compacting merge should be performed for all indices on the data service (at least, all indices whose data are not simply copied onto the new journal) during the next synchronous overflow. Note that compacting merges of indices are performed automatically from time to time so this flag exists mainly for people who want to force a compacting merge for some reason.
Throws:
IOException
InterruptedException - may be thrown if immediate is true.
ExecutionException - may be thrown if immediate is true.

purgeOldResources

boolean purgeOldResources(long timeout,
                          boolean truncateJournal)
                          throws IOException,
                                 InterruptedException
This attempts to pause the service accepting ITx.UNISOLATED writes and then purges any resources that are no longer required based on the StoreManager.Options#MIN_RELEASE_AGE.

Note: Resources are normally purged during synchronous overflow handling. However, asynchronous overflow handling can cause resources to no longer be needed as new index partition views are defined. This method MAY be used to trigger a release before the next overflow event.
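
The timeout and boolean result of this method follow a "pause, then purge, or give up" shape, which can be sketched as below. PauseAndPurgeSketch is hypothetical: a single-permit semaphore stands in for the write service, and the purge itself is an arbitrary Runnable rather than the real release policy.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of pausing writes for a bounded time before purging. */
final class PauseAndPurgeSketch {
    // One permit: held either by an in-progress write or by the purge.
    private final Semaphore writeService = new Semaphore(1);

    void beginWrite() {
        writeService.acquireUninterruptibly();
    }

    void endWrite() {
        writeService.release();
    }

    /**
     * Tries to pause writes for up to timeoutMillis, then runs the purge.
     * Returns false if writes could not be paused within the timeout.
     */
    boolean purge(long timeoutMillis, Runnable purgeAction) {
        boolean paused;
        try {
            paused = writeService.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!paused) {
            return false;
        }
        try {
            purgeAction.run();
            return true;
        } finally {
            writeService.release(); // resume accepting writes
        }
    }
}
```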

Parameters:
timeout - The timeout (in milliseconds) that the method will await the pause of the write service.
truncateJournal - When true, the live journal will be truncated to its minimum extent (all writes will be preserved but there will be no free space left in the journal). This may be used to force the DataService to its minimum possible footprint for the configured history retention policy.
Returns:
true if successful and false if the write service could not be paused after the specified timeout.
Throws:
IOException
InterruptedException

getAsynchronousOverflowCounter

long getAsynchronousOverflowCounter()
                                    throws IOException
The #of asynchronous overflows that have taken place on this data service (the counter is not restart safe).

Throws:
IOException

isOverflowActive

boolean isOverflowActive()
                         throws IOException
Return true iff the data service is currently engaged in overflow processing.

Throws:
IOException


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.