com.bigdata.service
Class AbstractScaleOutFederation<T>

java.lang.Object
  extended by com.bigdata.service.AbstractFederation<T>
      extended by com.bigdata.service.AbstractScaleOutFederation<T>
Type Parameters:
T - The generic type of the client or service.
All Implemented Interfaces:
IIndexManager, IIndexStore, IBigdataFederation<T>, IFederationDelegate<T>
Direct Known Subclasses:
AbstractDistributedFederation, EmbeddedFederation

public abstract class AbstractScaleOutFederation<T>
extends AbstractFederation<T>

Abstract base class for federation implementations using the scale-out index architecture (federations that support key-range partitioned indices).

Version:
$Id: AbstractScaleOutFederation.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson

Nested Class Summary
static class AbstractScaleOutFederation.ForceOverflowTask
          Task forces immediate overflow of the specified data service, returning once both synchronous AND asynchronous overflow are complete.
static class AbstractScaleOutFederation.PurgeResourcesTask
          Task directs a DataService to purge any unused resources and to optionally truncate the extent of the live journal.
 
Nested classes/interfaces inherited from class com.bigdata.service.AbstractFederation
AbstractFederation.ReportTask, AbstractFederation.StartDeferredTasksTask
 
Field Summary
protected  AbstractScaleOutClient.MetadataIndexCachePolicy metadataIndexCachePolicy
           
 
Fields inherited from class com.bigdata.service.AbstractFederation
log
 
Constructor Summary
AbstractScaleOutFederation(IBigdataClient<T> client)
           
 
Method Summary
 UUID[] awaitServices(int minDataServices, long timeout)
          Await the availability of an IMetadataService and the specified minimum #of IDataServices.
 void forceOverflow(boolean truncateJournal)
          Force overflow of each data service in the scale-out federation (only scale-out federations support overflow processing).
 ClientIndexView getIndex(String name, long timestamp)
          Strengthens the return type.
protected  IndexCache getIndexCache()
          Return the cache for IIndex objects.
 IMetadataIndex getMetadataIndex(String name, long timestamp)
          Return a read-only view onto an IMetadataIndex.
protected  MetadataIndexCache getMetadataIndexCache()
          Return the cache for IMetadataIndex objects.
 boolean isScaleOut()
          Return true.
 Iterator<PartitionLocator> locatorScan(String name, long timestamp, byte[] fromKey, byte[] toKey, boolean reverseScan)
          Returns an iterator that will visit the PartitionLocators for the specified scale-out index key range.
 void shutdown()
          Normal shutdown allows any existing client requests to federation services to complete but does not schedule new requests, disconnects from the federation, and then terminates any background processing that is being performed on the behalf of the client (service discovery, etc).
 void shutdownNow()
          Immediate shutdown terminates any client requests to federation services, disconnects from the federation, and then terminates any background processing that is being performed on the behalf of the client (service discovery, etc).
 
Methods inherited from class com.bigdata.service.AbstractFederation
addScheduledTask, assertOpen, destroy, didStart, dropIndex, getClient, getCounterSet, getDataServices, getExecutorService, getGlobalFileSystem, getGlobalRowStore, getHostCounterSet, getHttpdURL, getIndexCounters, getResourceLocator, getScheduledExecutorService, getService, getServiceCounterPathPrefix, getServiceCounterPathPrefix, getServiceCounterSet, getServiceIface, getServiceName, getServiceUUID, getTaskCounters, getTempStore, isOpen, isServiceReady, newHttpd, reattachDynamicCounters, registerIndex, registerIndex, registerIndex, reportCounters, sendEvent, serviceJoin, serviceLeave
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.bigdata.service.IBigdataFederation
getAnyDataService, getDataService, getDataServiceByName, getDataServiceUUIDs, getLastCommitTime, getLoadBalancerService, getMetadataService, getTransactionService, isDistributed, isStable
 
Methods inherited from interface com.bigdata.journal.IIndexStore
getResourceLockService
 

Field Detail

metadataIndexCachePolicy

protected final AbstractScaleOutClient.MetadataIndexCachePolicy metadataIndexCachePolicy
Constructor Detail

AbstractScaleOutFederation

public AbstractScaleOutFederation(IBigdataClient<T> client)
Parameters:
client -
Method Detail

getIndex

public ClientIndexView getIndex(String name,
                                long timestamp)
Obtains a view on a partitioned index. Applies an AbstractIndexCache and strengthens the return type to ClientIndexView.

Specified by:
getIndex in interface IIndexStore
Specified by:
getIndex in interface IBigdataFederation<T>
Overrides:
getIndex in class AbstractFederation<T>
Parameters:
name - The index name.
timestamp - A transaction identifier, ITx.UNISOLATED for the unisolated index view, ITx.READ_COMMITTED, or timestamp for a historical view no later than the specified timestamp.
Returns:
The index or null if the index does not exist.

shutdown

public void shutdown()
Description copied from class: AbstractFederation
Normal shutdown allows any existing client requests to federation services to complete but does not schedule new requests, disconnects from the federation, and then terminates any background processing that is being performed on the behalf of the client (service discovery, etc).

Note: concrete implementations MUST extend this method.

Note: Clients use IBigdataClient.disconnect(boolean) to disconnect from a federation. The federation implements that disconnect using either AbstractFederation.shutdown() or AbstractFederation.shutdownNow().

The implementation must be a NOP if the federation is already shut down.

Overrides:
shutdown in class AbstractFederation<T>
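
The NOP-on-repeat requirement above can be met with a single atomic flag. The following is a minimal sketch under stated assumptions, not the bigdata implementation: the class IdempotentShutdownSketch, its releaseResources() hook and the releaseCount counter are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a federation object; illustrates an idempotent shutdown. */
class IdempotentShutdownSketch {

    /** True until the first shutdown() call wins the race to close. */
    private final AtomicBoolean open = new AtomicBoolean(true);

    /** Counts how many times resources were actually released (for the demo only). */
    final AtomicInteger releaseCount = new AtomicInteger();

    public void shutdown() {
        // Only the first caller flips open from true to false; later calls are a NOP.
        if (!open.compareAndSet(true, false)) {
            return;
        }
        releaseResources();
    }

    /** Hypothetical hook where a subclass would disconnect and stop background work. */
    protected void releaseResources() {
        releaseCount.incrementAndGet();
    }

    public boolean isOpen() {
        return open.get();
    }

    public static void main(String[] args) {
        IdempotentShutdownSketch fed = new IdempotentShutdownSketch();
        fed.shutdown();
        fed.shutdown(); // second call does nothing
        System.out.println("open=" + fed.isOpen() + ", releases=" + fed.releaseCount.get());
    }
}
```

A subclass that extends shutdown() in this style would typically place its own cleanup in the hook so that the guard is evaluated exactly once.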

shutdownNow

public void shutdownNow()
Description copied from class: AbstractFederation
Immediate shutdown terminates any client requests to federation services, disconnects from the federation, and then terminates any background processing that is being performed on the behalf of the client (service discovery, etc).

Note: concrete implementations MUST extend this method to either disconnect from the remote federation or close the embedded federation and then clear the #fed reference so that the client is no longer "connected" to the federation.

Note: Clients use IBigdataClient.disconnect(boolean) to disconnect from a federation. The federation implements that disconnect using either AbstractFederation.shutdown() or AbstractFederation.shutdownNow().

The implementation must be a NOP if the federation is already shut down.

Overrides:
shutdownNow in class AbstractFederation<T>

getMetadataIndex

public IMetadataIndex getMetadataIndex(String name,
                                       long timestamp)
Return a read-only view onto an IMetadataIndex.

Parameters:
name - The name of the scale-out index.
timestamp - The timestamp for the view.
Returns:
The IMetadataIndex for the named scale-out index -or- null iff there is no such scale-out index.
TODO:
The easiest way to have the view be correct is for the operations to all run against the remote metadata index (no caching).

There are three kinds of queries that we do against the metadata index: (1) get(key); (2) find(key); and (3) locatorScan(fromKey,toKey). The first is only used by the unit tests. The second is used when we start a locator scan, when we split a batch operation against the index partitions, and when we map an index procedure over a key range or use a key range iterator. This is the most costly of the queries, but it is also the one that is the least easy to cache. The locator scan itself is heavily buffered - a cache would only help for frequently scanned and relatively small key ranges. For this purpose, it may be better to cache the iterator result itself locally to the client (for historical reads or transactional reads).

The difficulty with caching find(key) is that we need to use the ILinearList API to locate the appropriate index partition. However, since it is a cache, there can be cache misses. These would show up as a "gap" in the (leftSeparator, rightSeparator) coverage.

If we do not cache access to the remote metadata index then we will impose additional latency on clients, traffic on the network, and demands on the metadata service. However, high client concurrency mitigates the increase in access latency to the metadata index.

Use a weak-ref cache with an LRU (or hard reference cache) to evict cached PartitionLocators. The client needs access by { indexName, timestamp, key }. We need to eventually evict the cached locators to prevent the client from building up too much state locally. Also, the cached locators cannot be shared across different timestamps, so clients will build up a locator cache when working on a transaction but then never go back to that cache once the transaction completes.

While it may be possible to share cached locators between historical reads and transactions for the same point in history, we do not have enough information on hand to make those decisions. What we would need to know is the historical commit time corresponding to an assigned transaction startTime. This is not one-to-one since the start times for transactions must be unique (among those in play). See ITransactionService.newTx(long) for more on this.

Cache leased information about index partitions of interest to the client. The cache will be a little tricky since we need to know when the client does not possess a partition definition. Index partitions are defined by the separator key - the first key that lies beyond that partition. The danger then is that a client will presume that any key before the first leased partition is part of that first partition. To guard against that, the client needs to know the separator keys that represent both the upper and lower bounds of each partition. If a lookup in the cache falls outside of any known partition's upper and lower bounds then it is a cache miss and we have to ask the metadata service for a lease on the partition. The cache itself is just a btree data structure with the proviso that some cache entries represent missing partition definitions (aka the lower bounds for known partitions where the left sibling partition is not known to the client).
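
The gap problem described above can be illustrated with a sorted map keyed by the left separator. This is only a sketch under stated assumptions: CachedLocator and LocatorCacheSketch are hypothetical stand-ins (not the bigdata PartitionLocator or any existing cache class), a java.util.TreeMap substitutes for the btree, keys are compared as unsigned byte[]s, and a gap is detected by checking the right separator rather than by storing explicit "missing partition" entries.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical cached locator: a partition id plus both of its separator keys. */
final class CachedLocator {
    final int partitionId;
    final byte[] leftSeparator;  // inclusive lower bound of the partition
    final byte[] rightSeparator; // exclusive upper bound; null means no upper bound

    CachedLocator(int partitionId, byte[] leftSeparator, byte[] rightSeparator) {
        this.partitionId = partitionId;
        this.leftSeparator = leftSeparator;
        this.rightSeparator = rightSeparator;
    }
}

/** Client-side locator cache that reports a miss for keys falling in a gap. */
final class LocatorCacheSketch {

    /** Assumption: keys are ordered as unsigned byte[]s. */
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    private final TreeMap<byte[], CachedLocator> byLeftSeparator = new TreeMap<>(UNSIGNED);

    void put(CachedLocator locator) {
        byLeftSeparator.put(locator.leftSeparator, locator);
    }

    /** Returns the cached locator covering the key, or null on a cache miss. */
    CachedLocator find(byte[] key) {
        // Candidate: the cached partition with the greatest left separator <= key.
        Map.Entry<byte[], CachedLocator> e = byLeftSeparator.floorEntry(key);
        if (e == null) {
            return null; // key precedes every cached partition
        }
        CachedLocator locator = e.getValue();
        // Checking the right separator prevents presuming the key belongs to the
        // nearest cached partition when the covering partition was never leased.
        if (locator.rightSeparator != null
                && UNSIGNED.compare(key, locator.rightSeparator) >= 0) {
            return null; // key lies in a gap between cached partitions
        }
        return locator;
    }

    public static void main(String[] args) {
        LocatorCacheSketch cache = new LocatorCacheSketch();
        cache.put(new CachedLocator(1, new byte[] {10}, new byte[] {20}));
        cache.put(new CachedLocator(3, new byte[] {40}, null));
        System.out.println("key 15 cached: " + (cache.find(new byte[] {15}) != null));
        System.out.println("key 30 cached: " + (cache.find(new byte[] {30}) != null));
    }
}
```

On a miss, a real client would have to ask the metadata service for the covering locator and add it to the cache; that step is omitted here.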

With even a modest #of partitions, a locator scan against the MDS will be cheaper than attempting to fill multiple "gaps" in a local locator cache, so such a cache might be reserved for point tests. Such point tests are used by the sparse row store for its row local operations (vs scans) but are less common for JOINs.

Just create cache view when MDI is large and then cache on demand.

If the IMetadataIndex.get(byte[]) and IMetadataIndex.find(byte[]) methods are to be invoked remotely then we should return the byte[] rather than the de-serialized PartitionLocator so that we don't de-serialize them from the index only to serialize them for RMI and then de-serialize them again on the client.

The easiest way to handle a scale-out metadata index is to make it hash-partitioned (vs range-partitioned). We can just flood queries to the hash partitioned index. For the iterator, we have to buffer the results and place them back into order. A fused view style iterator could be used to merge the iterator results from each partition into a single totally ordered iterator.
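
The merge step in the last item can be sketched as a k-way merge over per-partition iterators. This is an illustration only, assuming each source iterator is already sorted by the given comparator; MergedIteratorSketch is a hypothetical name and does not correspond to an existing bigdata class.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** K-way merge of individually sorted iterators into one totally ordered iterator. */
final class MergedIteratorSketch {

    static <E> Iterator<E> merge(List<Iterator<E>> sources, Comparator<? super E> cmp) {
        // Each heap entry pairs the current head element of a source with that source.
        PriorityQueue<Map.Entry<E, Iterator<E>>> heap =
                new PriorityQueue<>((a, b) -> cmp.compare(a.getKey(), b.getKey()));
        for (Iterator<E> src : sources) {
            if (src.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(src.next(), src));
            }
        }
        return new Iterator<E>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public E next() {
                Map.Entry<E, Iterator<E>> head = heap.poll();
                if (head == null) {
                    throw new NoSuchElementException();
                }
                // Refill the heap from the source that supplied the smallest element.
                Iterator<E> src = head.getValue();
                if (src.hasNext()) {
                    heap.add(new AbstractMap.SimpleEntry<>(src.next(), src));
                }
                return head.getKey();
            }
        };
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> parts = new ArrayList<>();
        parts.add(Arrays.asList(1, 4, 7).iterator());
        parts.add(Arrays.asList(2, 5).iterator());
        parts.add(Arrays.asList(3, 6, 8, 9).iterator());
        Iterator<Integer> merged = merge(parts, Comparator.<Integer>naturalOrder());
        while (merged.hasNext()) {
            System.out.print(merged.next() + " ");
        }
        System.out.println();
    }
}
```

Only one element per source is held in the heap at a time, so the merge itself needs little buffering beyond whatever each per-partition iterator already buffers.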


locatorScan

public Iterator<PartitionLocator> locatorScan(String name,
                                              long timestamp,
                                              byte[] fromKey,
                                              byte[] toKey,
                                              boolean reverseScan)
Returns an iterator that will visit the PartitionLocators for the specified scale-out index key range.

The method fetches a chunk of locators at a time from the metadata index. Unless the #of index partitions spanned is very large, this will be an atomic read of locators from the metadata index. When the #of index partitions spanned is very large, then this will allow a chunked approach.

Note: It is possible that a split, join or move could occur during the process of mapping the procedure across the index partitions. When the view is ITx.UNISOLATED or ITx.READ_COMMITTED this could make the set of mapped index partitions inconsistent in the sense that it might double count some parts of the key range or that it might skip some parts of the key range. In order to avoid this problem the caller MUST use read-consistent semantics. If the ClientIndexView is not already isolated by a transaction, then the caller MUST create a read-only transaction using the global last commit time of the federation.

Parameters:
name - The name of the scale-out index.
timestamp - The timestamp of the view. It is the responsibility of the caller to choose timestamp so as to provide read-consistent semantics for the locator scan.
fromKey - The scale-out index first key that will be visited (inclusive). When null there is no lower bound.
toKey - The first scale-out index key that will NOT be visited (exclusive). When null there is no upper bound.
reverseScan - true if you need to visit the index partitions in reverse key order (this is done when the partitioned iterator is scanning backwards).
Returns:
The iterator.
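
The fromKey (inclusive) and toKey (exclusive) semantics above determine which index partitions a scan must visit. The sketch below shows that selection against a local sorted map. It is not the actual implementation, which reads locators in chunks from the metadata index. LocatorScanSketch is a hypothetical name, partitions are represented only as a mapping from left separator to partition id, keys are assumed to be ordered as unsigned byte[]s, the first partition is assumed to have an empty byte[] as its left separator, and fromKey is assumed not to exceed toKey.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Selects the partitions spanned by the half-open key range [fromKey, toKey). */
final class LocatorScanSketch {

    /** Assumption: keys are ordered as unsigned byte[]s. */
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    /**
     * @param partitions maps each partition's left separator to its partition id
     * @param fromKey first key visited (inclusive); null means no lower bound
     * @param toKey first key NOT visited (exclusive); null means no upper bound
     * @param reverseScan true to return the partitions in reverse key order
     */
    static List<Integer> scan(NavigableMap<byte[], Integer> partitions,
                              byte[] fromKey, byte[] toKey, boolean reverseScan) {
        NavigableMap<byte[], Integer> view = partitions;
        if (fromKey != null) {
            // The partition containing fromKey has the greatest left separator <= fromKey.
            byte[] first = partitions.floorKey(fromKey);
            if (first != null) {
                view = view.tailMap(first, true);
            }
        }
        if (toKey != null) {
            // A partition whose left separator equals toKey holds no key below toKey.
            view = view.headMap(toKey, false);
        }
        if (reverseScan) {
            view = view.descendingMap();
        }
        return new ArrayList<>(view.values());
    }

    public static void main(String[] args) {
        TreeMap<byte[], Integer> partitions = new TreeMap<>(UNSIGNED);
        partitions.put(new byte[0], 0);
        partitions.put(new byte[] {10}, 1);
        partitions.put(new byte[] {20}, 2);
        partitions.put(new byte[] {30}, 3);
        System.out.println(scan(partitions, new byte[] {12}, new byte[] {25}, false));
        System.out.println(scan(partitions, null, null, true));
    }
}
```

Because the selection is taken from a single snapshot of the map, it mirrors the read-consistent requirement in the note above: the set of partitions must not change while the scan is in progress.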

isScaleOut

public final boolean isScaleOut()
Return true.

See Also:
IndexMetadata

getIndexCache

protected IndexCache getIndexCache()
Description copied from class: AbstractFederation
Return the cache for IIndex objects.

Specified by:
getIndexCache in class AbstractFederation<T>

getMetadataIndexCache

protected MetadataIndexCache getMetadataIndexCache()
Return the cache for IMetadataIndex objects.


awaitServices

public UUID[] awaitServices(int minDataServices,
                            long timeout)
                     throws InterruptedException,
                            TimeoutException
Await the availability of an IMetadataService and the specified minimum #of IDataServices.

Parameters:
minDataServices - The minimum #of data services.
timeout - The timeout (ms).
Returns:
An array of the UUIDs of the IDataServices that have been discovered by this client. Note that at least minDataServices elements will be present in this array but that ALL discovered data services MAY be reported.
Throws:
IllegalArgumentException - if minDataServices is non-positive.
IllegalArgumentException - if timeout is non-positive.
IllegalStateException - if the client is not connected to the federation.
InterruptedException - if this thread is interrupted while awaiting the availability of the MetadataService or the specified #of DataServices.
TimeoutException - If a timeout occurs.
TODO:
We should await critical services during connect() {MDS, TS, LS}. The LBS is not critical, but we should either have it on hand to notice our service join or we should notice its JOIN and then notice it ourselves. That would leave this method with the responsibility for awaiting the join of at least N data services (and perhaps verifying that the other services are still joined).

FIXME: This should be rewritten in the JiniFederation subclass to use the ServiceDiscoveryListener interface implemented by that class.
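
The contract documented for this method (argument checks, a bounded wait, and a TimeoutException when too few data services appear) can be sketched with a simple polling loop. This is a hedged illustration, not the actual implementation: AwaitServicesSketch, the Supplier of discovered UUIDs and the 50 ms polling interval are all assumptions, and the sketch does not model awaiting the IMetadataService.

```java
import java.util.UUID;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Polls a (hypothetical) discovery source until enough data services are known. */
final class AwaitServicesSketch {

    /** Assumed polling interval; a listener-based design would not need it. */
    private static final long POLL_MILLIS = 50L;

    static UUID[] await(Supplier<UUID[]> discovered, int minDataServices, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        if (minDataServices <= 0) {
            throw new IllegalArgumentException("minDataServices must be positive");
        }
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            UUID[] uuids = discovered.get();
            if (uuids.length >= minDataServices) {
                return uuids; // may report more than the requested minimum
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("found " + uuids.length + " of "
                        + minDataServices + " data services");
            }
            long remainingMillis = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            // Thread.sleep propagates InterruptedException if the caller is interrupted.
            Thread.sleep(Math.min(POLL_MILLIS, remainingMillis));
        }
    }

    public static void main(String[] args) throws Exception {
        UUID[] known = {UUID.randomUUID(), UUID.randomUUID()};
        UUID[] found = await(() -> known, 2, 1000L);
        System.out.println("discovered " + found.length + " data services");
    }
}
```

A discovery-listener approach, as the FIXME suggests, would replace the sleep with a wait on a condition that is signalled when a service joins.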

forceOverflow

public void forceOverflow(boolean truncateJournal)
Force overflow of each data service in the scale-out federation (only scale-out federations support overflow processing). This method is synchronous. It will not return until all DataServices have initiated and completed overflow processing. Any unused resources (as determined by the StoreManager) will have been purged.

Parameters:
truncateJournal - When true, the live journal will be truncated to its minimum extent (all writes will be preserved but there will be no free space left in the journal). This may be used to force the DataService to its minimum possible footprint.
TODO:
when overflow processing is enabled for the MetadataService we will have to modify this to also trigger overflow for those services.
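
The synchronous behaviour described for this method (do not return until every data service has finished) can be sketched as a fan-out over an executor. This is an illustration under stated assumptions rather than the actual implementation: FanOutSketch is a hypothetical name, and each Callable stands in for a task such as ForceOverflowTask being run against one data service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Runs one task per data service and blocks until all of them have completed. */
final class FanOutSketch {

    static <R> List<R> runOnAll(ExecutorService executor, List<Callable<R>> tasks)
            throws InterruptedException, ExecutionException {
        // invokeAll blocks until every task has completed, normally or with an error.
        List<Future<R>> futures = executor.invokeAll(tasks);
        List<R> results = new ArrayList<>();
        for (Future<R> f : futures) {
            // get() rethrows a task failure wrapped in an ExecutionException.
            results.add(f.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            // Placeholder tasks; real tasks would trigger overflow on a data service.
            tasks.add(() -> "dataService-1 done");
            tasks.add(() -> "dataService-2 done");
            System.out.println(runOnAll(executor, tasks));
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Results come back in the same order as the submitted tasks, which makes it straightforward to report per-service outcomes after the call returns.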


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.