com.bigdata.service.jini.master
Interface TaskMaster.ConfigurationOptions

All Known Subinterfaces:
MappedRDFDataLoadMaster.ConfigurationOptions, MappedTaskMaster.ConfigurationOptions, RDFDataLoadMaster.ConfigurationOptions, ThroughputMaster.ConfigurationOptions
Enclosing class:
TaskMaster<S extends TaskMaster.JobState,T extends Callable<U>,U>

public static interface TaskMaster.ConfigurationOptions

Configuration options for the TaskMaster and derived classes. The "component" for these options is the name of the concrete master class to be executed.

Version:
$Id$
Author:
Bryan Thompson

Field Summary
static String AGGREGATORS_TEMPLATE
          Deprecated. This is a trial feature which is not fully implemented.
static String CLIENTS_TEMPLATE
          A ServicesTemplate describing the types of services, and the minimum #of services of each type, to which the clients will be submitted for execution.
static String DELETE_JOB
          Boolean option that may be used to delete the existing job with the same name during startup (default false).
static String FORCE_OVERFLOW
          When true as an after action on the job, the DataServices in the federation will be made to undergo asynchronous overflow processing and the live journals will be truncated so that the total size on disk of the federation is at its minimum footprint for the given history retention policy (default false).
static String INDEX_DUMP_DIR
          The path to the directory where DumpFederation.ScheduledDumpTasks will write metadata about the state, size, and other aspects of the index partitions throughout the run (optional).
static String INDEX_DUMP_NAMESPACE
          The namespace to be used for the DumpFederation.ScheduledDumpTasks (optional).
static String JOB_NAME
          The job name is used to identify the job within zookeeper.
static String NAGGREGATORS
          Deprecated. This is a trial feature which is not fully implemented.
static String NCLIENTS
          The #of clients to start.
static String SERVICES_DISCOVERY_TIMEOUT
          The timeout in milliseconds to await the discovery of the various services described by the SERVICES_TEMPLATES and CLIENTS_TEMPLATE.
static String SERVICES_TEMPLATES
          An array of zero or more ServicesTemplate describing the types of services, and the minimum #of services of each type, that must be discovered before the job may begin.
 

Field Detail

FORCE_OVERFLOW

static final String FORCE_OVERFLOW
When true as an after action on the job, the DataServices in the federation will be made to undergo asynchronous overflow processing and the live journals will be truncated so that the total size on disk of the federation is at its minimum footprint for the given history retention policy (default false). The master will block during this operation so you can readily tell when it is finished. Note that this option only makes sense in benchmark environments where you can control the total system; otherwise asynchronous writes may continue.

See Also:
AbstractScaleOutFederation.forceOverflow(boolean), Constant Field Values

INDEX_DUMP_DIR

static final String INDEX_DUMP_DIR
The path to the directory where DumpFederation.ScheduledDumpTasks will write metadata about the state, size, and other aspects of the index partitions throughout the run (optional).

See Also:
INDEX_DUMP_NAMESPACE, Constant Field Values

INDEX_DUMP_NAMESPACE

static final String INDEX_DUMP_NAMESPACE
The namespace to be used for the DumpFederation.ScheduledDumpTasks (optional).

See Also:
INDEX_DUMP_DIR, Constant Field Values

DELETE_JOB

static final String DELETE_JOB
Boolean option that may be used to delete the existing job with the same name during startup (default false). This can be used if the last job terminated abnormally and you want to re-run the job.

See Also:
Constant Field Values

NCLIENTS

static final String NCLIENTS
The #of clients to start. The clients will be distributed across the discovered IRemoteExecutors in the federation matching the CLIENTS_TEMPLATE.
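
The documentation above does not state how clients are spread over the matching services. As a minimal sketch, assuming a simple round-robin policy (an assumption, not the documented behavior), the following shows one way NCLIENTS client numbers could be distributed over the discovered executors. The class and method names are hypothetical and are not part of the bigdata API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the bigdata API.
final class ClientAssignmentSketch {

    /**
     * Assigns client numbers 0..nclients-1 to nservices executors in
     * round-robin order (ASSUMED policy). Element s of the result lists
     * the client numbers that would be submitted to executor s.
     */
    static List<List<Integer>> assign(int nclients, int nservices) {
        if (nclients < 0) {
            throw new IllegalArgumentException("nclients must be >= 0");
        }
        if (nservices <= 0) {
            throw new IllegalArgumentException("nservices must be > 0");
        }
        List<List<Integer>> perService = new ArrayList<>();
        for (int s = 0; s < nservices; s++) {
            perService.add(new ArrayList<>());
        }
        for (int c = 0; c < nclients; c++) {
            perService.get(c % nservices).add(c);
        }
        return perService;
    }

    public static void main(String[] args) {
        // Five clients over two executors.
        System.out.println(assign(5, 2)); // prints [[0, 2, 4], [1, 3]]
    }
}
```

With this policy no executor receives more than one client beyond any other, which is usually what you want when the executors are equivalent.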

See Also:
Constant Field Values

CLIENTS_TEMPLATE

static final String CLIENTS_TEMPLATE
A ServicesTemplate describing the types of services, and the minimum #of services of each type, to which the clients will be submitted for execution.

These services MUST implement IRemoteExecutor since that is the API which will be used to submit the client tasks for execution. Normally, you will specify IClientService as the required interface. While it is also possible to run clients on an IDataService or even an IMetadataService, that is discouraged except when the tasks require local access to resources hosted by the service - for example, an administrative task requiring access to the index partitions locally on each IDataService.

See Also:
NCLIENTS, Constant Field Values

NAGGREGATORS

static final String NAGGREGATORS
Deprecated. This is a trial feature which is not fully implemented.
The #of aggregators to start (default is ZERO(0)). The aggregators will be distributed across the discovered IRemoteExecutors in the federation matching the AGGREGATORS_TEMPLATE.

See Also:
AGGREGATORS_TEMPLATE, Constant Field Values

AGGREGATORS_TEMPLATE

static final String AGGREGATORS_TEMPLATE
Deprecated. This is a trial feature which is not fully implemented.
A ServiceTemplate describing the types of services, and the minimum #of services, on which aggregation for asynchronous index writes will be performed (default is null, which means that aggregators will not be discovered).

The aggregator plays a role similar to the "reduce" of a map/reduce architecture. However, unlike map/reduce, an aggregator does not fully buffer the output set of the clients. Instead, each aggregator combines asynchronous index partition writes from multiple clients, splits those writes based on the current index partitions, and buffers chunks destined for each index partition until either the chunk size or the chunk timeout has been satisfied, at which point the chunk is written onto the corresponding index partition.

An aggregation step is necessary when there are a large #of index partitions for some index. Without an aggregator, each client will attempt to fill a chunk destined for each index partition. As the #of index partitions increases, clients can run at 100% CPU utilization trying to fill those chunks. When this occurs, the client is at the single machine limit.

By introducing an aggregation step, the client writes on a buffer which is drained by a thread writing onto the specified aggregator(s). This allows many more clients to run when compared with the #of services buffering chunks and performing the index writes. By decomposing the production and buffering stages we are able to get around the single machine limit.

Aggregators are essentially specialized clients and may execute in any IClientService container. They may be restricted to execute on only those services having specific attributes using this template.
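
The buffering rule described above (collect writes per index partition, then write the chunk once either the chunk size or the chunk timeout is satisfied) can be sketched as follows. This is a simplified, single-threaded illustration, not the aggregator implementation: the class name, the String tuples, the integer partition identifiers, and the caller-supplied timestamps are all assumptions made for the example, and "writing" a chunk is modeled by appending a line to a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the bigdata API.
final class AggregatorBufferSketch {

    private final int chunkSize;
    private final long chunkTimeoutMillis;

    // One open chunk per index partition.
    private final Map<Integer, List<String>> chunks = new HashMap<>();
    // Time of the first write buffered in each open chunk.
    private final Map<Integer, Long> firstWriteMillis = new HashMap<>();

    // Stand-in for the index partition writes: one entry per flushed chunk.
    final List<String> flushed = new ArrayList<>();

    AggregatorBufferSketch(int chunkSize, long chunkTimeoutMillis) {
        this.chunkSize = chunkSize;
        this.chunkTimeoutMillis = chunkTimeoutMillis;
    }

    /** Buffers a write; flushes the chunk when it reaches chunkSize. */
    void add(int partitionId, String tuple, long nowMillis) {
        List<String> chunk =
                chunks.computeIfAbsent(partitionId, k -> new ArrayList<>());
        firstWriteMillis.putIfAbsent(partitionId, nowMillis);
        chunk.add(tuple);
        if (chunk.size() >= chunkSize) {
            flush(partitionId);
        }
    }

    /** Flushes every open chunk whose oldest write has reached the timeout. */
    void flushExpired(long nowMillis) {
        // Iterate over a copy because flush() modifies the map.
        for (Integer partitionId : new ArrayList<>(firstWriteMillis.keySet())) {
            if (nowMillis - firstWriteMillis.get(partitionId) >= chunkTimeoutMillis) {
                flush(partitionId);
            }
        }
    }

    private void flush(int partitionId) {
        List<String> chunk = chunks.remove(partitionId);
        firstWriteMillis.remove(partitionId);
        if (chunk != null && !chunk.isEmpty()) {
            flushed.add("partition " + partitionId + " <- " + chunk);
        }
    }
}
```

Because several clients feed the same buffer, chunks fill faster than they would in any single client, which is the effect the description attributes to the aggregation step.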

See Also:
NAGGREGATORS, Constant Field Values
TODO:
#of aggregators per index. Each aggregator can be its own service so each index could be aggregated by a different aggregator on a different host.

Aggregator failure requires either restart of the job or re-processing of all source "documents" whose write set has not yet been made restart safe. In order to track that, we need to use a proxy for a KVOLatch for scale-out index for each document processed. When the write set for a scale-out index for that document is complete, the latch is triggered and the client is notified.


SERVICES_TEMPLATES

static final String SERVICES_TEMPLATES
An array of zero or more ServicesTemplate describing the types of services, and the minimum #of services of each type, that must be discovered before the job may begin.

See Also:
Constant Field Values

SERVICES_DISCOVERY_TIMEOUT

static final String SERVICES_DISCOVERY_TIMEOUT
The timeout in milliseconds to await the discovery of the various services described by the SERVICES_TEMPLATES and CLIENTS_TEMPLATE.
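
To illustrate what a discovery timeout in milliseconds means in practice, here is a minimal polling sketch. It is not the master's discovery code: the class, the method, the IntSupplier that reports the number of services discovered so far, and the 10 ms polling interval are all hypothetical. What the master does when the timeout expires is not stated above, so the sketch simply returns false.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical illustration only; not part of the bigdata API.
final class DiscoveryTimeoutSketch {

    /**
     * Polls discoveredCount until it reports at least minCount services
     * or timeoutMillis elapses. Returns true if the minimum was met in
     * time, false on timeout or interruption.
     */
    static boolean awaitServices(IntSupplier discoveredCount, int minCount,
            long timeoutMillis) {
        final long deadline =
                System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (discoveredCount.getAsInt() < minCount) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timed out
            }
            long sleepMillis = Math.min(10L,
                    Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```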

See Also:
Constant Field Values

JOB_NAME

static final String JOB_NAME
The job name is used to identify the job within zookeeper. A znode with this name will be created as follows:
 zroot (of the federation)
    / jobs
      / TaskMaster (fully qualified name of the concrete master class).
        / jobName
 
If the client will store state in zookeeper or use ZLocks, it must create a znode under the jobName whose name is the assigned client#. This znode may be used by the client to store its state in zookeeper. The client may also create ZLocks which are children of this znode.
          / client# (where # is the client#; the data of this znode is typically the client's state).
            / locknode (used to elect the client that is running if there is contention).
            / ...
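
The layout above can be sketched as plain string composition. This is only an illustration of the hierarchy: the helper class is hypothetical, the example zroot, class name, and job name are made up, and the literal child name "client" followed by the client# is an assumption read off the diagram. In real code the path should come from TaskMaster.JobState.getClientZPath(JiniFederation, int).

```java
// Hypothetical illustration only; not part of the bigdata API.
final class JobZPathSketch {

    /** zroot / jobs / (fully qualified master class name) / jobName */
    static String jobZPath(String zroot, String masterClassName, String jobName) {
        return zroot + "/jobs/" + masterClassName + "/" + jobName;
    }

    /** ASSUMES the client znode is named "client" followed by the client#. */
    static String clientZPath(String zroot, String masterClassName,
            String jobName, int clientNum) {
        return jobZPath(zroot, masterClassName, jobName) + "/client" + clientNum;
    }

    /** The lock node is a child of the client znode, as in the diagram. */
    static String lockZPath(String zroot, String masterClassName,
            String jobName, int clientNum) {
        return clientZPath(zroot, masterClassName, jobName, clientNum) + "/locknode";
    }

    public static void main(String[] args) {
        String zroot = "/example-federation";   // hypothetical zroot
        String master = "com.example.MyMaster"; // hypothetical master class
        System.out.println(jobZPath(zroot, master, "load-1"));
        System.out.println(clientZPath(zroot, master, "load-1", 3));
        System.out.println(lockZPath(zroot, master, "load-1", 3));
    }
}
```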
 

See Also:
TaskMaster.JobState.getClientZPath(JiniFederation, int), Constant Field Values


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.