com.bigdata.rdf.load
Interface MappedRDFDataLoadMaster.ConfigurationOptions

All Superinterfaces:
MappedTaskMaster.ConfigurationOptions, TaskMaster.ConfigurationOptions
Enclosing class:
MappedRDFDataLoadMaster<S extends MappedRDFDataLoadMaster.JobState,T extends AbstractAsynchronousClientTask<U,V,L>,U,L extends ClientLocator,V extends Serializable>

public static interface MappedRDFDataLoadMaster.ConfigurationOptions
extends MappedTaskMaster.ConfigurationOptions

Configuration options for the MappedRDFDataLoadMaster.

Version:
$Id: MappedRDFDataLoadMaster.java 6045 2012-02-27 17:33:44Z thompsonbry $
Author:
Bryan Thompson

Field Summary
static String BNODES_INITIAL_CAPACITY
          The initial capacity of the hash map used to store blank nodes (RDF BNode values) when processing a document (asynchronous writes only).
static String COMPUTE_CLOSURE
          When true, the closure of the data set will be computed.
static String CREATE
          When true, the master will create the ITripleStore identified by NAMESPACE if it does not exist.
static String DEFAULT_DEFAULT_GRAPH
          TODO Should we always enforce a real value? i.e. provide a real default or abort the load.
static String DEFAULT_GRAPH
          The value that will be used for the graph/context co-ordinate when loading data represented in a triple format into a quad store.
static int DEFAULT_NOTIFY_POOL_SIZE
           
static int DEFAULT_OTHER_WRITER_POOL_SIZE
           
static int DEFAULT_PARSER_POOL_SIZE
           
static String DEFAULT_RDF_FORMAT
           
static long DEFAULT_REJECTED_EXECUTION_DELAY
          250 ms
static int DEFAULT_TERM2ID_WRITER_POOL_SIZE
           
static long DEFAULT_UNBUFFERED_STATEMENT_THRESHOLD
           
static String FORCE_OVERFLOW_BEFORE_CLOSURE
          When true, an overflow with a compacting merge will be requested for each data service before the database-at-once closure is computed.
static String LOAD_DATA
          When true, the data files will be loaded.
static String NAMESPACE
          The KB namespace.
static String NOTIFY_POOL_SIZE
          The #of threads used to handle asynchronous notification events when a resource has been processed (document done or document error).
static String ONTOLOGY
          A file or directory whose data will be loaded into the KB when it is created.
static String ONTOLOGY_FILE_FILTER
          Only files matched by the optional FilenameFilter will be accepted for processing (optional, but must be Serializable if given).
static String OTHER_WRITER_POOL_SIZE
          The #of threads used to buffer asynchronous writes for the other indices.
static String PARSER_OPTIONS
          Optional job property that may be used to set the options on the RDFParser.
static String PARSER_POOL_SIZE
          The core pool size for the thread pool running the parser tasks (default 5).
static String PARSER_QUEUE_CAPACITY
          The capacity of the work queue for the thread pool running the parser tasks (default is 2x the parser pool size).
static String PRODUCER_CHUNK_SIZE
          When terms and values are parsed from a document, they are aggregated into chunks of this size before they are written onto the master for the asynchronous write API (10k to 20k should be fine).
static String RDF_FORMAT
          When the RDFFormat of a resource is not evident, assume that it is the format specified by this value (default ).
static String REJECTED_EXECUTION_DELAY
          The delay in milliseconds between resubmits of a task when the queue of tasks awaiting execution is at capacity.
static String TERM2ID_WRITER_POOL_SIZE
          The #of threads used to buffer asynchronous writes for the TERM2ID index.
static String UNBUFFERED_STATEMENT_THRESHOLD
          The maximum #of statements which can be parsed but not yet buffered for asynchronous index writes before new parser tasks are paused.
static String VALUES_INITIAL_CAPACITY
          The initial capacity of the hash map used to store RDF Values when processing a document (asynchronous writes only).
 
Fields inherited from interface com.bigdata.service.jini.master.MappedTaskMaster.ConfigurationOptions
CLIENT_HASH_FUNCTION, DEFAULT_PENDING_SET_MASTER_INITIAL_CAPACITY, DEFAULT_PENDING_SET_SUBTASK_INITIAL_CAPACITY, DELETE_AFTER, PENDING_SET_MASTER_INITIAL_CAPACITY, PENDING_SET_SUBTASK_INITIAL_CAPACITY, RESOURCE_BUFFER_CONFIG, RESOURCE_SCANNER_FACTORY
 
Fields inherited from interface com.bigdata.service.jini.master.TaskMaster.ConfigurationOptions
AGGREGATORS_TEMPLATE, CLIENTS_TEMPLATE, DELETE_JOB, FORCE_OVERFLOW, INDEX_DUMP_DIR, INDEX_DUMP_NAMESPACE, JOB_NAME, NAGGREGATORS, NCLIENTS, SERVICES_DISCOVERY_TIMEOUT, SERVICES_TEMPLATES
 

Field Detail

NAMESPACE

static final String NAMESPACE
The KB namespace.

See Also:
Constant Field Values

ONTOLOGY

static final String ONTOLOGY
A file or directory whose data will be loaded into the KB when it is created. If it is a directory, then all data in that directory will be loaded. Unlike the distributed bulk load, the file or directory MUST be readable by the master and the data in this file and/or directory are NOT deleted after they have been loaded.

Note: This is intended for the one-time load of ontologies pertaining to the data to be loaded. If you need to do additional non-bulk data loads you can always use the BigdataSail.

See Also:
Constant Field Values

ONTOLOGY_FILE_FILTER

static final String ONTOLOGY_FILE_FILTER
Only files matched by the optional FilenameFilter will be accepted for processing (optional, but must be Serializable if given). The default is an RDFFilenameFilter.

See Also:
RDFFilenameFilter, Constant Field Values
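ONTOLOGY_FILE_FILTER requires a custom filter to be Serializable, presumably so that it can travel with the job configuration. The following is a minimal sketch of such a filter, assuming a hypothetical class name (ExtensionFilenameFilter) and an illustrative list of extensions. It is not RDFFilenameFilter, whose actual matching rules are not shown on this page.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.Serializable;
import java.util.Locale;

// Hypothetical example filter: accepts files by (case-insensitive) extension.
// Implements Serializable because ONTOLOGY_FILE_FILTER requires it when given.
final class ExtensionFilenameFilter implements FilenameFilter, Serializable {
    private static final long serialVersionUID = 1L;

    // Illustrative extensions only, e.g. "ttl", "rdf"; String[] is serializable.
    private final String[] extensions;

    ExtensionFilenameFilter(String... extensions) {
        this.extensions = extensions.clone();
    }

    @Override
    public boolean accept(File dir, String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        for (String ext : extensions) {
            if (lower.endsWith("." + ext.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

Because the filter holds only a String[], Java serialization works without extra effort; a filter that captured a non-serializable field (a logger, a stream) would fail when the job configuration is serialized.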

PARSER_POOL_SIZE

static final String PARSER_POOL_SIZE
The core pool size for the thread pool running the parser tasks (default 5).

See Also:
Constant Field Values

DEFAULT_PARSER_POOL_SIZE

static final int DEFAULT_PARSER_POOL_SIZE
See Also:
Constant Field Values

PARSER_QUEUE_CAPACITY

static final String PARSER_QUEUE_CAPACITY
The capacity of the work queue for the thread pool running the parser tasks (default is 2x the parser pool size).

See Also:
Constant Field Values

REJECTED_EXECUTION_DELAY

static final String REJECTED_EXECUTION_DELAY
The delay in milliseconds between resubmits of a task when the queue of tasks awaiting execution is at capacity.

See Also:
Constant Field Values

DEFAULT_REJECTED_EXECUTION_DELAY

static final long DEFAULT_REJECTED_EXECUTION_DELAY
250 ms

See Also:
Constant Field Values
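The interaction of PARSER_POOL_SIZE, PARSER_QUEUE_CAPACITY and REJECTED_EXECUTION_DELAY can be illustrated with a standard java.util.concurrent.ThreadPoolExecutor. This is a sketch under stated assumptions (a fixed-size pool, an ArrayBlockingQueue sized at twice the pool size, and a sleep-and-retry loop on rejection), not the loader's actual code; BoundedParserPool and its methods are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bounded parser pool that resubmits rejected tasks
// after a fixed delay instead of blocking the submitter inside the executor.
final class BoundedParserPool {
    private final ThreadPoolExecutor executor;
    private final long rejectedExecutionDelayMillis;

    BoundedParserPool(int poolSize, long rejectedExecutionDelayMillis) {
        // Work queue capacity is 2x the pool size, mirroring the documented
        // default for PARSER_QUEUE_CAPACITY.
        this.executor = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(2 * poolSize),
                new ThreadPoolExecutor.AbortPolicy());
        this.rejectedExecutionDelayMillis = rejectedExecutionDelayMillis;
    }

    /** Submits a task, sleeping and retrying while the work queue is at capacity. */
    void submit(Runnable parseTask) throws InterruptedException {
        while (true) {
            try {
                executor.execute(parseTask);
                return;
            } catch (RejectedExecutionException full) {
                if (executor.isShutdown()) {
                    throw full; // rejected because the pool is closed, not because it is full
                }
                // Queue is full: wait REJECTED_EXECUTION_DELAY ms, then resubmit.
                Thread.sleep(rejectedExecutionDelayMillis);
            }
        }
    }

    void shutdownAndAwait() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

With the documented defaults this would be new BoundedParserPool(5, 250L). The sleep-and-retry loop keeps the submitting thread from spinning while the parsers drain the queue.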

TERM2ID_WRITER_POOL_SIZE

static final String TERM2ID_WRITER_POOL_SIZE
The #of threads used to buffer asynchronous writes for the TERM2ID index.

See Also:
Constant Field Values

DEFAULT_TERM2ID_WRITER_POOL_SIZE

static final int DEFAULT_TERM2ID_WRITER_POOL_SIZE
See Also:
Constant Field Values

OTHER_WRITER_POOL_SIZE

static final String OTHER_WRITER_POOL_SIZE
The #of threads used to buffer asynchronous writes for the other indices.

See Also:
Constant Field Values

DEFAULT_OTHER_WRITER_POOL_SIZE

static final int DEFAULT_OTHER_WRITER_POOL_SIZE
See Also:
Constant Field Values

NOTIFY_POOL_SIZE

static final String NOTIFY_POOL_SIZE
The #of threads used to handle asynchronous notification events when a resource has been processed (document done or document error). These events are reported back to the job master using RMI. A thread pool is used to reduce the latency of these asynchronous notifications.

See Also:
Constant Field Values
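A minimal sketch of the idea behind NOTIFY_POOL_SIZE: completion events are handed to a small fixed pool so that the thread which finished a document does not wait on the remote call. MasterCallback stands in for the RMI proxy of the job master; it and AsyncNotifier are hypothetical names, not the loader's API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the job master's RMI proxy.
interface MasterCallback {
    void documentDone(String resource);
    void documentError(String resource, Throwable cause);
}

// Hypothetical sketch: dispatch "document done" / "document error" events
// on a dedicated pool sized by NOTIFY_POOL_SIZE.
final class AsyncNotifier {
    private final ExecutorService notifyPool;
    private final MasterCallback master;

    AsyncNotifier(int notifyPoolSize, MasterCallback master) {
        this.notifyPool = Executors.newFixedThreadPool(notifyPoolSize);
        this.master = master;
    }

    void done(String resource) {
        notifyPool.execute(() -> master.documentDone(resource));
    }

    void error(String resource, Throwable cause) {
        notifyPool.execute(() -> master.documentError(resource, cause));
    }

    /** Stops accepting events and waits for queued notifications to be delivered. */
    void close() throws InterruptedException {
        notifyPool.shutdown();
        notifyPool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```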

DEFAULT_NOTIFY_POOL_SIZE

static final int DEFAULT_NOTIFY_POOL_SIZE
See Also:
Constant Field Values

UNBUFFERED_STATEMENT_THRESHOLD

static final String UNBUFFERED_STATEMENT_THRESHOLD
The maximum #of statements which can be parsed but not yet buffered for asynchronous index writes before new parser tasks are paused. This is used to control the RAM demand of the parser tasks. The RAM demand of the buffered index writes is controlled by the capacity and chunk size of the asynchronous index write buffers.

See Also:
Constant Field Values
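The throttling described for UNBUFFERED_STATEMENT_THRESHOLD can be sketched as a counter guarded by a lock and condition: parsers add to the count, buffering subtracts from it, and a new parser task waits while the count is at or above the threshold. This is a hypothetical illustration (UnbufferedStatementGate and its methods are invented names), not the loader's implementation.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: pause new parser tasks while too many parsed
// statements are still waiting to be buffered for index writes.
final class UnbufferedStatementGate {
    private final long threshold;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition belowThreshold = lock.newCondition();
    private long unbuffered = 0L;

    UnbufferedStatementGate(long threshold) {
        this.threshold = threshold;
    }

    /** Called before a new parser task starts; blocks while at or over the threshold. */
    void awaitPermissionToParse() throws InterruptedException {
        lock.lock();
        try {
            while (unbuffered >= threshold) {
                belowThreshold.await();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Called by a parser after it produces n statements. */
    void parsed(long n) {
        lock.lock();
        try {
            unbuffered += n;
        } finally {
            lock.unlock();
        }
    }

    /** Called once n statements have been handed to the asynchronous write buffers. */
    void buffered(long n) {
        lock.lock();
        try {
            unbuffered -= n;
            if (unbuffered < threshold) {
                belowThreshold.signalAll(); // wake paused parser tasks
            }
        } finally {
            lock.unlock();
        }
    }

    long unbuffered() {
        lock.lock();
        try {
            return unbuffered;
        } finally {
            lock.unlock();
        }
    }
}
```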

DEFAULT_UNBUFFERED_STATEMENT_THRESHOLD

static final long DEFAULT_UNBUFFERED_STATEMENT_THRESHOLD
See Also:
Constant Field Values

PRODUCER_CHUNK_SIZE

static final String PRODUCER_CHUNK_SIZE
When terms and values are parsed from a document, they are aggregated into chunks of this size before they are written onto the master for the asynchronous write API (10k to 20k should be fine).

See Also:
Constant Field Values
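The chunking that PRODUCER_CHUNK_SIZE controls can be sketched as a small buffer that forwards a full chunk whenever it reaches the configured size, plus a final partial chunk at the end of a document. ChunkingBuffer and the Consumer sink are hypothetical stand-ins for the asynchronous write API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: aggregate parsed items into fixed-size chunks before
// handing them to an (assumed) asynchronous write sink.
final class ChunkingBuffer<E> {
    private final int chunkSize;
    private final Consumer<List<E>> sink;
    private List<E> current;

    ChunkingBuffer(int chunkSize, Consumer<List<E>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.sink = sink;
        this.current = new ArrayList<>(chunkSize);
    }

    void add(E item) {
        current.add(item);
        if (current.size() == chunkSize) {
            flush(); // a full chunk is written as one unit
        }
    }

    /** Emits any partial chunk, e.g. when the end of a document is reached. */
    void flush() {
        if (!current.isEmpty()) {
            sink.accept(current);
            current = new ArrayList<>(chunkSize);
        }
    }
}
```

Larger chunks amortize the per-write overhead of the asynchronous API, at the cost of holding more parsed data in RAM per producer.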

VALUES_INITIAL_CAPACITY

static final String VALUES_INITIAL_CAPACITY
The initial capacity of the hash map used to store RDF Values when processing a document (asynchronous writes only).

See Also:
Constant Field Values

BNODES_INITIAL_CAPACITY

static final String BNODES_INITIAL_CAPACITY
The initial capacity of the hash map used to store blank nodes (RDF BNode values) when processing a document (asynchronous writes only).

See Also:
Constant Field Values

CREATE

static final String CREATE
When true, the master will create the ITripleStore identified by NAMESPACE if it does not exist.

See Also:
Constant Field Values

LOAD_DATA

static final String LOAD_DATA
When true, the data files will be loaded. This can be disabled if you just want to compute the closure of the database.

See Also:
Constant Field Values

COMPUTE_CLOSURE

static final String COMPUTE_CLOSURE
When true, the closure of the data set will be computed. The writes are performed on the RDF database below the level of the BigdataSail so incremental truth maintenance WILL NOT be performed even if the sail was configured with that option.

See Also:
BigdataSail.Options.TRUTH_MAINTENANCE, Constant Field Values

FORCE_OVERFLOW_BEFORE_CLOSURE

static final String FORCE_OVERFLOW_BEFORE_CLOSURE
When true, an overflow with a compacting merge will be requested for each data service before the database-at-once closure is computed. This can save effort because some rules must scan large key-ranges in the database in order to compute the closure, and where a rule is embedded in a fixed point program, those key-ranges must be scanned more than once. Also, the overflow operation is fully distributed, so it does not add much latency, while the closure operation has less concurrency.

See Also:
Constant Field Values
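How CREATE, LOAD_DATA, FORCE_OVERFLOW_BEFORE_CLOSURE and COMPUTE_CLOSURE might gate the phases of a job can be sketched as a simple plan builder. The phase names and their ordering are assumptions drawn from the descriptions above (the ontology is loaded when the KB is created, and the overflow precedes the closure); LoadJobPlan is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which phases run for a given combination of options.
// Phase names and ordering are illustrative assumptions.
final class LoadJobPlan {
    static List<String> phases(boolean kbExists, boolean create, boolean loadData,
                               boolean computeClosure, boolean forceOverflowBeforeClosure) {
        List<String> steps = new ArrayList<>();
        if (!kbExists) {
            if (!create) {
                throw new IllegalStateException("KB does not exist and CREATE is false");
            }
            steps.add("create KB");
            steps.add("load ONTOLOGY"); // one-time load when the KB is created (if configured)
        }
        if (loadData) {
            steps.add("load data files");
        }
        if (computeClosure) {
            if (forceOverflowBeforeClosure) {
                steps.add("force overflow (compacting merge)");
            }
            steps.add("compute database-at-once closure");
        }
        return steps;
    }
}
```

For example, LOAD_DATA=false with COMPUTE_CLOSURE=true against an existing KB yields only the overflow (if requested) and the closure, matching the use case mentioned under LOAD_DATA.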

PARSER_OPTIONS

static final String PARSER_OPTIONS
Optional job property that may be used to set the options on the RDFParser.

See Also:
RDFParserOptions, Constant Field Values

RDF_FORMAT

static final String RDF_FORMAT
When the RDFFormat of a resource is not evident, assume that it is the format specified by this value (default ). The value is one of the String values of the known RDFFormats, including NQuadsParser.nquads. It may be null, in which case there is no default.

See Also:
Constant Field Values
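The fallback described for RDF_FORMAT can be sketched as: use the format implied by the resource (here, its file extension) when it is evident, otherwise use the configured default, and treat a null default as "no default". FormatResolver and its extension table are hypothetical and illustrative; they do not reproduce the loader's own format detection.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: resolve a resource's format by extension, falling
// back to the RDF_FORMAT default when the format is not evident.
final class FormatResolver {
    // Illustrative extension-to-format-name table (requires Java 9+ for Map.of).
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "rdf", "RDF/XML",
            "nt", "N-Triples",
            "ttl", "Turtle",
            "trig", "TriG");

    /** Returns the format name, or null if it is unknown and no default is configured. */
    static String resolve(String fileName, String defaultFormat) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String format = BY_EXTENSION.get(ext);
            if (format != null) {
                return format; // format is evident from the name
            }
        }
        return defaultFormat; // may be null, meaning there is no default
    }
}
```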

DEFAULT_RDF_FORMAT

static final String DEFAULT_RDF_FORMAT

DEFAULT_GRAPH

static final String DEFAULT_GRAPH
The value that will be used for the graph/context co-ordinate when loading data represented in a triple format into a quad store.

See Also:
Constant Field Values
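A minimal sketch of how DEFAULT_GRAPH is applied: a statement parsed from a triple format has no context, so the configured default graph is used as its context when it is written into a quad store. Quad and fromTriple are hypothetical names, and rejecting a statement when no default is configured is an assumption that follows the option raised in the TODO under DEFAULT_DEFAULT_GRAPH, not documented behavior.

```java
// Hypothetical sketch: fill in the context of a triple using DEFAULT_GRAPH.
final class Quad {
    final String s, p, o, c;

    Quad(String s, String p, String o, String c) {
        this.s = s;
        this.p = p;
        this.o = o;
        this.c = c;
    }

    /** Uses the parsed context if present, otherwise the configured default graph. */
    static Quad fromTriple(String s, String p, String o, String context, String defaultGraph) {
        String c = (context != null) ? context : defaultGraph;
        if (c == null) {
            // Assumption: abort rather than guess, as the DEFAULT_DEFAULT_GRAPH TODO suggests.
            throw new IllegalArgumentException("no context and no DEFAULT_GRAPH configured");
        }
        return new Quad(s, p, o, c);
    }
}
```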

DEFAULT_DEFAULT_GRAPH

static final String DEFAULT_DEFAULT_GRAPH
TODO Should we always enforce a real value? i.e. provide a real default or abort the load.



Copyright © 2006-2011 SYSTAP, LLC. All Rights Reserved.