com.bigdata.bop.rdf.update
Class ParseOp

java.lang.Object
  extended by com.bigdata.bop.CoreBaseBOp
      extended by com.bigdata.bop.BOpBase
          extended by com.bigdata.bop.PipelineOp
              extended by com.bigdata.bop.rdf.update.ParseOp
All Implemented Interfaces:
BOp, IPropertySet, Serializable, Cloneable

public class ParseOp
extends PipelineOp

This operator parses an RDF data source, writing bindings that represent statements onto the output sink. It is compatible with the ChunkedResolutionOp and the InsertStatementsOp.
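
As a rough illustration of what "bindings which represent statements" might look like, the sketch below models each parsed statement as a map from the variable names s, p, o (and optionally c) to term strings. It is standalone Java and does not use the bigdata API: the class StatementBindingsSketch, the method toBindings, and the simplified whitespace-delimited input format are all assumptions made for this example, not the operator's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of bigdata: models one statement as a
// binding set keyed by the variable names s, p, o and (optionally) c.
final class StatementBindingsSketch {

    // Assumes a simplified line format "<s> <p> <o> [<c>] ." in which no
    // term contains whitespace. Real RDF parsing is far more involved.
    static List<Map<String, String>> toBindings(List<String> lines) {
        List<Map<String, String>> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (trimmed.endsWith(".")) {
                // drop the statement terminator
                trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
            }
            String[] terms = trimmed.split("\\s+");
            if (terms.length < 3 || terms.length > 4) {
                throw new IllegalArgumentException("Not a triple or quad: " + line);
            }
            Map<String, String> bindings = new LinkedHashMap<>();
            bindings.put("s", terms[0]);
            bindings.put("p", terms[1]);
            bindings.put("o", terms[2]);
            if (terms.length == 4) {
                bindings.put("c", terms[3]); // context (named graph), quads only
            }
            out.add(bindings);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "<http://ex.org/a> <http://ex.org/knows> <http://ex.org/b> .",
            "<http://ex.org/b> <http://ex.org/knows> <http://ex.org/c> <http://ex.org/g> .");
        for (Map<String, String> bindings : toBindings(source)) {
            System.out.println(bindings);
        }
    }
}
```

In this sketch a downstream consumer would read each map the way a resolution or insert operator might read a binding set: one solution per statement, with c bound only for quads.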

Version:
$Id: ParseOp.java 6160 2012-03-18 19:57:37Z thompsonbry $

TODO Examine the integration point for Truth Maintenance (TM).

DataLoader.ClosureEnum and DataLoader.CommitEnum shape the way in which the update plan is generated. They are not options on the ParseOp itself.

We need to set up the assertion and retraction buffers such that they have the appropriate scope or (for database-at-once closure) we do not set up those buffers but instead recompute the closure of the database afterwards.

The assertion buffers might be populated after the IV resolution step and before we write on the indices. We then compute the fixed point of the closure over the delta and then write that onto the database. We should be able to specify that some sources contain data to be removed (INSERT DATA and REMOVE DATA or UNLOAD src). The operation should combine assertions and retractions to be efficient.

See DataLoader.

TODO Add an operator which handles a zip archive, creating a LOAD for each resource in that archive. Recursive directory processing is similar. Both should result in multiple ParseOp instances which can run in parallel. Those ParseOp instances will feed the IV resolution, optional TM, and statement writer operations.

If we can make the SOURCE_URI a value expression, then we could flow solutions into the LOAD operation which would be the bindings for the source URI. Very nice! Then we could hash partition the LOAD operator across a cluster and do a parallel load very easily. If the source for those solutions was the parse of a single RDF file (or streamed URI) containing the files to be loaded, then we could also gain the indirection necessary to load large numbers of files in parallel on a cluster.

TODO In at least the SIDS mode, we need to do some special operations when the statement buffer is flushed. That statement buffer could either be fed directly by the ParseOp or indirectly through solutions modeling statements flowing through the query engine. I am inclined to the latter for better parallelism. Even though there is more stuff on the heap and more latency within the stages, I think that we will get more out of the increased parallelism.

TODO Any annotation here should be configurable from the LoadGraph AST node and (ideally) the SPARQL UPDATE syntax.

FIXME This does not handle SIDS. The StatementBuffer logic needs to get into InsertStatementsOp for that to work, or the plan needs to be slightly different and hit a different insert operator for statements altogether.

FIXME This does not handle Truth Maintenance.
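
The hash-partitioning idea in the notes above (flowing source URIs in as solutions and spreading the LOAD work across a cluster) could be sketched roughly as follows. This is standalone Java written only to illustrate the idea; SourcePartitionSketch, partitionOf, and partition are hypothetical names, and the choice of String.hashCode() as the hash function is an assumption rather than anything the operator is known to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in bigdata.
final class SourcePartitionSketch {

    // Maps a source URI to one of nodeCount partitions by its hash code.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionOf(String sourceUri, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return Math.floorMod(sourceUri.hashCode(), nodeCount);
    }

    // Groups the source URIs by partition; each inner list is the work
    // that one (assumed) cluster node would parse.
    static List<List<String>> partition(List<String> sourceUris, int nodeCount) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            parts.add(new ArrayList<String>());
        }
        for (String uri : sourceUris) {
            parts.get(partitionOf(uri, nodeCount)).add(uri);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList(
            "file:/data/a.nt", "file:/data/b.nt", "file:/data/c.nt", "file:/data/d.nt");
        List<List<String>> parts = partition(sources, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("node " + i + " loads " + parts.get(i));
        }
    }
}
```

Because the partition depends only on the URI, the same source always maps to the same node for a given node count, which is what would let each node parse its share independently.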

Author:
Bryan Thompson
See Also:
PresortRioLoader, StatementBuffer, DataLoader, DataLoader.Options, RDFParserOptions, DataLoader.ClosureEnum, DataLoader.CommitEnum, Serialized Form

Nested Class Summary
static interface ParseOp.Annotations
          Note: BOp.Annotations#TIMEOUT is respected to limit the read time on an HTTP connection.
 
Field Summary
protected static Var<?> c
          The s, p, o, and c variable names.
protected static Var<?> o
          The s, p, o, and c variable names.
protected static Var<?> p
          The s, p, o, and c variable names.
protected static Var<?> s
          The s, p, o, and c variable names.
 
Fields inherited from class com.bigdata.bop.CoreBaseBOp
DEFAULT_INITIAL_CAPACITY
 
Fields inherited from interface com.bigdata.bop.BOp
NOANNS, NOARGS
 
Constructor Summary
ParseOp(BOp[] args, Map<String,Object> annotations)
           
ParseOp(ParseOp op)
           
 
Method Summary
 FutureTask<Void> eval(BOpContext<IBindingSet> context)
          Return a FutureTask which computes the operator against the evaluation context.
 ParserStats newStats()
          Return a new object which can be used to collect statistics on the operator evaluation.
 
Methods inherited from class com.bigdata.bop.PipelineOp
assertAtOnceJavaHeapOp, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getMaxMemory, getMaxParallel, isAtOnceEvaluation, isBlockedEvaluation, isLastPassRequested, isPipelinedEvaluation, isSharedState
 
Methods inherited from class com.bigdata.bop.BOpBase
_clearProperty, _set, _setProperty, annotations, annotationsCopy, annotationsEqual, annotationsRef, argIterator, args, argsCopy, arity, clearAnnotations, clearProperty, deepCopy, deepCopy, get, getProperty, setArg, setProperty, setUnboundProperty, toArray, toArray
 
Methods inherited from class com.bigdata.bop.CoreBaseBOp
annotationsEqual, annotationsToString, checkArgs, clone, equals, getEvaluationContext, getId, getProperty, getRequiredProperty, hashCode, indent, isController, toShortString, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

s

protected static final Var<?> s
The s, p, o, and c variable names.


p

protected static final Var<?> p
The s, p, o, and c variable names.


o

protected static final Var<?> o
The s, p, o, and c variable names.


c

protected static final Var<?> c
The s, p, o, and c variable names.

Constructor Detail

ParseOp

public ParseOp(BOp[] args,
               Map<String,Object> annotations)

ParseOp

public ParseOp(ParseOp op)
Method Detail

newStats

public ParserStats newStats()
Description copied from class: PipelineOp
Return a new object which can be used to collect statistics on the operator evaluation. This may be overridden to return a more specific class depending on the operator.

Overrides:
newStats in class PipelineOp

eval

public FutureTask<Void> eval(BOpContext<IBindingSet> context)
Description copied from class: PipelineOp
Return a FutureTask which computes the operator against the evaluation context. The caller is responsible for executing the FutureTask (this gives them the ability to hook the completion of the computation).

Specified by:
eval in class PipelineOp
Parameters:
context - The evaluation context.
Returns:
The FutureTask which will compute the operator's evaluation.
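
The contract described for eval (the operator returns a FutureTask, and the caller executes it and may hook its completion) can be illustrated with plain java.util.concurrent. The sketch below is not bigdata code: FutureTaskCallerSketch and newTask are hypothetical stand-ins, and the Runnable arguments take the place of real operator work. It uses the protected done() method of FutureTask as the completion hook.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the eval() contract; not part of bigdata.
final class FutureTaskCallerSketch {

    // Builds the task without running it, as eval() is described to do.
    // onDone is invoked from FutureTask.done() once the task completes.
    static FutureTask<Void> newTask(final Runnable work, final Runnable onDone) {
        Callable<Void> callable = new Callable<Void>() {
            @Override
            public Void call() {
                work.run();
                return null;
            }
        };
        return new FutureTask<Void>(callable) {
            @Override
            protected void done() {
                onDone.run(); // completion hook
            }
        };
    }

    public static void main(String[] args) throws Exception {
        final AtomicBoolean hooked = new AtomicBoolean(false);
        FutureTask<Void> task = newTask(
            () -> System.out.println("evaluating"),
            () -> hooked.set(true));

        // Nothing has run yet: the caller decides where and when to execute.
        task.run(); // here, on the calling thread
        task.get(); // rethrows any failure from the work
        System.out.println("hook ran: " + hooked.get());
    }
}
```

Running the task on the calling thread keeps the example deterministic; a caller could equally hand the task to an Executor, in which case the hook runs on whichever thread completes the task.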


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.