java.lang.Object
  com.bigdata.rdf.store.DataLoader
public class DataLoader
A utility class to load RDF data into an AbstractTripleStore without
using the Sesame API. This class does not parallelize RDF parsing and
database writes. It is not efficient for scale-out.
| Nested Class Summary | |
|---|---|
| static class | DataLoader.ClosureEnum: A type-safe enumeration of options affecting whether and when entailments are computed as documents are loaded into the database using the DataLoader. |
| static class | DataLoader.CommitEnum: A type-safe enumeration of options affecting whether and when the database will be committed. |
| static interface | DataLoader.Options: Options for the DataLoader. |
| Field Summary | |
|---|---|
| protected static org.apache.log4j.Logger | log: Logger. |
| Constructor Summary | |
|---|---|
| DataLoader(AbstractTripleStore database) | Configure DataLoader using properties used to configure the database. |
| DataLoader(Properties properties, AbstractTripleStore database) | Configure a data loader with overridden properties. |
| Method Summary | |
|---|---|
| ClosureStats | doClosure(): Compute closure as configured. |
| void | endSource(): Flush the StatementBuffer to the backing store. |
| protected StatementBuffer<?> | getAssertionBuffer(): Return the assertion buffer. |
| DataLoader.ClosureEnum | getClosureEnum(): How the DataLoader will maintain closure on the database. |
| DataLoader.CommitEnum | getCommitEnum(): Whether and when the DataLoader will invoke ITripleStore.commit(). |
| AbstractTripleStore | getDatabase(): The target database. |
| boolean | getFlush(): When true (the default) the StatementBuffer is flushed by each loadData(String, String, RDFFormat) or loadData(String[], String[], RDFFormat[]) operation and when doClosure() is requested. |
| InferenceEngine | getInferenceEngine(): The object used to compute entailments for the database. |
| LoadStats | loadData(InputStream is, String baseURL, RDFFormat rdfFormat): Load from an input stream. |
| LoadStats | loadData(Reader reader, String baseURL, RDFFormat rdfFormat): Load from a reader. |
| LoadStats | loadData(String[] resource, String[] baseURL, RDFFormat[] rdfFormat): Load a set of RDF resources into the database. |
| LoadStats | loadData(String resource, String baseURL, RDFFormat rdfFormat): Load a resource into the database. |
| LoadStats | loadData(URL url, String baseURL, RDFFormat rdfFormat): Load from a URL. |
| protected void | loadData2(LoadStats totals, String resource, String baseURL, RDFFormat rdfFormat, boolean endOfBatch): Load an RDF resource into the database. |
| void | loadData3(LoadStats totals, Object source, String baseURL, RDFFormat rdfFormat, String defaultGraph, boolean endOfBatch): Loads data from the source. |
| LoadStats | loadFiles(File file, String baseURI, RDFFormat rdfFormat, String defaultGraph, FilenameFilter filter) |
| protected void | loadFiles(LoadStats totals, int depth, File file, String baseURI, RDFFormat rdfFormat, String defaultGraph, FilenameFilter filter, boolean endOfBatch) |
| static void | main(String[] args): Utility method that may be used to create and/or load RDF data into a local database instance. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected static final transient org.apache.log4j.Logger log
| Constructor Detail |
|---|
public DataLoader(AbstractTripleStore database)
Configure DataLoader using the properties used to configure the
database.
database - The database.
public DataLoader(Properties properties,
AbstractTripleStore database)
Configure a data loader with overridden properties.
properties - Configuration properties - see DataLoader.Options.
database - The database.
| Method Detail |
|---|
public AbstractTripleStore getDatabase()
The target database.
public InferenceEngine getInferenceEngine()
The object used to compute entailments for the database.
protected StatementBuffer<?> getAssertionBuffer()
The assertion buffer is used to buffer statements that are being asserted so as to maximize the opportunity for batch writes. Truth maintenance (if enabled) will be performed no later than the commit of the transaction.
Note: The same buffer is reused by each loader so that we can on
the one hand minimize heap churn and on the other hand disable auto-flush
when loading a series of small documents. However, we obtain a new buffer
each time we perform incremental truth maintenance.
Note: When non-null and non-empty, the buffer MUST be
flushed (a) if a transaction completes (otherwise writes will not be
stored on the database); or (b) if there is a read against the database
during a transaction (otherwise reads will not see the unflushed
statements).
Note: If truth maintenance is enabled then this buffer is backed
by a temporary store which accumulates the SPOs to be asserted.
Otherwise it will write directly on the database each time it is flushed,
including when it overflows.
IStatementBufferFactory
where the appropriate factory is required for TM vs non-TM
scenarios (or where the factory is parameterized for TM vs non-TM).
public boolean getFlush()
When true (the default) the StatementBuffer is
flushed by each loadData(String, String, RDFFormat) or
loadData(String[], String[], RDFFormat[]) operation and when
doClosure() is requested. When false the caller
is responsible for flushing the buffer.
This behavior MAY be disabled if you want to chain load a bunch of small
documents without flushing to the backing store after each document and
loadData(String[], String[], RDFFormat[]) is not well-suited to
your purposes. This can be much more efficient, approximating the
throughput for large document loads. However, the caller MUST invoke
endSource() once all documents are loaded successfully. If an error
occurs during the processing of one or more documents then the entire
data load should be discarded.
See Also: DataLoader.Options.FLUSH
public void endSource()
Flush the StatementBuffer to the backing store.
Note: If you disable auto-flush AND you are not using truth maintenance then you MUST explicitly invoke this method once you are done loading data sets in order to flush the last chunk of data to the store. In all other conditions you do NOT need to call this method. However it is always safe to invoke this method - if the buffer is empty the method will be a NOP.
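The interaction between auto-flush and endSource() described above can be sketched with a small stand-in. This is not the bigdata implementation: BufferedLoadSketch, its two lists, and visibleStatements() are hypothetical names used only to illustrate that statements stay invisible to reads until the buffer is flushed, and that flushing an empty buffer does nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a statement buffer in front of a store; not the bigdata API. */
final class BufferedLoadSketch {
    private final List<String> store = new ArrayList<>();  // stands in for the backing store
    private final List<String> buffer = new ArrayList<>(); // stands in for the StatementBuffer
    private final boolean autoFlush;

    BufferedLoadSketch(boolean autoFlush) {
        this.autoFlush = autoFlush;
    }

    /** Buffer the statements of one document; flush only when auto-flush is on. */
    void load(List<String> statements) {
        buffer.addAll(statements);
        if (autoFlush) {
            endSource();
        }
    }

    /** Write buffered statements to the store; does nothing when the buffer is empty. */
    void endSource() {
        store.addAll(buffer);
        buffer.clear();
    }

    /** Reads only see statements that have been flushed. */
    int visibleStatements() {
        return store.size();
    }

    public static void main(String[] args) {
        BufferedLoadSketch loader = new BufferedLoadSketch(false); // auto-flush disabled
        loader.load(Arrays.asList("s1 p o1", "s1 p o2"));
        loader.load(Arrays.asList("s2 p o3"));
        System.out.println(loader.visibleStatements()); // prints 0
        loader.endSource();
        System.out.println(loader.visibleStatements()); // prints 3
        loader.endSource(); // buffer is empty, nothing changes
    }
}
```

With auto-flush disabled, the chain of small loads costs one write to the store instead of one per document, which is the efficiency argument made for getFlush().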
public DataLoader.ClosureEnum getClosureEnum()
How the DataLoader will maintain closure on the database.
public DataLoader.CommitEnum getCommitEnum()
Whether and when the DataLoader will invoke
ITripleStore.commit()
public final LoadStats loadData(String resource,
String baseURL,
RDFFormat rdfFormat)
throws IOException
Load a resource into the database.
resource - baseURL - rdfFormat -
IOException
public final LoadStats loadData(String[] resource,
String[] baseURL,
RDFFormat[] rdfFormat)
throws IOException
Load a set of RDF resources into the database.
resource - baseURL - rdfFormat -
IOException
public LoadStats loadData(Reader reader,
String baseURL,
RDFFormat rdfFormat)
throws IOException
Load from a reader.
reader - baseURL - rdfFormat -
IOException
public LoadStats loadData(InputStream is,
String baseURL,
RDFFormat rdfFormat)
throws IOException
Load from an input stream.
is - baseURL - rdfFormat -
IOException
public LoadStats loadData(URL url,
String baseURL,
RDFFormat rdfFormat)
throws IOException
Load from a URL.
url - baseURL - rdfFormat -
IOException
protected void loadData2(LoadStats totals,
String resource,
String baseURL,
RDFFormat rdfFormat,
boolean endOfBatch)
throws IOException
Load an RDF resource into the database.
resource - Either the name of a resource which can be resolved using the
CLASSPATH, or the name of a resource in the local file system,
or a URL.
baseURL - rdfFormat - endOfBatch -
IOException - if the resource cannot be resolved or loaded.
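The three ways a resource name may be interpreted can be illustrated with standard library calls alone. The sketch below is an assumption about how such a lookup could work, not the DataLoader code: the class name ResourceResolverSketch, the open method, and the order in which the three locations are tried are all hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical resolver; the lookup order is an assumption, not documented behavior. */
final class ResourceResolverSketch {

    /** Try the CLASSPATH, then the local file system, then treat the name as a URL. */
    static InputStream open(String resource) throws IOException {
        InputStream in = ResourceResolverSketch.class.getClassLoader()
                .getResourceAsStream(resource);
        if (in != null) {
            return in;
        }
        File file = new File(resource);
        if (file.isFile()) {
            return new FileInputStream(file);
        }
        try {
            return new URL(resource).openStream();
        } catch (IOException e) { // also covers MalformedURLException
            throw new IOException("Could not resolve or load: " + resource, e);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sample", ".nt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(),
                "<urn:s> <urn:p> <urn:o> .\n".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = open(tmp.getPath())) {
            System.out.println((char) in.read()); // prints <
        }
    }
}
```

A name that matches none of the three locations ends in an IOException, which mirrors the documented failure mode of loadData2.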
public LoadStats loadFiles(File file,
String baseURI,
RDFFormat rdfFormat,
String defaultGraph,
FilenameFilter filter)
throws IOException
file - The file or directory (required).
baseURI - The baseURI (optional; when not specified, the name of each
file loaded is converted to a URL and used as the baseURI for
that file).
rdfFormat - The format of the file (optional; when not specified, the
format is deduced for each file in turn using the
RDFFormat static methods).
defaultGraph - The value that will be used for the graph/context coordinate when
loading data represented in a triple format into a quad store.
filter - A filter selecting the file names that will be loaded
(optional). When specified, the filter MUST accept directories
if directories are to be recursively processed.
IOException
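Two of the parameter notes above lend themselves to a short illustration: a filter has to accept directories for recursion to reach nested files, and a file name can serve as its own base URI. The sketch below uses only java.io; LoadFilesFilterSketch, acceptDirsAnd, and collect are hypothetical names, and the recursive walk is a stand-in rather than the loadFiles implementation.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper showing a directory-friendly filter and a default base URI. */
final class LoadFilesFilterSketch {

    /** Accept directories (so a recursive walk can descend) and files with the suffix. */
    static FilenameFilter acceptDirsAnd(final String suffix) {
        return new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                return new File(dir, name).isDirectory() || name.endsWith(suffix);
            }
        };
    }

    /** Stand-in recursive walk: records the file-derived base URI of each accepted file. */
    static void collect(File file, FilenameFilter filter, List<String> baseURIs) {
        if (file.isDirectory()) {
            File[] children = file.listFiles(filter);
            if (children == null) {
                return; // directory could not be listed
            }
            Arrays.sort(children); // listing order is otherwise unspecified
            for (File child : children) {
                collect(child, filter, baseURIs);
            }
        } else {
            baseURIs.add(file.toURI().toString());
        }
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("rdf").toFile();
        File sub = new File(root, "sub");
        sub.mkdir();
        new File(root, "a.nt").createNewFile();
        new File(sub, "b.nt").createNewFile();
        new File(sub, "notes.txt").createNewFile();

        List<String> found = new ArrayList<>();
        collect(root, acceptDirsAnd(".nt"), found);
        for (String uri : found) {
            System.out.println(uri); // one file: URI each for a.nt and sub/b.nt
        }
    }
}
```

A filter written as name.endsWith(".nt") alone would reject the sub directory, so sub/b.nt would never be visited.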
protected void loadFiles(LoadStats totals,
int depth,
File file,
String baseURI,
RDFFormat rdfFormat,
String defaultGraph,
FilenameFilter filter,
boolean endOfBatch)
throws IOException
IOException
public void loadData3(LoadStats totals,
Object source,
String baseURL,
RDFFormat rdfFormat,
String defaultGraph,
boolean endOfBatch)
throws IOException
totals - Used to report out the total LoadStats.
source - A Reader or InputStream.
baseURL - The baseURI (optional; when not specified, the name of each
file loaded is converted to a URL and used as the baseURI for
that file).
rdfFormat - The format of the file (optional; when not specified, the
format is deduced for each file in turn using the
RDFFormat static methods).
defaultGraph - The value that will be used for the graph/context coordinate
when loading data represented in a triple format into a quad
store.
endOfBatch - Signals the end of a batch.
IOException
public ClosureStats doClosure()
Compute closure as configured. If DataLoader.ClosureEnum.None was selected
then this MAY be used to (re-)compute the full closure of the database.
IllegalStateException - if the assertion buffer is null
See Also: removeEntailments()
public static void main(String[] args)
throws IOException
Utility method that may be used to create and/or load RDF data into a local database instance.
args - [-quiet][-closure][-verbose][-namespace namespace] propertyFile (fileOrDir)*
where
IOException
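The argument grammar for main can be read as zero or more options, one required property file, and any number of files or directories. The parser below is a hypothetical sketch of that grammar only: MainArgsSketch and its fields are invented for illustration, the meaning of each flag is not specified here, and accepting the options in any order is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser for the documented argument shape; not the DataLoader.main code. */
final class MainArgsSketch {
    boolean quiet;
    boolean closure;
    boolean verbose;
    String namespace;    // null when -namespace is absent
    String propertyFile; // required
    final List<String> filesOrDirs = new ArrayList<>();

    static MainArgsSketch parse(String[] args) {
        MainArgsSketch parsed = new MainArgsSketch();
        int i = 0;
        // Leading options; accepting them in any order is an assumption.
        for (; i < args.length && args[i].startsWith("-"); i++) {
            switch (args[i]) {
                case "-quiet":
                    parsed.quiet = true;
                    break;
                case "-closure":
                    parsed.closure = true;
                    break;
                case "-verbose":
                    parsed.verbose = true;
                    break;
                case "-namespace":
                    if (++i >= args.length) {
                        throw new IllegalArgumentException("-namespace requires a value");
                    }
                    parsed.namespace = args[i];
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        if (i >= args.length) {
            throw new IllegalArgumentException("propertyFile is required");
        }
        parsed.propertyFile = args[i++];
        // Everything after the property file is a file or directory to load.
        parsed.filesOrDirs.addAll(Arrays.asList(args).subList(i, args.length));
        return parsed;
    }

    public static void main(String[] args) {
        MainArgsSketch p = parse(new String[] {
            "-verbose", "-namespace", "kb", "store.properties", "data/a.nt", "data/more"
        });
        System.out.println(p.namespace);          // prints kb
        System.out.println(p.propertyFile);       // prints store.properties
        System.out.println(p.filesOrDirs.size()); // prints 2
    }
}
```

The file names in the usage example (store.properties, data/a.nt, data/more) and the namespace kb are placeholders.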