java.lang.Object
  com.bigdata.rdf.rio.AbstractStatementBuffer<F,G>
Type Parameters:
F - The generic type of the source Statement added to the buffer by the callers.
G - The generic type of the BigdataStatements stored in the buffer.

public abstract class AbstractStatementBuffer<F extends Statement,G extends BigdataStatement>
Class for efficiently converting Statements into
BigdataStatements, including resolving term identifiers (or adding
entries to the lexicon for unknown terms) as required. The class does not
write the converted BigdataStatements onto the database, but that
can be easily done using a resolving iterator pattern.
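The following is only a sketch of the "resolving iterator pattern" mentioned above. It wraps a source iterator and lazily maps each element through a resolver. ResolvingIterator and ResolvingIteratorDemo are hypothetical names and are not part of bigdata. In real use, the resolver would be whatever resolves or writes a converted BigdataStatement. This sketch stands it in with String::length.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical sketch (not the bigdata API): an iterator that lazily maps each
// element of a source iterator through a resolver function.
final class ResolvingIterator<F, G> implements Iterator<G> {
    private final Iterator<? extends F> src;
    private final Function<? super F, ? extends G> resolver;

    ResolvingIterator(Iterator<? extends F> src, Function<? super F, ? extends G> resolver) {
        this.src = src;
        this.resolver = resolver;
    }

    @Override
    public boolean hasNext() {
        return src.hasNext();
    }

    @Override
    public G next() {
        // src.next() throws NoSuchElementException once the source is exhausted.
        return resolver.apply(src.next());
    }
}

final class ResolvingIteratorDemo {
    public static void main(String[] args) {
        // Stand-in "statements" (strings) resolved to stand-in identifiers (their lengths).
        Iterator<Integer> it = new ResolvingIterator<String, Integer>(
                Arrays.asList("a", "bb", "ccc").iterator(), String::length);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints 1, 2 and 3 on separate lines
        }
    }
}
```

Resolution happens only when next() is called. This keeps the conversion step and the write step decoupled, which matches the class's choice not to write onto the database itself.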
StatementBuffer does not appear to canonicalize terms correctly when statement identifiers are enabled. As outlined below, it needs to be rewritten, and the code could be simplified dramatically:
1. If a value is a BNode, it goes into a map that canonicalizes blank nodes for the life cycle of the document being loaded.
2. If a statement uses blank nodes, it must be deferred (this is true whether or not statement identifiers are in use). Do NOT make its {s,p,o} canonical, since the statement and its terms will be processed later.
3. Otherwise, the value goes into a canonicalizing Set: add it and return it if it is not found, otherwise return the existing Value. The statement then uses the canonicalized value.
4. An incremental write causes every term in the Value[] to be assigned a term identifier, so those terms should be BigdataValue objects. The statements then have term identifiers and are written onto the DB. The Value[] should be empty after each incremental write.
5. When the end of the document is reached, there will be deferred statements iff there were blank nodes. Those are then processed by the existing code: if statement identifiers are enabled, blank nodes are unified with statement identifiers; otherwise term identifiers are simply assigned to the blank nodes. The deferred statements already have BigdataValue objects binding their term identifiers, so processing them should only assign term identifiers to blank nodes. Every other term of a deferred statement should already have its term identifier.

| Nested Class Summary | |
|---|---|
| static class | AbstractStatementBuffer.StatementBuffer2<F extends Statement,G extends BigdataStatement>: Loads Statements into an RDF database. |
| Field Summary | |
|---|---|
| protected static boolean | DEBUG |
| protected static boolean | INFO |
| protected static org.apache.log4j.Logger | log |
| protected boolean | readOnly: When true, Values will be resolved against the LexiconRelation and Statements will be resolved against the SPORelation, but unknown Values and unknown Statements WILL NOT be inserted into the corresponding relations. |
| protected G[] | statementBuffer: Buffer for accepted BigdataStatements. |
| Constructor Summary | |
|---|---|
| AbstractStatementBuffer(AbstractTripleStore db, boolean readOnly, int capacity) | |
| Method Summary | |
|---|---|
| void | add(F e): Imposes a canonical mapping on the subject, predicate, and objects of the given Statements and stores a new BigdataStatement instance in the internal buffer. |
| void | add(Resource s, URI p, Value o): Add an "explicit" statement to the buffer with a "null" context. |
| void | add(Resource s, URI p, Value o, Resource c): Add an "explicit" statement to the buffer. |
| void | add(Resource s, URI p, Value o, Resource c, StatementEnum type): Add a statement to the buffer. |
| protected void | clear(): Clears the state associated with the BigdataStatements in the internal buffer but does not discard the blank nodes or deferred statements. |
| protected BigdataValue | convertValue(Value value): Return a canonical BigdataValue instance representing the given value. |
| long | flush(): Converts any buffered statements and any deferred statements and then invokes overflow() to flush anything remaining in the buffer. |
| AbstractTripleStore | getDatabase(): The database from the ctor. |
| AbstractTripleStore | getStatementStore(): Note: Returns the same value as getDatabase() since the distinction is not captured by this class. |
| BigdataValueFactory | getValueFactory(): The ValueFactory for Statements and Values created by this class. |
| protected abstract int | handleProcessedStatements(G[] a): Invoked by overflow(). |
| boolean | isEmpty(): true if there are no buffered statements and no buffered deferred statements. |
| protected void | overflow(): Invoked each time the statementBuffer buffer would overflow. |
| protected void | processBufferedValues(): Efficiently resolves/adds term identifiers for the buffered BigdataValues. |
| protected void | processDeferredStatements(): Processes any BigdataStatements in the deferredStatementBuffer, adding them to the statementBuffer, which may cause the latter to overflow(). |
| void | reset(): Discards all state (term map, bnodes, deferred statements, the buffered statements, and the counter whose value is reported by flush()). |
| void | setBNodeMap(Map<String,BigdataBNodeImpl> bnodes): Set the canonicalizing map for blank nodes based on their ID. |
| int | size(): #of buffered statements plus the #of buffered statements that are being deferred. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected static final org.apache.log4j.Logger log
protected static final boolean INFO
protected static final boolean DEBUG
protected final boolean readOnly
When true, Values will be resolved against
the LexiconRelation and Statements will be resolved
against the SPORelation, but unknown Values and
unknown Statements WILL NOT be inserted into the
corresponding relations.
protected final G[] statementBuffer
Buffer for accepted BigdataStatements. This buffer is
cleared each time it would overflow.
| Constructor Detail |
|---|
public AbstractStatementBuffer(AbstractTripleStore db,
boolean readOnly,
int capacity)
Parameters:
db - The database against which the Values will be resolved (or added). If this database supports statement identifiers, then statement identifiers for the converted statements will be resolved (or added) to the lexicon.
readOnly - When true, Values (and statement identifiers iff enabled) will be resolved against the LexiconRelation, but entries WILL NOT be inserted into the LexiconRelation for unknown Values (or for statement identifiers for unknown Statements when statement identifiers are enabled).
capacity - The capacity of the backing buffer.

| Method Detail |
|---|
public AbstractTripleStore getDatabase()
The database from the ctor.
Specified by:
getDatabase in interface IStatementBuffer<F extends Statement>

public AbstractTripleStore getStatementStore()
Note: Returns the same value as getDatabase() since the distinction is not captured by this class. This MUST be overridden in derived classes which make this distinction.
Specified by:
getStatementStore in interface IStatementBuffer<F extends Statement>

public BigdataValueFactory getValueFactory()
The ValueFactory for Statements and Values created by this class.
public void setBNodeMap(Map<String,BigdataBNodeImpl> bnodes)
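Description copied from interface: IStatementBuffer
Set the canonicalizing map for blank nodes based on their ID. The same map may be shared across multiple IStatementBuffer
instances. For example, the BigdataSail does this so that the
same bnode map is used throughout the life of a SailConnection.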
While RIO provides blank node correlation within a given source, it does
NOT provide blank node correlation across sources. You need to use this
method to do that.
Note: It is reasonable to expect that the bnodes map is used by
concurrent threads. For this reason, the map SHOULD be thread-safe. This
can be accomplished either using Collections.synchronizedMap(Map)
or a ConcurrentHashMap. However, implementations MUST still be
synchronized on the map reference across operations which conditionally
insert into the map in order to make that update atomic and thread-safe.
Otherwise a race condition exists for the conditional insert and
different threads could get incoherent answers.
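Here is a minimal sketch of the conditional insert described in this note, using plain Java types. BNodeStub and SharedBNodeMap are hypothetical stand-ins and are not BigdataBNodeImpl or the IStatementBuffer API. The map is wrapped with Collections.synchronizedMap, and the get-then-put sequence is also guarded by synchronized on the map reference. As a result, two buffers that share the map cannot both create a node for the same ID.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a blank node value (not BigdataBNodeImpl).
final class BNodeStub {
    final String id;

    BNodeStub(String id) {
        this.id = id;
    }
}

// Hypothetical holder for a bnode map that may be shared by several buffers.
final class SharedBNodeMap {
    private final Map<String, BNodeStub> bnodes;

    SharedBNodeMap(Map<String, BNodeStub> bnodes) {
        this.bnodes = bnodes;
    }

    /** Returns the canonical blank node for id, creating it if absent. */
    BNodeStub canonical(String id) {
        // A synchronizedMap makes each get/put atomic, but not the get-then-put
        // sequence, so hold the map's lock for the whole conditional insert.
        synchronized (bnodes) {
            BNodeStub b = bnodes.get(id);
            if (b == null) {
                b = new BNodeStub(id);
                bnodes.put(id, b);
            }
            return b;
        }
    }
}

final class SharedBNodeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, BNodeStub> shared = Collections.synchronizedMap(new HashMap<>());
        SharedBNodeMap first = new SharedBNodeMap(shared);  // e.g. one per source
        SharedBNodeMap second = new SharedBNodeMap(shared);
        BNodeStub[] seen = new BNodeStub[2];
        Thread t1 = new Thread(() -> seen[0] = first.canonical("_:b1"));
        Thread t2 = new Thread(() -> seen[1] = second.canonical("_:b1"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(seen[0] == seen[1]); // prints true: one shared instance
    }
}
```

Synchronizing on the wrapper reference is consistent with the wrapper's own locking, because Collections.synchronizedMap(Map) uses the returned map as its mutex.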
Specified by:
setBNodeMap in interface IStatementBuffer<F extends Statement>
Parameters:
bnodes - The blank nodes map.

protected BigdataValue convertValue(Value value)
Return a canonical BigdataValue instance representing the given
value. The scope of the canonical instance is until the next
internal buffer overflow (URIs and Literals) or until
flush() (BNodes, since blank nodes are global for a
given source). The purpose of the canonicalizing mapping is to reduce the
buffered BigdataValues to the minimum variety required to
represent the buffered BigdataStatements, which improves
throughput significantly (40%) when resolving terms to the corresponding
term identifiers using the LexiconRelation.
Note: This is not a true canonicalizing map when statement identifiers
are used since values used in deferred statements will be held over until
the buffer is flush()ed. This relaxation of the canonicalizing
mapping is not a problem since the purpose of the mapping is to provide
better throughput and nothing relies on a pure canonicalization of the
Values.
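The canonicalizing mapping can be pictured with the sketch below. It assumes terms are plain Strings rather than BigdataValue objects, and ValueCanonicalizer is a hypothetical name. Equal values collapse to a single instance, so only distinctValues() terms would need to be resolved against the lexicon. clear() models the end of the URI/literal scope at overflow. Blank nodes, whose scope lasts until flush(), are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a buffer-scoped canonicalizing map (not the bigdata API).
final class ValueCanonicalizer {
    private final Map<String, String> canonical = new HashMap<>();

    /** Returns the canonical instance equal to value, or null for a null value. */
    String convert(String value) {
        if (value == null) {
            return null; // e.g. an undefined context
        }
        // putIfAbsent returns the previously mapped instance, or null if none.
        String existing = canonical.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    /** Number of distinct values that would need term identifiers. */
    int distinctValues() {
        return canonical.size();
    }

    /** Models the end of the canonicalizing scope when the buffer overflows. */
    void clear() {
        canonical.clear();
    }
}
```

For example, two separately constructed but equal strings for the same predicate IRI both convert to the first instance seen, and distinctValues() reports 1.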
Parameters:
value - A value.
Returns:
A BigdataValue for the target BigdataValueFactory. This will be null iff the value is null (allows for the context to be undefined).

public boolean isEmpty()
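true if there are no buffered statements and no buffered deferred statements.
Specified by:
isEmpty in interface IBuffer<F extends Statement>

public int size()
#of buffered statements plus the #of buffered statements that are being deferred.
Specified by:
size in interface IBuffer<F extends Statement>

public void add(F e)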
Imposes a canonical mapping on the subject, predicate, and objects of the given
Statements and stores a new BigdataStatement
instance in the internal buffer. If the given statement is a
BigdataStatement then its StatementEnum will be used.
Otherwise the new statement will be StatementEnum.Explicit.
Note: Unlike the Values, a canonicalizing mapping is NOT imposed
for the statements. This is because, unlike the Values, there
tends to be little duplication in Statements when processing
RDF.
Specified by:
add in interface IStatementBuffer<F extends Statement>
Specified by:
add in interface IBuffer<F extends Statement>
Parameters:
e - The statement. If e implements BigdataStatement then the StatementEnum will be used (this makes it possible to load axioms into the database as axioms), but the term identifiers on the statement's values will be ignored.
public void add(Resource s,
URI p,
Value o)
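Description copied from interface: IStatementBuffer
Add an "explicit" statement to the buffer with a "null" context.
Specified by:
add in interface IStatementBuffer<F extends Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.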
public void add(Resource s,
URI p,
Value o,
Resource c)
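Description copied from interface: IStatementBuffer
Add an "explicit" statement to the buffer.
Specified by:
add in interface IStatementBuffer<F extends Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.
c - The context (optional).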
public void add(Resource s,
URI p,
Value o,
Resource c,
StatementEnum type)
Description copied from interface: IStatementBuffer
Add a statement to the buffer.
Note: The context parameter (c) is NOT used. The database at this time is either a triple store or a triple store with statement identifiers, and in neither case is the context used.
Specified by:
add in interface IStatementBuffer<F extends Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.
c - The context (optional).
type - The statement type (optional).

protected void processBufferedValues()
Efficiently resolves/adds term identifiers for the buffered BigdataValues.
If readOnly is true, then the term identifier for unknown values
will remain IRawTripleStore.NULL.
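Here is a sketch of the readOnly behavior under stated assumptions. LexiconSketch is hypothetical, terms are Strings, and identifiers are longs. NULL is assumed to be 0L as a stand-in for IRawTripleStore.NULL. In read-only mode, unknown terms are looked up but never inserted, so they resolve to NULL.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lexicon sketch (not LexiconRelation): bulk term -> id resolution.
final class LexiconSketch {
    // Assumption: stands in for IRawTripleStore.NULL.
    static final long NULL = 0L;

    private final Map<String, Long> termIds = new HashMap<>();
    private long nextId = 1L;

    /**
     * Resolves each term to its identifier. When readOnly is true, unknown
     * terms are not added and resolve to NULL; otherwise they get new ids.
     */
    long[] resolve(List<String> terms, boolean readOnly) {
        long[] ids = new long[terms.size()];
        for (int i = 0; i < ids.length; i++) {
            String term = terms.get(i);
            Long id = termIds.get(term);
            if (id == null && !readOnly) {
                id = nextId++;          // assign a new identifier
                termIds.put(term, id);
            }
            ids[i] = (id == null) ? NULL : id;
        }
        return ids;
    }
}
```

A read-only pass over terms that are not yet known therefore returns all NULL, and a later read-only pass still sees only the identifiers that a writable pass assigned.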
protected void processDeferredStatements()
Processes any BigdataStatements in the
deferredStatementBuffer, adding them to the
statementBuffer, which may cause the latter to
overflow().
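The deferral of statements that use blank nodes, as described in the class notes, can be sketched as follows. DeferringBuffer is hypothetical. Statements are {s, p, o} String triples, and a term starting with "_:" is treated as a blank node. Statements without blank nodes are released immediately. The others wait until flush(), when blank node identifiers are assigned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of deferring blank-node statements (not the bigdata API).
final class DeferringBuffer {
    final List<String[]> ready = new ArrayList<>();     // released incrementally
    final List<String[]> deferred = new ArrayList<>();  // held until flush()
    private final Map<String, Long> bnodeIds = new HashMap<>();
    private long nextBNodeId = 1L;

    private static boolean isBNode(String term) {
        return term.startsWith("_:");
    }

    void add(String s, String p, String o) {
        String[] stmt = {s, p, o};
        if (isBNode(s) || isBNode(o)) {
            deferred.add(stmt); // uses blank nodes: process at end of document
        } else {
            ready.add(stmt);
        }
    }

    /** End of document: assign blank node ids, then release deferred statements. */
    void flush() {
        for (String[] stmt : deferred) {
            for (String term : stmt) {
                if (isBNode(term)) {
                    bnodeIds.computeIfAbsent(term, k -> nextBNodeId++);
                }
            }
            ready.add(stmt);
        }
        deferred.clear();
    }

    /** The identifier assigned to a blank node label, or null if none yet. */
    Long bnodeId(String label) {
        return bnodeIds.get(label);
    }
}
```

Only the predicate position is exempt from the blank node check here, since RDF predicates are always IRIs.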
protected final void overflow()
Invoked each time the statementBuffer buffer would overflow.
This method is responsible for bulk resolving / adding the buffered
BigdataValues against the db and adding the fully
resolved BigdataStatements to the queue on which the
iterator() is reading.
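The overflow cycle and the counter reported by flush() can be sketched as a fixed-capacity buffer that drains full batches into an abstract handler. BatchingBuffer is a hypothetical class. It mirrors overflow() and handleProcessedStatements(G[] a) only in shape, and omits term resolution. It also takes a backing array instead of an int capacity, because Java cannot create a generic G[] directly.

```java
import java.util.Arrays;

// Hypothetical sketch of a batching statement buffer (not the bigdata API).
abstract class BatchingBuffer<G> {
    private final G[] buffer;
    private int size = 0;
    private long counter = 0L;

    BatchingBuffer(G[] backing) {
        this.buffer = backing;
    }

    /** Subclasses write the batch somewhere and report how many were handled. */
    protected abstract int handleProcessedStatements(G[] batch);

    void add(G stmt) {
        if (size == buffer.length) {
            overflow(); // the buffer would overflow: drain it first
        }
        buffer[size++] = stmt;
    }

    private void overflow() {
        if (size == 0) {
            return;
        }
        // Bulk term resolution would happen here before handing off the batch.
        counter += handleProcessedStatements(Arrays.copyOf(buffer, size));
        Arrays.fill(buffer, 0, size, null); // clear the buffer for reuse
        size = 0;
    }

    /** Drains whatever remains and reports the running total. */
    long flush() {
        overflow();
        return counter;
    }
}

final class BatchingBufferDemo {
    public static void main(String[] args) {
        BatchingBuffer<String> buf = new BatchingBuffer<String>(new String[2]) {
            @Override
            protected int handleProcessedStatements(String[] batch) {
                System.out.println("batch of " + batch.length);
                return batch.length;
            }
        };
        for (int i = 0; i < 5; i++) {
            buf.add("stmt" + i);
        }
        // Prints "batch of 2", "batch of 2", "batch of 1", then "total: 5".
        System.out.println("total: " + buf.flush());
    }
}
```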
protected abstract int handleProcessedStatements(G[] a)
Invoked by overflow().
Parameters:
a - An array of processed BigdataStatements.
Returns:
The amount to add to the counter reported by flush().

public long flush()
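Converts any buffered statements and any deferred statements and then invokes overflow() to flush anything remaining in the buffer.
Specified by:
flush in interface IBuffer<F extends Statement>

public void reset()
Discards all state (term map, bnodes, deferred statements, the buffered statements, and the counter whose value is reported by flush()).
Specified by:
reset in interface IBuffer<F extends Statement>

protected void clear()
Clears the state associated with the BigdataStatements in the internal buffer but does not discard the blank nodes or deferred statements.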