com.bigdata.rdf.rules
Enum RuleContextEnum

java.lang.Object
  extended by java.lang.Enum<RuleContextEnum>
      extended by com.bigdata.rdf.rules.RuleContextEnum
All Implemented Interfaces:
Serializable, Comparable<RuleContextEnum>

public enum RuleContextEnum
extends Enum<RuleContextEnum>

Type-safe enumeration capturing the primary use cases for rule execution.

The use cases here reduce to two basic variants: (a) query using a read-consistent view; and (b) rules that write on a database. The latter has two twists: for TruthMaintenance the rules write on a TempTripleStore, while for DatabaseAtOnceClosure they write directly on the knowledge base.

Note: The scale-out architecture imposes a concurrency control layer such that conflicts for access to the unisolated indices cannot arise. The scale-out case is therefore not relevant to the rest of this discussion.

For the use cases that write on a database without the concurrency control layer (regardless of whether it is the focusStore or the main knowledge base) there is a concurrency control issue that can be resolved in one of two different ways. The basic issue is that rule execution populates IBuffers that are automatically flushed when they become full (or when a sequential step in an IProgram is complete). If there are iterator(s) reading concurrently on the same view of the index on which the buffer(s) write, then this violates the contract for the BTree, which is safe for concurrent readers -or- a single writer. The parallel execution of more than one rule makes this a problem even when the iterators are fully buffered (vs the newer asynchronous iterators, which have the same problem even when only one rule is running).

Note: For TruthMaintenance we actually read from two different sources: a focusStore and the knowledge base. In this situation we are free to read on the knowledge base using an unisolated view because truth maintenance requires exclusive write access and therefore no other process will be writing on the knowledge base.

We can do two things to avoid violating the BTree concurrency contract:

  1. Read using a read-committed view (for the source on which the rules will write) and write on the unisolated view. The main drawback with this approach is that we must checkpoint (for a TemporaryStore) or commit (for a Journal) after each sequential step of an IProgram (including after each round of closure as a special case). This slows down inference and, for TruthMaintenance, can cause the TemporaryStore to be flushed to disk when otherwise it might be fully buffered and never touch the disk.
  2. Read and write on the unisolated BTree and use a mutex lock to coordinate access to that index. The mutex lock must serialize (concurrent) readers and the (single) writer. The writer gains the lock when it needs to flush a buffer, at which point any reader(s) on the unisolated BTrees block and grant access to the writer and then resume their operations when the writer releases the lock.

    For a single rule, only an asynchronous iterator can conflict with the task flushing the buffer. However, when more than one rule is being executed concurrently, it is possible for conflicts to arise even with fully buffered iterators.

    The advantage of this approach is that we can use only the unisolated indices (better buffer management) and we do not need to either checkpoint (for a TempTripleStore) or commit (for a LocalTripleStore). For a TempTripleStore this can mean that we never even touch the disk, while for a LocalTripleStore it means that we only commit when the closure operation is complete.
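
The second option can be illustrated with a minimal sketch. It is not the bigdata implementation: GuardedIndexSketch, its fields and its methods are hypothetical names, plain lists stand in for the unisolated BTree and the IBuffer, and java.util.concurrent's ReentrantReadWriteLock is assumed as the mutex. Readers take the shared lock one chunk at a time, so a writer that needs to flush can gain the exclusive lock between chunks and the readers resume afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; none of these names belong to the bigdata API. */
final class GuardedIndexSketch {
    // Shared by readers, exclusive for the single writer.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> index = new ArrayList<>();  // stands in for the unisolated BTree
    private final List<String> buffer = new ArrayList<>(); // stands in for an IBuffer
    private final int capacity;

    GuardedIndexSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Buffers one rule output; flushes automatically when the buffer is full. */
    synchronized void add(String statement) {
        buffer.add(statement);
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** The writer gains the exclusive lock only while flushing the buffer. */
    synchronized void flush() {
        lock.writeLock().lock();
        try {
            index.addAll(buffer);
            buffer.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Reads one chunk under the shared lock and releases it before returning,
     * so a pending flush can proceed between chunks.
     */
    List<String> readChunk(int fromIndex, int maxSize) {
        lock.readLock().lock();
        try {
            int end = Math.min(index.size(), fromIndex + maxSize);
            if (fromIndex >= end) {
                return new ArrayList<>();
            }
            return new ArrayList<>(index.subList(fromIndex, end));
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In this sketch a reader never holds the shared lock while adding to the buffer, which matters because ReentrantReadWriteLock does not support upgrading a read lock to a write lock.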

TODO:

  1. We have to jump through hoops whenever we are doing TruthMaintenance with a focusStore backed by a TemporaryStore (which is the only way we can do it today). For database-at-once closure, we only need to jump through hoops when the database is on a Journal. If it is on an IBigdataFederation then the concurrency control layer ensures that none of the problems can arise.
  2. We need to recognize the use case and then recognize which relations (and their indices) belong to the focusStore and the knowledge base so that we can choose the appropriate view for each.
  3. Flushing the IBuffer for mutation operations needs to coordinate with both the fully buffered and the asynchronous iterators. This is only for TruthMaintenance or when the knowledge base is on a Journal. There must be one mutex per named index on which we will write (actually, that can be simplified to one mutex per relation on which we will write since the relations always update all of their indices).
  4. Use the readTimestamp for query (so we can query for a historical commit time) but ignore it for DatabaseAtOnceClosure and TruthMaintenance (presuming that we are operating on the current state of the kb)?


Enum Constant Summary
DatabaseAtOnceClosure
           Database at once closure is the most efficient way to compute the closure over the model theory for the KB.
HighLevelQuery
           High-level queries (SPARQL) can in general be translated into a rule that is directly executed by the bigdata rule execution layer.
TruthMaintenance
           Truth maintenance must be used when you incrementally assert or retract a set of explicit (or told) statements (or assertions or triples).
 
Method Summary
static RuleContextEnum valueOf(String name)
          Returns the enum constant of this type with the specified name.
static RuleContextEnum[] values()
          Returns an array containing the constants of this enum type, in the order they are declared.
 
Methods inherited from class java.lang.Enum
clone, compareTo, equals, finalize, getDeclaringClass, hashCode, name, ordinal, toString, valueOf
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Enum Constant Detail

DatabaseAtOnceClosure

public static final RuleContextEnum DatabaseAtOnceClosure

Database at once closure is the most efficient way to compute the closure over the model theory for the KB. In general, database-at-once closure is requested when you bulk load a large amount of data into a knowledge base. You request database-at-once closure using InferenceEngine.computeClosure(AbstractTripleStore) WITHOUT the optional focusStore.

As long as justifications are enabled, you can incrementally assert or retract statements using TruthMaintenance. If justifications are NOT enabled, then you can re-compute the closure of the database after adding assertions. If you have retracted assertions, then you first need to delete all inferences from the knowledge base and then recompute the closure of the database.
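
The choice described above can be written down as a small decision function. This is only a sketch of the reasoning in the preceding paragraph: ClosureUpdatePlanSketch, Plan and choose are hypothetical names and are not part of the bigdata API.

```java
/** Hypothetical sketch of the decision described above; not bigdata API. */
final class ClosureUpdatePlanSketch {

    enum Plan {
        /** Justifications are enabled: assert or retract incrementally. */
        INCREMENTAL_TRUTH_MAINTENANCE,
        /** No justifications, assertions only: re-compute the closure. */
        RECOMPUTE_CLOSURE,
        /** No justifications, retractions present: drop inferences first. */
        DELETE_INFERENCES_THEN_RECOMPUTE
    }

    static Plan choose(boolean justificationsEnabled, boolean hasRetractions) {
        if (justificationsEnabled) {
            return Plan.INCREMENTAL_TRUTH_MAINTENANCE;
        }
        return hasRetractions
                ? Plan.DELETE_INFERENCES_THEN_RECOMPUTE
                : Plan.RECOMPUTE_CLOSURE;
    }
}
```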

Database-at-once closure reads and writes on the persistent knowledge base and does not utilize a TempTripleStore.


TruthMaintenance

public static final RuleContextEnum TruthMaintenance

Truth maintenance must be used when you incrementally assert or retract a set of explicit (or told) statements (or assertions or triples). Each time new assertions are made or retracted the closure of the knowledge base must be updated, causing entailments (or inferred statements) to be either asserted or retracted. This is handled by TruthMaintenance and InferenceEngine.

Adding assertions is relatively straightforward since all the existing entailments will remain valid, but new entailments might be computable based on the new assertions. The only real twist is that we record justifications (aka proof chains) to support truth maintenance when statements are retracted.

Retractions require additional effort since entailments already in the knowledge base MIGHT NOT be supported once some explicit statements are retracted. Attempting to directly retract an inference or an axiom has no effect since they are entailed by some combination of the model theory and the explicit statements. However, when an explicit statement in the knowledge base is retracted, a search must be performed to identify whether or not the statement is still provable based on the remaining statements. In the current implementation we chase justifications in order to decide whether the explicit statement will be converted to an inference (or an axiom) or retracted from the knowledge base. This process is recursive since a statement that gets retracted (rather than being converted to an inference) can cause other entailments to no longer be supported.

When asserting or retracting statements using truth maintenance, the statements are first loaded into a TempTripleStore known as the focusStore. Next we compute the closure of the focusStore against the assertions already in the knowledge base. This is done using TMUtility to rewrite the IProgram into a new (and larger) set of rules. For each original IRule, we derive N new rules, where N is the number of tail IPredicates in the rule. These derived rules read from either the focusStore or the fused view of the focusStore and the knowledge base and they write on the focusStore. Once the closure of the focusStore against the knowledge base has been computed, all statements in that closure are either asserted against or retracted from the knowledge base (depending on whether the original set of statements was being asserted or retracted). That final step is done using either a bulk statement copy or a bulk statement remove operation.
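
The shape of the rewrite (N derived rules for a rule with N tail predicates) can be sketched as follows. The text above says only that each derived rule reads from either the focusStore or the fused view. The specific assignment used here, where the i-th derived rule reads its i-th tail from the focusStore and every other tail from the fused view, is an assumption made for illustration. TmRewriteSketch, View and deriveViews are hypothetical names and do not reproduce TMUtility.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the N-rule rewrite; does not reproduce TMUtility. */
final class TmRewriteSketch {

    /** The source that a tail predicate of a derived rule reads from. */
    enum View { FOCUS_STORE, FUSED_VIEW }

    /**
     * Returns one view assignment per derived rule. ASSUMPTION: in derived
     * rule i, tail i reads the focusStore and all other tails read the fused
     * view of the focusStore and the knowledge base.
     */
    static List<List<View>> deriveViews(int tailCount) {
        List<List<View>> derived = new ArrayList<>();
        for (int i = 0; i < tailCount; i++) {
            List<View> views = new ArrayList<>();
            for (int j = 0; j < tailCount; j++) {
                views.add(i == j ? View.FOCUS_STORE : View.FUSED_VIEW);
            }
            derived.add(views);
        }
        return derived;
    }
}
```

Under that assumption, a rule with two tails yields two derived rules with the assignments [FOCUS_STORE, FUSED_VIEW] and [FUSED_VIEW, FOCUS_STORE], so every derived rule touches at least one statement from the focusStore.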

Since the state of the knowledge base does not change while we are computing the closure of the focusStore against the knowledge base we can use a read-consistent view of the knowledge base throughout the operation. At the same time, we are both reading from and writing on the focusStore.


HighLevelQuery

public static final RuleContextEnum HighLevelQuery

High-level queries (SPARQL) can in general be translated into a rule that is directly executed by the bigdata rule execution layer. This provides extremely efficient query answering. The same approach can be used with custom rule evaluation - there is no difference once it gets down to the execution of the rule(s).

The generated rule SHOULD be executed against a read-consistent view of the knowledge base (NOT read-committed since that can result in dirty reads). In a scenario where the knowledge base is unchanging, this is very efficient as it allows full concurrency with less (no) overhead for concurrency control. In addition, concurrent writes on the knowledge base are allowed.

New readers SHOULD use a read-consistent timestamp that reflects the desired (generally, most recent) commit point corresponding to a closure of the knowledge base.

Method Detail

values

public static RuleContextEnum[] values()
Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
for (RuleContextEnum c : RuleContextEnum.values())
    System.out.println(c);

Returns:
an array containing the constants of this enum type, in the order they are declared

valueOf

public static RuleContextEnum valueOf(String name)
Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)

Parameters:
name - the name of the enum constant to be returned.
Returns:
the enum constant with the specified name
Throws:
IllegalArgumentException - if this enum type has no constant with the specified name
NullPointerException - if the argument is null
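
Because valueOf throws on an unknown or null name, callers that parse untrusted input may prefer a lenient wrapper. The sketch below uses a local stand-in enum (Ctx) that mirrors the constant names of RuleContextEnum so that the example is self-contained. RuleContextLookupSketch, Ctx and parse are hypothetical names.

```java
import java.util.Optional;

/** Hypothetical helper; Ctx is a local stand-in for RuleContextEnum. */
final class RuleContextLookupSketch {

    enum Ctx { DatabaseAtOnceClosure, TruthMaintenance, HighLevelQuery }

    /** Returns the matching constant, or empty for null or unknown names. */
    static Optional<Ctx> parse(String name) {
        if (name == null) {
            return Optional.empty(); // valueOf(null) would throw NullPointerException
        }
        try {
            // The match is exact: case-sensitive, no surrounding whitespace.
            return Optional.of(Ctx.valueOf(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```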


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.