This is the first in a series of posts on new features that will be introduced with our 1.1 release. That release will feature support for much of SPARQL 1.1 and will include sophisticated new data structures and algorithms for fast, scalable analytic query. The highlights of our forthcoming 1.1 release are:
- SPARQL 1.1 support (Sesame 2.5), except property paths, MINUS, and UPDATE.
- A new query optimizer (the RTO).
- Scalable analytic operators (hash joins).
- New extensible hash tree index (the HTree).
- A 100% Java solution that gets data off the JVM object heap (the MemoryManager).
This article will begin at the bottom of the stack, focusing on the memory manager and its role in supporting scalable query.
Due to byte code optimization, Java applications can run as fast as hand-coded C applications. However, there is a hidden cost associated with Java applications: the maintenance of the JVM object heap. For many applications the cost of object creation, retention, and garbage collection may be negligible. However, as illustrated in the diagram below, an application with a high object creation and retention rate can get into trouble with the garbage collector.
As the application induced “heap pressure” increases, the garbage collector must run more and more frequently. Depending on the mode in which the garbage collector is running, it may either take cores away from the application or freeze out the application entirely during GC cycles. As the application heap pressure increases, the GC cycle time increases as well. Eventually, the garbage collector runs more than the application and application throughput plummets. Larger heaps can only mask this problem temporarily since larger heaps require more GC time.
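You can watch this effect directly with the standard java.lang.management API. The little program below is not part of bigdata, just an illustration: it allocates rapidly while retaining a large sliding window of survivors, and prints the cumulative GC time, which climbs steeply as heap pressure grows. Try it with a small heap such as -Xmx256m:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class HeapPressureDemo {

    // Sum the collection time (ms) reported by all garbage collectors.
    static long totalGcMillis() {
        long ms = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            ms += gc.getCollectionTime();
        }
        return ms;
    }

    public static void main(String[] args) {
        final List<byte[]> retained = new ArrayList<byte[]>();
        for (int i = 1; i <= 2_000_000; i++) {
            retained.add(new byte[1024]);            // high object creation rate
            if (retained.size() > 100_000) {
                retained.subList(0, 50_000).clear(); // keep many survivors alive (high retention)
            }
            if (i % 200_000 == 0) {
                System.out.println(i + " allocations, cumulative GC time: "
                        + totalGcMillis() + "ms");
            }
        }
    }
}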
With our 1.1 release, we are introducing the Bigdata MemoryManager class. We have been using the NIO package and native buffer allocations for some time in the clustered database, but with the introduction of the MemoryManager and the MemStore, these native heap allocations are now easily reused for other purposes. The memory manager utilizes the Java NIO package to allocate large blocks of RAM on the native process heap using ByteBuffer.allocateDirect(…) and thus remains a 100% Java solution! These allocations are outside of the JVM managed memory and impose NO GC overhead. They are basically regions of the native C heap allocated by malloc inside of the JVM. Such allocations cannot be deterministically released, so we maintain a pool of such native buffers and recycle buffers once they are no longer required for a specific purpose.
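The recycling pattern itself is easy to sketch in plain Java. The class below is a simplified illustration only (the names and the 1MB slab size are made up for the example; bigdata's actual pool has more machinery):

import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Simplified sketch of a native buffer pool. Direct buffers live on the
 * C heap, so the garbage collector never scans their contents; since
 * their release is non-deterministic, we recycle them instead.
 */
public class NativeBufferPool {

    private static final int BUFFER_SIZE = 1024 * 1024; // 1MB slabs (illustrative)

    private final ConcurrentLinkedQueue<ByteBuffer> free =
            new ConcurrentLinkedQueue<ByteBuffer>();

    /** Take a buffer from the pool, allocating on the native heap if empty. */
    public ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        return (b != null) ? b : ByteBuffer.allocateDirect(BUFFER_SIZE);
    }

    /** Return a buffer to the pool for reuse rather than waiting on GC. */
    public void release(ByteBuffer b) {
        b.clear(); // reset position/limit; the contents are simply overwritten later
        free.add(b);
    }
}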
The memory manager is basically the RWStore technology repackaged for main memory. Like the RWStore, it can scale to terabytes. The key interface is com.bigdata.rwstore.sector.IMemoryManager. The IMemoryManager interface provides for hierarchical nesting of “allocation contexts” which share the same pool of backing buffers. This hierarchical allocation model makes it easier to ensure that allocations for the same purpose are grouped together on the same native buffers, and that all allocations are released no later than when their top allocation context goes out of scope. For example, an IRunningQuery on the QueryEngine has a top-level allocation context. Various operators in the query plan may create inner allocation contexts in which they store data, typically on an HTree instance. Since we always clear the top-level allocation context when the IRunningQuery is done, all such hash indices are released no later than the end of the query.
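A toy model makes the contract concrete. This is not the IMemoryManager API itself, just a self-contained sketch of hierarchical contexts in which clearing a parent context releases every child context and all of their allocations:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of hierarchical allocation contexts (illustrative only; see
 * com.bigdata.rwstore.sector.IMemoryManager for the real interface).
 */
public class AllocationContext {

    private final List<AllocationContext> children = new ArrayList<AllocationContext>();
    private final List<ByteBuffer> allocations = new ArrayList<ByteBuffer>();

    /** Nested contexts share the same pool of backing buffers in the real design. */
    public AllocationContext createContext() {
        AllocationContext child = new AllocationContext();
        children.add(child);
        return child;
    }

    public ByteBuffer allocate(int nbytes) {
        // Stand-in for carving a slice out of a pooled native buffer.
        ByteBuffer slice = ByteBuffer.allocateDirect(nbytes);
        allocations.add(slice);
        return slice;
    }

    /** Clearing a context releases its allocations and, recursively, its children's. */
    public void clear() {
        for (AllocationContext child : children) {
            child.clear();
        }
        children.clear();
        allocations.clear(); // the real store returns the slices for reuse here
    }
}

In these terms, the QueryEngine holds one top-level context per IRunningQuery, each operator creates a child context for its hash index, and clearing the top-level context at the end of the query sweeps everything.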
The IMemoryManager is a low level interface which manages a logical address space over native buffers and provides methods to allocate, read, and delete slices of data on those buffers. The MemStore class in the same package provides a higher level IRawStore abstraction — the same abstraction which is used by the journal and the B+Tree and HTree classes. The MemStore makes it easy to use persistence capable data structures over native buffers. The main use of the MemStore in the 1.1 release is to support high-level language constructs such as DISTINCT, scalable default graph evaluation, and highly efficient hash join operators. All of those features rely on the HTree operating over a MemStore. However, other applications are certainly possible, including very large application specific caches.
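To give a feel for the address-based IRawStore style, here is a minimal self-contained write/read store over a single native buffer. This is illustrative only and is not the real MemStore API, which manages many pooled buffers and supports deletion and recycling:

import java.nio.ByteBuffer;

/**
 * Minimal sketch of an address-based store over a native buffer.
 * write() returns an opaque address; read() recovers the record.
 */
public class TinyMemStore {

    private final ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64MB (arbitrary)

    /** Append a record and return its address (offset packed with length). */
    public synchronized long write(ByteBuffer data) {
        final int offset = buf.position();
        final int length = data.remaining();
        buf.put(data);
        return ((long) offset << 32) | length;
    }

    /** Materialize the record at the given address as a read-only view. */
    public synchronized ByteBuffer read(long addr) {
        final int offset = (int) (addr >>> 32);
        final int length = (int) addr; // low 32 bits
        final ByteBuffer view = buf.duplicate(); // independent position/limit, shared contents
        view.position(offset);
        view.limit(offset + length);
        return view.slice();
    }
}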
Bigdata is a high-performance RDF database that uses B+tree indices to index RDF statements and terms. But did you know that bigdata also uses these same B+tree indices to provide built-in support for Lucene-style free text search? If the text index is enabled (via a property when the database is created), then each literal added to the database (by appearing in the “O” position of a statement) is also added to the text index. This index can then be accessed directly, via the bigdata API, or indirectly, via high-level query. To accomplish this integration of free text search with high-level query, bigdata defines several magic predicates that are given special meaning, and when encountered in a SPARQL query are interpreted as service calls to the text index.
Before you get started, make sure you have enabled the free text index in your properties file:
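com.bigdata.rdf.store.AbstractTripleStore.textIndex=true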
The full list of magic predicates related to free text search is defined and documented in the class com.bigdata.rdf.store.BD. The simplest way to integrate free text search into a SPARQL query in bigdata is to use the magic predicate “bd:search” inside of a SPARQL join group. The predicate bd:search is used to search the full text index using the pattern in the “O” position of the search and to bind the hits (Literals) to the variable defined in the “S” position of the search. For example:
?lit bd:search "mike" .
will search the full text index for literals that contain the token “mike” and bind those literals onto the ?lit variable for use in subsequent joins. To find statements that use literals that contain the token “mike”, the SPARQL query would look as follows:
prefix bd: <http://www.bigdata.com/rdf#>
select ?s ?p ?o
where {
  ?o bd:search "mike" .
  ?s ?p ?o .
}
In addition to simple search, additional metadata about the search can be defined inside the SPARQL query using other magic predicates (also defined in the BD class). These predicates, when attached to the same variable as the search, will help narrow the search or bind additional metadata about search hits to other variables. We could expand the SPARQL query as follows:
prefix bd: <http://www.bigdata.com/rdf#>
select ?s ?p ?o ?score ?rank
where {
  ?o bd:search "mike personick" .
  ?o bd:matchAllTerms "true" .
  ?o bd:minRelevance "0.25" .
  ?o bd:relevance ?score .
  ?o bd:maxRank "1000" .
  ?o bd:rank ?rank .
  ?s ?p ?o .
}
The magic predicate bd:matchAllTerms indicates that only literals that contain all of the specified search terms should be considered. Similarly, literals can be constrained by min and max relevance (a 0 to 1 score signifying how closely the literal matches the search terms) and by min and max rank (hits are ordered by relevance and the rank describes where the literal appears in that ordered list). If the relevance or rank matters to the application, those pieces of metadata can be bound to variables in the search results using the predicates bd:relevance and bd:rank.
I’ve been working recently with a customer that is making extensive use of the free text search feature within its application. A number of interesting challenges arise when working with a large database inside an application designed to answer user queries with very low latency. The number of hits bound to a free text search can be very large (unconstrained search) or very small (highly constrained search), and when joined with the statement indices inside a SPARQL query, these hits can produce either huge numbers of relevant statements (a condition we’ve been referring to as “overflow”) or hardly any statements at all (“underflow”).

By playing tricks with the SPARQL query we have been cutting the hits into “rank slices” by specifying the min and max ranks to be considered for any particular query, and then running those rank slices through the rest of the joins in the query until we find just enough results to paint the first page of search results, but no more. By starting with very small rank slices, we try to ensure we don’t overload and stall out the application while the user waits. This trick of rank slicing is made possible by caching the ordered list of free text hits temporarily inside the query engine, so that subsequent calls to the text index with the exact same parameters (search and metadata) will be costless.
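Sketched in code, the paging loop looks roughly like the following. This is a hypothetical illustration, not the customer's actual code: it assumes an open Sesame RepositoryConnection named con, and it assumes the bd:minRank predicate alongside the bd:maxRank shown above. Because every pass re-issues the same bd:search, the hits after the first pass come from the query engine's cache.

import java.util.ArrayList;
import java.util.List;

import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.RepositoryConnection;

public class RankSlicer {

    /**
     * Fetch just enough joined results to paint one page of search
     * results, widening the rank slice on each pass.
     */
    public static List<BindingSet> firstPage(RepositoryConnection con,
            String searchString, int pageSize) throws Exception {

        final List<BindingSet> page = new ArrayList<BindingSet>();
        int minRank = 1;
        int sliceSize = 10; // start with a very small slice

        // Cap the number of passes so underflow cannot loop forever.
        for (int pass = 0; pass < 10 && page.size() < pageSize; pass++) {
            final int maxRank = minRank + sliceSize - 1;
            final String query =
                    "prefix bd: <http://www.bigdata.com/rdf#>\n" +
                    "select ?s ?p ?o\n" +
                    "where {\n" +
                    "  ?o bd:search \"" + searchString + "\" .\n" +
                    "  ?o bd:minRank \"" + minRank + "\" .\n" +
                    "  ?o bd:maxRank \"" + maxRank + "\" .\n" +
                    "  ?s ?p ?o .\n" +
                    "}";
            final TupleQueryResult result =
                    con.prepareTupleQuery(QueryLanguage.SPARQL, query).evaluate();
            try {
                while (result.hasNext() && page.size() < pageSize) {
                    page.add(result.next());
                }
            } finally {
                result.close();
            }
            minRank = maxRank + 1; // the next slice picks up where this one ended
            sliceSize *= 2;        // widen the slice if the page is still not full
        }
        return page;
    }
}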