<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>bigdata®</title>
	<atom:link href="http://www.bigdata.com/bigdata/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.bigdata.com/bigdata/blog</link>
	<description>bigdata® is a scale-out storage and computing fabric supporting optional transactions, very high concurrency, and very high aggregate IO rates.</description>
	<lastBuildDate>Mon, 16 Apr 2012 10:00:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>Client-Server API</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=441</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=441#comments</comments>
		<pubDate>Mon, 16 Apr 2012 10:00:03 +0000</pubDate>
		<dc:creator>Mike Personick</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=441</guid>
		<description><![CDATA[Did you know that bigdata has a built-in REST API for client-server access to the RDF database? We call this interface the &#8220;NanoSparqlServer&#8221;, and its API is outlined in detail on the wiki: https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=NanoSparqlServer What&#8217;s new with the NSS is that we&#8217;ve recently added a Java API around it so that you can write client <a href='http://www.bigdata.com/bigdata/blog/?p=441'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>Did you know that bigdata has a built-in REST API for client-server access to the RDF database?  We call this interface the &#8220;NanoSparqlServer&#8221;, and its API is outlined in detail on the wiki:</p>
<p><a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=NanoSparqlServer">https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=NanoSparqlServer</a></p>
<p>What&#8217;s new with the NSS is that we&#8217;ve recently added a Java API around it so that you can write client code without having to understand the HTTP API or make HTTP calls directly.  This is why there is suddenly a new dependency on Apache&#8217;s HTTP Components in the codebase.  The Java wrapper is called &#8220;RemoteRepository&#8221;.  If you&#8217;re comfortable writing application code against the Sesame SAIL/Repository API you should feel pretty at home with the RemoteRepository class.  The two APIs are not identical, but they are very similar.</p>
<p>The class itself is pretty self-explanatory but if you like examples, there is a test case for every API call in RemoteRepository in the class TestNanoSparqlClient.  (That test case also conveniently demonstrates how to launch a NanoSparqlServer wrapping a bigdata journal using Jetty, which it does at the beginning of every test.)</p>
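<p>For comparison, here is a minimal sketch of the kind of raw HTTP plumbing that a wrapper such as RemoteRepository saves you from writing.  It only builds a SPARQL protocol GET request with the JDK HTTP client and does not send it.  The endpoint URL and the class and method names are assumptions made for illustration; they are not taken from the bigdata codebase, and the RemoteRepository API itself is not shown here.</p>

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class SparqlRequestSketch {

    /** Appends the query as the URL-encoded "query" parameter of the SPARQL protocol. */
    public static URI buildQueryUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    /** Builds (but does not send) a GET request asking for SPARQL XML results. */
    public static HttpRequest buildRequest(String endpoint, String sparql) {
        return HttpRequest.newBuilder(buildQueryUri(endpoint, sparql))
                .header("Accept", "application/sparql-results+xml")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; substitute the URL of your own server.
        String endpoint = "http://localhost:8080/sparql";
        HttpRequest request = buildRequest(endpoint, "SELECT * { ?s ?p ?o } LIMIT 1");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

<p>A client wrapper would typically hide this encoding and header handling, along with sending the request and parsing the response, behind ordinary method calls.</p>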
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=441</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Custom SPARQL Functions</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=435</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=435#comments</comments>
		<pubDate>Mon, 16 Apr 2012 09:36:51 +0000</pubDate>
		<dc:creator>Mike Personick</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=435</guid>
		<description><![CDATA[I put together a more useful example of how to write a custom SPARQL function with bigdata. It&#8217;s up on the wiki here: https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=CustomFunction The example details a common use case &#8211; filtering out solutions based on security credentials for a particular user. For example, if you wanted to return a list of documents visible <a href='http://www.bigdata.com/bigdata/blog/?p=435'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>I put together a more useful example of how to write a custom SPARQL function with bigdata.  It&#8217;s up on the wiki here:</p>
<p><a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=CustomFunction">https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=CustomFunction</a></p>
<p>The example details a common use case &#8211; filtering out solutions based on security credentials for a particular user.  For example, if you wanted to return a list of documents visible to the user &#8220;John&#8221;, you could do it with a custom SPARQL function:</p>
<pre>
PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX ex: &lt;http://www.example.com/&gt;
SELECT ?doc
{
  ?doc rdf:type ex:Document .
  filter(ex:validate(?doc, ?user)) .
}
BINDINGS ?user {
  (ex:John)
}
</pre>
<p>The function is called by referencing its unique URI, in this case ex:validate.  This URI must be registered with bigdata&#8217;s FunctionRegistry along with an appropriate factory and operator.  The wiki details how to do that.  In the query above, the function is called with two arguments: the document to be validated and the user to validate against.  The user in this simple example is a constant included in the BINDINGS clause.  Always remember that bigdata custom functions are executed one solution at a time &#8211; they do not yet benefit from vectored execution and thus are not suitable for reading data from the indices.  (Each invocation of a function must complete without reading from the indices.)  A custom service (distinct from a custom function) is a more appropriate choice when execution requires touching indices.  This is how we implement SPARQL 1.1 Federation.</p>
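<p>To make the one-solution-at-a-time constraint concrete, here is a small sketch in plain Java.  It is not the bigdata FunctionRegistry API (see the wiki for that); the class, the Grant record, and the sample identifiers are all hypothetical.  The point is only that the check runs once per solution and consults nothing but state already held in memory.</p>

```java
import java.util.Arrays;

public class ValidateSketch {

    /** Hypothetical access grant: the user may see the document. */
    public record Grant(String doc, String user) {}

    private final Grant[] grants;

    public ValidateSketch(Grant... grants) {
        this.grants = grants.clone();
    }

    /** Evaluated for a single solution; reads only in-memory state, never an index. */
    public boolean validate(String doc, String user) {
        return Arrays.asList(grants).contains(new Grant(doc, user));
    }

    public static void main(String[] args) {
        ValidateSketch fn = new ValidateSketch(
                new Grant("ex:doc1", "ex:John"),
                new Grant("ex:doc2", "ex:Mary"));
        // One call per candidate solution, mirroring filter(ex:validate(?doc, ?user)).
        for (String doc : new String[] {"ex:doc1", "ex:doc2"}) {
            System.out.println(doc + " visible to ex:John: " + fn.validate(doc, "ex:John"));
        }
    }
}
```

<p>If the credentials had to be looked up in the database for each solution, that would be the signal to move the logic into a custom service instead.</p>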
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=435</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Graph Data Management 2012</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=429</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=429#comments</comments>
		<pubDate>Tue, 10 Apr 2012 19:11:01 +0000</pubDate>
		<dc:creator>Bryan Thompson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=429</guid>
		<description><![CDATA[The Graph Data Management 2012 workshop was last week in Washington, DC. The workshop brought together an interesting mixture of people from several different backgrounds. There were people focused on data mining and prediction, people focused on graph algorithms (iterative algorithms over materialized graphs), and several presentations on &#8220;graph databases&#8221; (3 on RDF databases <a href='http://www.bigdata.com/bigdata/blog/?p=429'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.cse.unsw.edu.au/~iwgdm/2012/accpapers.html">Graph Data Management</a> 2012 workshop was last week in Washington, DC.  The workshop brought together an interesting mixture of people from several different backgrounds.  There were people focused on data mining and prediction, people focused on graph algorithms (iterative algorithms over materialized graphs), and several presentations on &#8220;graph databases&#8221; (3 on RDF databases and one on HyperGraphDB).  Many thanks to the workshop organizers for pulling together such an interesting event!</p>
<p>It is clear that the &#8220;graph database&#8221; space is currently handicapped by a lack of standards.  SPARQL can certainly solve many of the problems there, but it lacks a standardized way of dealing with provenance (aka link attributes).  We have efficient extensions for this and it sounds like at least Virtuoso will be picking them up as well, so maybe we can drive standardization that way.  SPARQL has support for property paths, but it lacks a means to express iterative refinement algorithms so they could be executed efficiently within the database.  It is possible to use SPARQL update commands to operate iteratively on data sets on the server without round-tripping large graphs to the client, but it is not yet possible to specify control logic for such updates in a standardized manner.  Moreover, without extensions which clarify which graphs or solutions should be durable and which should be wired into main memory, it is difficult to use SPARQL update for iterative algorithms which assemble an annotated graph.  Equally worrisome, it appears that it is not yet possible to create good benchmarks for graph databases because the low level APIs wipe out the tremendous advantage which you gain from vectored evaluation in a database.</p>
<p>We will be announcing some new features over the next few weeks and the coming months designed to address some of these issues.  The first feature will extend SPARQL 1.1 UPDATE to let you provision and manage solution sets.  A preview of this <a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=SPARQL_Update">SPARQL UPDATE extension</a> is published on the bigdata wiki.  The extension adds just a little bit of syntax, but a whole lot of power.  It was originally envisioned to give people the ability to page through large result sets without re-evaluating complex joins &#8211; a use case which is illustrated on the wiki.  However, we see lots of opportunities beyond an application-aware SPARQL cache.</p>
<p>Another feature which will come out later this year is a distributed client/server graph protocol.  This is designed to address the tight coupling of applications with graph databases, provide a fast, scalable object level cache for graph data, and provide both fast in-memory traversal on the client and efficient subgraph matching on the server.  Clients will also be able to create &#8220;graph transactions&#8221; and post updates back to the server and write through cache fabric.  We plan to have multiple client language bindings for this, providing graph database access within the browser, in Java, etc.  We are even looking at a GPU binding for pure computational speed.  The language bindings will be generated based on metadata describing the object models.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=429</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bigdata 1.2.0 release (SPARQL UPDATE, Federated Query, Service Description and more)</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=423</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=423#comments</comments>
		<pubDate>Sun, 01 Apr 2012 13:44:49 +0000</pubDate>
		<dc:creator>Bryan Thompson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=423</guid>
		<description><![CDATA[This is a major version release of bigdata(R). Bigdata is a horizontally-scaled, open-source architecture for indexed data with an emphasis on RDF capable of loading 1B triples in under one hour on a 15 node cluster. Bigdata operates in both a single machine mode (Journal) and a cluster mode (Federation). The Journal provides fast scalable <a href='http://www.bigdata.com/bigdata/blog/?p=423'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>This is a major version release of bigdata(R).  Bigdata is a horizontally-scaled, open-source architecture for indexed data with an emphasis on RDF capable of loading 1B triples in under one hour on a 15 node cluster.  Bigdata operates in both a single machine mode (Journal) and a cluster mode (Federation).  The Journal provides fast scalable ACID indexed storage for very large data sets, up to 50 billion triples / quads.  The federation provides fast scalable shard-wise parallel indexed storage using dynamic sharding and shard-wise ACID updates and incremental cluster size growth.  Both platforms support fully concurrent readers with snapshot isolation.</p>
<p>Distributed processing offers greater throughput but does not reduce query or update latency.  Choose the Journal when the anticipated scale and throughput requirements permit.  Choose the Federation when the administrative and machine overhead associated with operating a cluster is an acceptable tradeoff for essentially unlimited data scaling and throughput.</p>
<p>See [1,2,8] for instructions on installing bigdata(R), [4] for the javadoc, and [3,5,6] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [7].</p>
<p>Starting with the 1.0.0 release, we offer a WAR artifact [8] for easy installation of the single machine RDF database.  For custom development and cluster installations we recommend checking out the code from SVN using the tag for this release. The code will build automatically under Eclipse.  You can also build the code using the ant script.  The cluster installer requires the use of the ant script.</p>
<p>You can download the WAR from:</p>
<p>http://sourceforge.net/projects/bigdata/</p>
<p>You can check out this release from:</p>
<p>https://bigdata.svn.sourceforge.net/svnroot/bigdata/tags/BIGDATA_RELEASE_1_2_0</p>
<p>New features:</p>
<p>- SPARQL 1.1 UPDATE<br />
- SPARQL 1.1 Service Description<br />
- SPARQL 1.1 Basic Federated Query<br />
- New integration point for custom services (ServiceRegistry).<br />
- Remote Java client for NanoSparqlServer<br />
- Sesame 2.6.3<br />
- Ganglia integration (cluster)<br />
- Performance improvements (cluster) </p>
<p>Feature summary:</p>
<p>- Single machine data storage to ~50B triples/quads (RWStore);<br />
- Clustered data storage is essentially unlimited;<br />
- Simple embedded and/or webapp deployment (NanoSparqlServer);<br />
- Triples, quads, or triples with provenance (SIDs);<br />
- Fast RDFS+ inference and truth maintenance;<br />
- Fast 100% native SPARQL 1.1 evaluation;<br />
- Integrated &#8220;analytic&#8221; query package;<br />
- 100% Java memory manager leverages the JVM native heap (no GC);</p>
<p>Road map [3]:</p>
<p>- SPARQL 1.1 property paths (last missing feature for SPARQL 1.1);<br />
- Runtime Query Optimizer for Analytic Query mode;<br />
- Simplified deployment, configuration, and administration for clusters; and<br />
- High availability for the journal and the cluster.</p>
<p>Change log:</p>
<p>  Note: Versions with (*) MAY require data migration. For details, see [9].</p>
<p>1.2.0 (*)</p>
<p>- http://sourceforge.net/apps/trac/bigdata/ticket/92  (Monitoring webapp)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/267 (Support evaluation of 3rd party operators)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/337 (Compact and efficient movement of binding sets between nodes.)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/433 (Cluster leaks threads under read-only index operations: DGC thread leak)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/437 (Thread-local cache combined with unbounded thread pools causes effective memory leak: termCache memory leak &#038; thread-local buffers)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/438 (KeyBeforePartitionException on cluster)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/439 (Class loader problem)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/441 (Ganglia integration)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/443 (Logger for RWStore transaction service and recycler)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/444 (SPARQL query can fail to notice when IRunningQuery.isDone() on cluster)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/445 (RWStore does not track tx release correctly)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/446 (HTTP Repository broken with bigdata 1.1.0)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/448 (SPARQL 1.1 UPDATE)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/449 (SPARQL 1.1 Federation extension)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/451 (Serialization error in SIDs mode on cluster)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/454 (Global Row Store Read on Cluster uses Tx)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/456 (IExtension implementations do point lookups on lexicon)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/457 (&#8220;No such index&#8221; on cluster under concurrent query workload)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/458 (Java level deadlock in DS)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/460 (Uncaught interrupt resolving RDF terms)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/461 (KeyAfterPartitionException / KeyBeforePartitionException on cluster)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/463 (NoSuchVocabularyItem with LUBMVocabulary for DerivedNumericsExtension)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/464 (Query statistics do not update correctly on cluster)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/465 (Too many GRS reads on cluster)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/469 (Sail does not flush assertion buffers before query)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/472 (acceptTaskService pool size on cluster)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/475 (Optimize serialization for query messages on cluster)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/476 (Test suite for writeCheckpoint() and recycling for BTree/HTree)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/478 (Cluster does not map input solution(s) across shards)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/480 (Error releasing deferred frees using 1.0.6 against a 1.0.4 journal)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/481 (PhysicalAddressResolutionException against 1.0.6)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/482 (RWStore reset() should be thread-safe for concurrent readers)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/484 (Java API for NanoSparqlServer REST API)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/491 (AbstractTripleStore.destroy() does not clear the locator cache)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/492 (Empty chunk in ThickChunkMessage (cluster))<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/493 (Virtual Graphs)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/496 (Sesame 2.6.3)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/497 (Implement STRBEFORE, STRAFTER, and REPLACE)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/498 (Bring bigdata RDF/XML parser up to openrdf 2.6.3.)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/500 (SPARQL 1.1 Service Description)<br />
- http://www.openrdf.org/issues/browse/SES-884        (Aggregation with an empty solution set as input should produce an empty solution as output)<br />
- http://www.openrdf.org/issues/browse/SES-862        (Incorrect error handling for SPARQL aggregation; fix in 2.6.1)<br />
- http://www.openrdf.org/issues/browse/SES-873        (Order the same Blank Nodes together in ORDER BY)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/501 (SPARQL 1.1 BINDINGS are ignored)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/503 (Bigdata2Sesame2BindingSetIterator throws QueryEvaluationException where it should throw NoSuchElementException)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/504 (UNION with Empty Group Pattern)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/505 (Exception when using SPARQL sort &#038; statement identifiers)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/506 (Load, closure and query performance in 1.1.x versus 1.0.x)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/508 (LIMIT causes hash join utility to log errors)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/513 (Expose the LexiconConfiguration to Function BOPs)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/515 (Query with two &#8220;FILTER NOT EXISTS&#8221; expressions returns no results)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/516 (REGEXBOp should cache the Pattern when it is a constant)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/517 (Java 7 Compiler Compatibility)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/518 (Review function bop subclass hierarchy, optimize datatype bop, etc.)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/520 (CONSTRUCT WHERE shortcut)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/521 (Incremental materialization of Tuple and Graph query results)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/525 (Modify the IChangeLog interface to support multiple agents)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/527 (Expose timestamp of LexiconRelation to function bops)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/532 (ClassCastException during hash join (can not be cast to TermId))<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/533 (Review materialization for inline IVs)<br />
- http://sourceforge.net/apps/trac/bigdata/ticket/534 (BSBM BI Q5 error using MERGE JOIN)</p>
<p>1.1.0 (*)</p>
<p> &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/23  (Lexicon joins)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/109 (Store large literals as &#8220;blobs&#8221;)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/181 (Scale-out LUBM &#8220;how to&#8221; in wiki and build.xml are out of date.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/203 (Implement a persistence capable hash table to support analytic query)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/209 (AccessPath should visit binding sets rather than elements for high level query.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/227 (SliceOp appears to be necessary when operator plan should suffice without)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/232 (Bottom-up evaluation semantics).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/246 (Derived xsd numeric data types must be inlined as extension types.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/254 (Revisit pruning of intermediate variable bindings during query execution)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/261 (Lift conditions out of subqueries.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/300 (Native ORDER BY)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/324 (Inline predeclared URIs and namespaces in 2-3 bytes)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/330 (NanoSparqlServer does not locate &#8220;html&#8221; resources when run from jar)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/334 (Support inlining of unicode data in the statement indices.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/364 (Scalable default graph evaluation)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/368 (Prune variable bindings during query evaluation)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/370 (Direct translation of openrdf AST to bigdata AST)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/373 (Fix StrBOp and other IValueExpressions)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/377 (Optimize OPTIONALs with multiple statement patterns.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/380 (Native SPARQL evaluation on cluster)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/387 (Cluster does not compute closure)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/395 (HTree hash join performance)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/401 (inline xsd:unsigned datatypes)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/408 (xsd:string cast fails for non-numeric data)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/421 (New query hints model.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/431 (Use of read-only tx per query defeats cache on cluster)</p>
<p>1.0.3</p>
<p> &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/217 (BTreeCounters does not track bytes released)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/269 (Refactor performance counters using accessor interface)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/329 (B+Tree should delete bloom filter when it is disabled.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/372 (RWStore does not prune the CommitRecordIndex)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/375 (Persistent memory leaks (RWStore/DISK))<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/385 (FastRDFValueCoder2: ArrayIndexOutOfBoundsException)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/391 (Release age advanced on WORM mode journal)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/392 (Add a DELETE by access path method to the NanoSparqlServer)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/393 (Add &#8220;context-uri&#8221; request parameter to specify the default context for INSERT in the REST API)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/394 (log4j configuration error message in WAR deployment)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/399 (Add a fast range count method to the REST API)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/422 (Support temp triple store wrapped by a BigdataSail)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/424 (NQuads support for NanoSparqlServer)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/425 (Bug fix to DEFAULT_RDF_FORMAT for bulk data loader in scale-out)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/426 (Support either lockfile (procmail) or dotlockfile (liblockfile1) in scale-out)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/427 (BigdataSail#getReadOnlyConnection() race condition with concurrent commit)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/435 (Address is 0L)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/436 (TestMROWTransactions failure in CI)</p>
<p>1.0.2</p>
<p> &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/32  (Query time expansion of (foo rdf:type rdfs:Resource) drags in SPORelation for scale-out.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/181 (Scale-out LUBM &#8220;how to&#8221; in wiki and build.xml are out of date.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/356 (Query not terminated by error.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/361 (IRunningQuery not closed promptly.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/371 (DataLoader fails to load resources available from the classpath.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/376 (Support for the streaming of bigdata IBindingSets into a sparql query.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/378 (ClosedByInterruptException during heavy query mix.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/379 (NotSerializableException for SPOAccessPath.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/382 (Change dependencies to Apache River 2.2.0)</p>
<p>1.0.1 (*)</p>
<p> &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/107 (Unicode clean schema names in the sparse row store).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/124 (TermIdEncoder should use more bits for scale-out).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/225 (OSX requires specialized performance counter collection classes).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/348 (BigdataValueFactory.asValue() must return new instance when DummyIV is used).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder limits Journal to 2B distinct RDF Values per triple/quad store instance).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/351 (SPO not Serializable exception in SIDS mode (scale-out)).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/352 (ClassCastException when querying with binding-values that are not known to the database).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/353 (UnsupportedOperatorException for some SPARQL queries).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/355 (Query failure when comparing with non materialized value).<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/357 (RWStore reports &#8220;FixedAllocator returning null address, with freeBits&#8221;.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.)<br />
 &#8211; http://sourceforge.net/apps/trac/bigdata/ticket/362 (log4j &#8211; slf4j bridge.)</p>
<p>For more information about bigdata(R), please see the following links:</p>
<p>[1] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page<br />
[2] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=GettingStarted<br />
[3] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Roadmap<br />
[4] http://www.bigdata.com/bigdata/docs/api/<br />
[5] http://sourceforge.net/projects/bigdata/<br />
[6] http://www.bigdata.com/blog<br />
[7] http://www.systap.com/bigdata.htm<br />
[8] http://sourceforge.net/projects/bigdata/files/bigdata/<br />
[9] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=DataMigration</p>
<p>About bigdata: </p>
<p>Bigdata® is a horizontally-scaled, general purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata® uses dynamically partitioned key-range shards in order to remove any realistic scaling limits &#8211; in principle, bigdata® may be deployed on 10s, 100s, or even thousands of machines and new capacity may be added incrementally without requiring the full reload of all data. The bigdata® RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum level provenance. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=423</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SPARQL 1.1 UPDATE</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=419</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=419#comments</comments>
		<pubDate>Thu, 22 Mar 2012 17:38:29 +0000</pubDate>
		<dc:creator>Bryan Thompson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=419</guid>
		<description><![CDATA[We&#8217;ve just added support for SPARQL 1.1 UPDATE. This is available from r6172 in SVN and will be part of our next milestone release. You can use it through the Sesame API and the NanoSparqlServer. Check it out and let us know what you think.]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve just added support for SPARQL 1.1 UPDATE.  This is available from r6172 in <a href="https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_1_1_0">SVN</a> and will be part of our next milestone release.   You can use it through the Sesame API and the NanoSparqlServer.</p>
<p>Check it out and let us know what you think.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=419</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Custom Functions</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=415</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=415#comments</comments>
		<pubDate>Fri, 16 Mar 2012 15:56:41 +0000</pubDate>
		<dc:creator>Bryan Thompson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=415</guid>
		<description><![CDATA[We&#8217;ve added a new page to the wiki which documents how to write your own custom functions. The wiki page includes some examples and links you to heavily documented source code in SVN. Bigdata uses a vectored query engine. Chunks of solutions flow through the query plan operators. There is parallelism across queries, across operators <a href='http://www.bigdata.com/bigdata/blog/?p=415'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve added a new page to the wiki which documents how to write your own <a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=CustomFunction">custom functions</a>.   The wiki page includes some examples and links you to heavily documented source code in SVN.</p>
<p>Bigdata uses a vectored query engine. Chunks of solutions flow through the query plan operators. There is parallelism across queries, across operators within a query, and within an operator (multiple instances of the same operator can be evaluated in parallel). Operators broadly break down into those which operate on solutions and those which operate on value expressions.  The former are vectored, operate on chunks of solutions at a time, and have access to the indices. The latter are not vectored, operate on a single solution at a time, and do not have access to the indices.</p>
<p>People who write custom functions need to be aware of IVs, which are the &#8220;Internal Value&#8221; objects used to represent RDF Values inside of bigdata. There are a lot of different kinds of IVs, including those which are fully inline (supporting xsd datatypes, etc.) and those which are assigned by index (TERM2ID or BLOBS, depending on the size of the Value).  IVs are used directly in the statement indices and in query processing.</p>
<p>Solutions flowing through a bigdata query are modeled using IVs.  RDF Values in the query are batch resolved to IVs when the query is compiled and then &#8220;cached&#8221; on the IV.  This &#8220;IVCache&#8221; is the critical bit of glue which lets you access the materialized RDF Value in a custom function.  There are methods which encapsulate the work required to turn an IV into a Value and a Value into an IV.  You can use those methods and ignore the IV interface for the most part, but if you put in a little more effort you can often dramatically improve the performance of your custom function.</p>
<p>Bigdata tries to avoid RDF Value materialization whenever possible.  IVs are more compact, are faster to process, and do not require lookups against the lexicon indices.  If the query engine decides that it needs to materialize some variable before evaluating a filter or a projection, then it will do that automatically.   Custom functions which can process IVs natively are significantly faster than those which rely on materialized RDF Values.  These functions have &#8220;NEVER&#8221; materialization requirements.  Many functions rely on materialized Values, but can use a &#8220;fast path&#8221; to quickly drop arguments which are not valid for that function.  For example, functions which require literals as arguments can test on IV.isLiteral() and throw a SparqlTypeErrorException if the argument is not a literal.  These functions have &#8220;SOMETIMES&#8221; materialization requirements.  Then there are functions which &#8220;ALWAYS&#8221; need materialized Values.  Often you can convert an ALWAYS function into a SOMETIMES function with a little bit more work and get a big performance boost for your efforts.</p>
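<p>To make the &#8220;fast path&#8221; idea concrete, here is a minimal, language-neutral sketch in Python.  It is not the bigdata API: the <code>IV</code> class, the <code>materialize</code> helper, the <code>SparqlTypeError</code> exception and the dictionary standing in for the lexicon are all hypothetical.  It only shows a SOMETIMES-style function rejecting a non-literal argument from the IV alone, before any lexicon lookup is made.</p>
<pre>
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Requirement(Enum):
    NEVER = "never"          # the function works on the IV alone
    SOMETIMES = "sometimes"  # bad arguments can be rejected before materializing
    ALWAYS = "always"        # every argument must be materialized first


class SparqlTypeError(Exception):
    """Hypothetical stand-in for a SPARQL type error."""


@dataclass
class IV:
    """Hypothetical stand-in for an internal value."""
    term_id: int                  # key into the simulated lexicon
    literal: bool                 # known from the IV itself, no lookup needed
    cached: Optional[str] = None  # stand-in for the cached materialized value

    def is_literal(self):
        return self.literal


lookups = []  # records every simulated lexicon lookup


def materialize(iv, lexicon):
    """Return the cached value, or simulate one lexicon lookup and cache it."""
    if iv.cached is None:
        lookups.append(iv.term_id)
        iv.cached = lexicon[iv.term_id]
    return iv.cached


def upper_case(iv, lexicon):
    """A SOMETIMES-style function: type check on the IV, then materialize."""
    if not iv.is_literal():
        # Fast path: rejected using the IV alone, so no lookup happens.
        raise SparqlTypeError("argument is not a literal")
    return materialize(iv, lexicon).upper()


upper_case.requirement = Requirement.SOMETIMES
```
</pre>
<p>In this sketch a URI argument never reaches the simulated lexicon, and a literal is looked up at most once because the value stays cached on the IV.  An ALWAYS-style version would materialize every argument up front and only then discover that some of them were unusable.</p>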
<p>Have fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=415</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Updated BSBM v3.1 Results (53712 QMpH)</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=412</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=412#comments</comments>
		<pubDate>Wed, 14 Mar 2012 21:11:22 +0000</pubDate>
		<dc:creator>Bryan Thompson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=412</guid>
		<description><![CDATA[Someone was asking for BSBM v3.1 results. Here are some from the current revision in SVN against an Apple Mac Mini. Try it out on your server. You can follow the benchmarking guide on our wiki. Scale factor: 284826 Number of warmup runs: 50 Number of clients: 16 Seed: 1075 Number of query mix runs <a href='http://www.bigdata.com/bigdata/blog/?p=412'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>Someone was asking for BSBM v3.1 results.  Here are some from the current revision in SVN against an Apple Mac Mini.  Try it out on your server.  You can follow the <a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=BSBM">benchmarking guide on our wiki</a>.</p>
<pre>
Scale factor:           284826
Number of warmup runs:  50
Number of clients:      16
Seed:                   1075
Number of query mix runs (without warmups): 500 times
min/max Querymix runtime: 0.7289s / 1.6698s
Total runtime (sum):    525.696 seconds
Total actual runtime:   33.512 seconds
QMpH:                   53712.44 query mixes per hour
CQET:                   1.05139 seconds average runtime of query mix
CQET (geom.):           1.04659 seconds geometric mean runtime of query mix

Metrics for Query:      1
Count:                  500 times executed in whole run
AQET:                   0.039063 seconds (arithmetic mean)
AQET(geom.):            0.036439 seconds (geometric mean)
QPS:                    401.58 Queries per second
minQET/maxQET:          0.00889232s / 0.11675030s
Average result count:   7.98
min/max result count:   0 / 10
Number of timeouts:     0

Metrics for Query:      2
Count:                  3000 times executed in whole run
AQET:                   0.040905 seconds (arithmetic mean)
AQET(geom.):            0.038344 seconds (geometric mean)
QPS:                    383.49 Queries per second
minQET/maxQET:          0.00988646s / 0.20486457s
Average result count:   19.48
min/max result count:   6 / 36
Number of timeouts:     0

Metrics for Query:      3
Count:                  500 times executed in whole run
AQET:                   0.049103 seconds (arithmetic mean)
AQET(geom.):            0.046191 seconds (geometric mean)
QPS:                    319.47 Queries per second
minQET/maxQET:          0.01107620s / 0.23461456s
Average result count:   5.47
min/max result count:   0 / 10
Number of timeouts:     0

Metrics for Query:      4
Count:                  500 times executed in whole run
AQET:                   0.048209 seconds (arithmetic mean)
AQET(geom.):            0.045754 seconds (geometric mean)
QPS:                    325.39 Queries per second
minQET/maxQET:          0.01487138s / 0.12486670s
Average result count:   7.56
min/max result count:   0 / 10
Number of timeouts:     0

Metrics for Query:      5
Count:                  0 times executed in whole run
AQET:                   0.000000 seconds (arithmetic mean)
AQET(geom.):            NaN seconds (geometric mean)
QPS:                    Infinity Queries per second
minQET/maxQET:          179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00000000s / 0.00000000s
Average result count:   0.00
min/max result count:   2147483647 / -2147483648
Number of timeouts:     0

Metrics for Query:      7
Count:                  2000 times executed in whole run
AQET:                   0.080021 seconds (arithmetic mean)
AQET(geom.):            0.076796 seconds (geometric mean)
QPS:                    196.04 Queries per second
minQET/maxQET:          0.02779225s / 0.33339837s
Average result count:   11.97
min/max result count:   1 / 100
Number of timeouts:     0

Metrics for Query:      8
Count:                  1000 times executed in whole run
AQET:                   0.043752 seconds (arithmetic mean)
AQET(geom.):            0.040962 seconds (geometric mean)
QPS:                    358.54 Queries per second
minQET/maxQET:          0.01055718s / 0.22980238s
Average result count:   4.85
min/max result count:   0 / 19
Number of timeouts:     0

Metrics for Query:      9
Count:                  2000 times executed in whole run
AQET:                   0.030722 seconds (arithmetic mean)
AQET(geom.):            0.028557 seconds (geometric mean)
QPS:                    510.61 Queries per second
minQET/maxQET:          0.00424333s / 0.11850076s
Average result (Bytes): 6861.40
min/max result (Bytes): 1519 / 13057
Number of timeouts:     0

Metrics for Query:      10
Count:                  1000 times executed in whole run
AQET:                   0.038099 seconds (arithmetic mean)
AQET(geom.):            0.035781 seconds (geometric mean)
QPS:                    411.74 Queries per second
minQET/maxQET:          0.00881888s / 0.17458824s
Average result count:   1.78
min/max result count:   0 / 9
Number of timeouts:     0

Metrics for Query:      11
Count:                  500 times executed in whole run
AQET:                   0.030195 seconds (arithmetic mean)
AQET(geom.):            0.027771 seconds (geometric mean)
QPS:                    519.51 Queries per second
minQET/maxQET:          0.00423775s / 0.09756225s
Average result count:   10.00
min/max result count:   10 / 10
Number of timeouts:     0

Metrics for Query:      12
Count:                  500 times executed in whole run
AQET:                   0.032718 seconds (arithmetic mean)
AQET(geom.):            0.030581 seconds (geometric mean)
QPS:                    479.46 Queries per second
minQET/maxQET:          0.00585602s / 0.10520701s
Average result (Bytes): 1476.21
min/max result (Bytes): 1446 / 1509
Number of timeouts:     0
</pre>
<p>This result is quoted for 16 concurrent clients, 50 warmup trials and 500 presentations of the query mixes. The database was the bigdata RWStore running on a single machine. These results were obtained against branches/BIGDATA_RELEASE_1_1_0 from SVN r6122. The machine is a 2011 Apple Mac Mini: a dual core i7 (four hardware threads total) with 4MB shared cache @ 2.7GHz running Ubuntu 11.04 (Natty) with 16G of DDR3 1333MHz RAM and a single SATA3 256G SSD drive. IO utilization was approximately 0%.  CPU utilization was 65% during the run.  The JVM was Oracle Java 1.6.0_27 using “-server -Xmx4g -XX:+UseParallelOldGC”. The Java process size was approximately 4.4G during the benchmark run.</p>
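<p>As a sanity check, the headline figures can be recomputed from the summary lines of the report.  The sketch below assumes that QMpH is the number of query mix runs scaled from the actual (wall clock) runtime to one hour, and that CQET is the summed runtime divided by the number of runs.  Those definitions are inferred from the numbers above rather than taken from the benchmark driver source.</p>
<pre>
```python
query_mix_runs = 500        # "Number of query mix runs (without warmups)"
actual_runtime_s = 33.512   # "Total actual runtime"
summed_runtime_s = 525.696  # "Total runtime (sum)" over all 16 clients

# Query mixes per hour: completed mixes scaled from wall clock time to one hour.
qmph = query_mix_runs * 3600 / actual_runtime_s

# Average query mix runtime: summed per-mix runtime divided by the number of mixes.
cqet = summed_runtime_s / query_mix_runs

print(round(qmph, 2), round(cqet, 5))
```
</pre>
<p>Under these assumptions CQET comes out at 1.05139 seconds, matching the report, and QMpH comes out at roughly 53712.1.  The small gap to the reported 53712.44 is what you would expect from the runtime being printed to only three decimal places.</p>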
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=412</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SPARQL 1.1 Basic Federated Query</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=399</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=399#comments</comments>
		<pubDate>Sat, 10 Mar 2012 11:45:00 +0000</pubDate>
		<dc:creator>Bryan Thompson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=399</guid>
		<description><![CDATA[We&#8217;ve added support for SPARQL 1.1 Basic Federated Query. We plan to add support for SPARQL 1.1 Update next, following up with a new release shortly. SPARQL 1.1 Basic Federated Query lets you write queries against multiple SPARQL end points. Each end point is denoted in the SPARQL query using the SERVICE keyword. For example, <a href='http://www.bigdata.com/bigdata/blog/?p=399'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve added support for <a href="http://www.w3.org/TR/sparql11-federated-query/">SPARQL 1.1 Basic Federated Query</a>.  We plan to add support for <a href="http://www.w3.org/TR/sparql11-update/">SPARQL 1.1 Update</a> next, following up with a new release shortly.</p>
<p>SPARQL 1.1 Basic Federated Query lets you write queries against multiple SPARQL end points.  Each end point is denoted in the SPARQL query using the SERVICE keyword.  For example, the following query joins local data matching <code>?s ?p1 ?o1</code> with remote data matching <code>?s ?p2 ?o2</code>.  You can write queries which mix local data freely with remote data from one or more end points.</p>
<pre>PREFIX : &lt;http://example.org/&gt;
SELECT ?s ?o1 ?o2
{
  ?s ?p1 ?o1 .
  SERVICE &lt;http://example.org/endpoint1&gt; {
    ?s ?p2 ?o2
  }
}
</pre>
<p>Bigdata vectors solutions flowing into and out of both SPARQL 1.0 and SPARQL 1.1 remote end points and lets you control the evaluation order in detail using query hints.  You can configure the level of SPARQL support for the end point using the <a href="http://bigdata.svn.sourceforge.net/viewvc/bigdata/branches/BIGDATA_RELEASE_1_1_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/service/ServiceRegistry.java?revision=6103&#038;view=markup">ServiceRegistry</a>.</p>
<p>You can also use the SERVICE keyword for <strong>internal</strong> services.  For example, our own full text search engine is implemented as a SERVICE and <a href="http://opensahara.com/">Open Sahara</a> has integrations for their text and geospatial indexing extensions which plug into bigdata using an internal SERVICE.  Internal SERVICEs look just like remote SPARQL end points in the query, but they live in the same JVM and can be much faster.  This opens up bigdata to a host of interesting integrations.  Imagine a bridge to an embedded Prolog reasoner&#8230;.</p>
<p>We have put together a wiki page which explains how to use <a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=FederatedQuery">Federated Query</a> in depth and offers tricks and tips for controlling the evaluation order using <a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=QueryHints">Query Hints</a> and <a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=NamedSubquery">Named Subquery</a>.</p>
<p>You can try it out now by checking out bigdata from the <a href="https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_1_1_0">1.1.x maintenance branch in SVN</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=399</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Virtual Graphs</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=395</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=395#comments</comments>
		<pubDate>Fri, 02 Mar 2012 15:11:04 +0000</pubDate>
		<dc:creator>Bryan Thompson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=395</guid>
		<description><![CDATA[We&#8217;ve added support for virtual graphs to bigdata. This was done at the suggestion of David Booth who outlined this concept in a recent presentation (see page 21). With virtual graphs you can dynamically combine large numbers of named graphs into the same &#8220;virtual&#8221; graph. This achieves exactly the same purpose as specifying a large <a href='http://www.bigdata.com/bigdata/blog/?p=395'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve added support for virtual graphs to bigdata.   This was done at the suggestion of <a href="http://dbooth.org/">David Booth</a> who outlined this concept in a recent <a href="http://dbooth.org/2011/ledp/Booth_David-ledp.pdf">presentation</a> (see page 21).  With virtual graphs you can dynamically combine large numbers of named graphs into the same &#8220;virtual&#8221; graph.  This achieves exactly the same purpose as specifying a large number of FROM or FROM NAMED clauses in your SPARQL query, but the definition of what is in each graph is encapsulated in the quad store itself.</p>
<p>Virtual graphs are a quads mode feature and are available from <a href="https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_1_1_0">SVN</a> as of r6059 (this revision also uses Sesame 2.6.3, but we are not quite finished with the SPARQL Federation support).  There is a <a href="https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=VirtualGraphs">wiki page</a> which documents the virtual graphs feature.</p>
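<p>To illustrate the equivalence described above, the following Python sketch spells out the FROM or FROM NAMED clauses that a virtual graph stands in for.  The membership table and the prefixed names (<code>:vg</code>, <code>:graph1</code> and so on) are made up for the example.  This is not bigdata&#8217;s syntax for declaring or querying a virtual graph, which is covered on the wiki page.</p>
<pre>
```python
# Hypothetical membership table: a virtual graph name and its member named graphs.
VIRTUAL_GRAPHS = {
    ":vg": [":graph1", ":graph2", ":graph3"],
}


def dataset_clauses(virtual_graph, named=False):
    """Spell out the FROM / FROM NAMED clauses that a virtual graph stands for."""
    keyword = "FROM NAMED" if named else "FROM"
    members = VIRTUAL_GRAPHS[virtual_graph]
    return "\n".join(f"{keyword} {graph}" for graph in members)


print(dataset_clauses(":vg"))
```
</pre>
<p>With three member graphs this prints three FROM clauses.  The point of a virtual graph is that the query names only the virtual graph, while a membership list like this one lives in the quad store and can grow to a large number of graphs without touching the query.</p>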
<p>Feedback is welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=395</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Graph Database APIs</title>
		<link>http://www.bigdata.com/bigdata/blog/?p=386</link>
		<comments>http://www.bigdata.com/bigdata/blog/?p=386#comments</comments>
		<pubDate>Sun, 26 Feb 2012 16:29:57 +0000</pubDate>
		<dc:creator>Bryan Thompson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.bigdata.com/bigdata/blog/?p=386</guid>
		<description><![CDATA[I had an interesting conversation with one of the blueprints developers (Joshua Shinavier) about graph databases during the CSHALS 2012 conference. Blueprints appear to follow an object model very similar to the one that Martyn Cutcher had developed in the Generic Persistent Object (GPO) / Generic Object Model (GOM). GOM allows schema flexible objects, object <a href='http://www.bigdata.com/bigdata/blog/?p=386'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>I had an interesting conversation with one of the <a href="https://github.com/tinkerpop/blueprints/wiki">blueprints</a> developers (<a href="http://fortytwo.net/Home.html">Joshua Shinavier</a>) about graph databases during the <a href="http://www.iscb.org/cshals2012">CSHALS 2012</a> conference.  </p>
<p>Blueprints appear to follow an object model very similar to the one that <a href="http://www.ctc-tech.biz/">Martyn Cutcher</a> had developed in the Generic Persistent Object (GPO) / Generic Object Model (GOM).  GOM allows schema flexible objects, object link sets, and link properties via link &#8220;reification&#8221;.  In fact, we have had most of a GPO/GOM implementation for bigdata since 2006.  The main hangup has been getting the free time to support a horizontally scaled GPO/GOM model (in particular, horizontal scaling for GPO link sets).  A very similar technology was used in the core of the K42 engine by STEP UK (K42 was a high performance object database engine underlying an XML Topic Maps engine back in 2000).</p>
<p>Bigdata also has a native provenance mode for the SAIL interface featuring Statement Identifiers (SIDs).  This mode was developed to support the intelligence and topic maps community and allows statements about statements.  We&#8217;ve <a href="http://www.bigdata.com/bigdata/blog/?p=254">blogged on this in the past</a>.  The SIDs mode lets you attach attributes to &#8220;links&#8221; efficiently, and even lets you attach links to links (statement identifiers can appear in any position of a statement), which is more general than the blueprints API.</p>
<p>The SIDs mode is extremely efficient.  The representation of a statement is just its &#8220;{s,p,o}&#8221; representation using the internal values (IVs) for that statement.  This means that there is no indirection through indices when performing reverse traversal from a statement about a statement to the statement which is being described.</p>
<p>However, all scalable (persistence class) graph databases use indices.  Even if you represent the object identifier as an integer, that integer is still indirected through an object index (and through the file system) in order to resolve the object.  The GPO model caches all weakly referenced objects in RAM, so once an object has been retrieved, traversal is O(1), but access to disk is never less than O(log n) since it always implies indices for an updatable data model.</p>
<p>There is a long history (read war) in the network (<a href="http://en.wikipedia.org/wiki/CODASYL">CODASYL</a>) and relational database spaces.  To my mind, both groups had useful things to say.  The main argument of the relational group was that an independence between the logical representation and the physical data model was necessary and allowed for declarative query semantics which in turn allowed for sophisticated query optimization.  Query optimizers can generally produce query plans that do as well as all but the very best hand coded queries.  The network database camp pointed out the flexibility of the data model and eventually showed that it was possible to produce declarative query languages for network databases.  The issue was eventually settled in the marketplace, with the relational model taking the lead for several decades.  See &#8220;<a href="http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf">What goes around comes around</a>&#8221; by Stonebraker and Hellerstein for a somewhat slanted take on all of this.</p>
<p>Object databases and graph databases are very closely related to the earlier network databases.  The same benefits (schema flexibility) and cautions (lack of declarative query model) apply.  APIs such as blueprints can provide great convenience, but they force all query optimization onto the application writer.</p>
<p>Bigdata puts a lot of effort into query optimization.  The most obvious place is simply the join ordering.  If you want to traverse from some vertex through some edges to some set of vertices, bigdata will do fast range counts on the access paths and decide on a join order for that computation which can be several orders of magnitude faster than naive traversal.  Bigdata can also use hash joins and variable pruning to tremendously speed up queries which visit intermediate vertex sets which are not required in the final solution set.  This is possible through a combination of high level declarative query (SPARQL) and dedicated query optimization code.  When using bigdata in its SIDs (aka provenance, aka graph database) mode you can get all of the benefit of that performance for &#8220;path traversal&#8221; in a &#8220;graph&#8221;.  And you can have 50 billion edges in a graph on a single machine and efficiently scale that graph out across a cluster of machines.  All in open source.</p>
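<p>The idea of ordering joins by range count can be sketched in a few lines of Python.  This is a deliberately simple greedy heuristic, not bigdata&#8217;s optimizer: the function names, the triple patterns and the range counts are all invented for the example, and a real optimizer would also weigh things this sketch ignores.</p>
<pre>
```python
def variables(pattern):
    """Return the set of variables (terms starting with '?') in a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def order_joins(patterns, range_count):
    """Greedy join order: start from the most selective pattern, then keep
    picking the cheapest remaining pattern that shares an already bound variable."""
    remaining = list(patterns)
    first = min(remaining, key=range_count)
    remaining.remove(first)
    ordered = [first]
    bound = variables(first)
    while remaining:
        # Prefer patterns connected to what is already bound, to avoid cross products.
        connected = [p for p in remaining if not variables(p).isdisjoint(bound)]
        candidates = connected or remaining
        best = min(candidates, key=range_count)
        remaining.remove(best)
        ordered.append(best)
        bound = bound.union(variables(best))
    return ordered


# Hypothetical query as written by the user, with made-up range counts.
patterns = [
    ("?person", ":knows", "?friend"),
    ("?friend", ":livesIn", ":Paris"),
    ("?person", ":worksFor", "?org"),
]
counts = {
    patterns[0]: 5_000_000,
    patterns[1]: 1_200,
    patterns[2]: 800_000,
}

for pattern in order_joins(patterns, counts.get):
    print(pattern)
```
</pre>
<p>With these made-up counts the order starts at the highly selective <code>:livesIn</code> pattern, moves to <code>:knows</code> because it shares <code>?friend</code>, and finishes with <code>:worksFor</code>.  Naive traversal in the written order would instead begin by scanning the five million <code>:knows</code> edges.</p>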
<p>There are graph traversal patterns which do not fit neatly into a high level query language without loop constructs.  SPARQL actually does provide for some of these via <a href="http://www.w3.org/TR/sparql11-query/#propertypaths">property paths</a>, which we are in the process of building into bigdata.  However, you can also drop into the SAIL API with bigdata and run access paths based on triple patterns which correspond more or less directly to the vertex-edge traversal patterns of blueprints, and which support not only link attributes but also links for links.</p>
<p>There is no native blueprints implementation for bigdata.  You can certainly try the blueprints to Sail integration against the BigdataSail, but I would also encourage people to try running bigdata in its SIDs mode and enjoy the performance that you can get from optimized high level query against a high performance graph database.  If you need vertex/edge traversal, you can get that from the Sail, but you will have much higher performance if you avoid the RDF Value materialization step and stay within the bigdata native API.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdata.com/bigdata/blog/?feed=rss2&#038;p=386</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

