Tuesday, October 27, 2009

Release 0.81b

We've just released a new version of bigdata. This release is capable of loading 1B triples in under one hour on a 15 node cluster and has been used to load up to 13B triples on the same cluster. JDK 1.6 is required. See [1] for instructions on installing bigdata(R), [2] for the javadoc and [3] and [4] for news, questions, and the latest developments.

Please note that we recommend checking out the code from SVN using the tag for this release. The code will build automatically under eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script. You can checkout this release from the following URL:

https://bigdata.svn.sourceforge.net/svnroot/bigdata/tags/RELEASE_0.81b

New features:
  • Support for quads.
  • The B+Tree data record is now the same on the disk and in memory, eliminating de-serialization costs for immutable B+Tree records.
  • Shared LRU for all B+Tree instances in the same JVM. This provides competition across those B+Tree instances for RAM. The LRU is configured by default to use 10% of the JVM heap, but that value may be increased to 20% even on very heavy workloads with large heaps.
  • Parallel iterator for distributed access paths.
  • 3x improvement in distributed query performance. We have only just begun to optimize distributed query. This performance improvement is mainly due to the parallel access path iterator, the shared LRU, and the selection of a better chunk size for distributed query. We expect substantial improvements in query performance over the next several months.
  • Some query hotspot elimination.

The roadmap for the next release includes:
  • Full transactions for the SAIL.
  • Record level compression.
  • Query optimizations.
For more information, please see the following links:

[1] http://bigdata.wiki.sourceforge.net/GettingStarted
[2] http://www.bigdata.com/bigdata/docs/api/
[3] http://sourceforge.net/projects/bigdata/
[4] http://www.bigdata.com/blog

About bigdata:

Bigdata® is a horizontally-scaled, general purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata® uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata® may be deployed on 10s, 100s, or even thousands of machines and new capacity may be added incrementally without requiring the full reload of all data. The bigdata® RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum level provenance.

0 Comments:

Post a Comment

<< Home