bigdata® v1.2.1

bigdata® is a scale-out data and computing fabric designed for commodity hardware.

See:
          Description

Packages
com.bigdata  
com.bigdata.attr  
com.bigdata.bfs This package provides a scale-out content repository (aka distributed file system) suitable as the backend for a REST-ful service using the bigdata architecture.
com.bigdata.bop This package provides an API for query operators.
com.bigdata.bop.aggregate  
com.bigdata.bop.ap
com.bigdata.bop.ap.filter These filters are based on the striterator patterns.
com.bigdata.bop.bindingSet  
com.bigdata.bop.bset  
com.bigdata.bop.constraint This package provides implementations for operators which impose constraints on elements visited by some access path.
com.bigdata.bop.controller This package provides operators for subquery, including UNION, STEPS, and STAR (transitive closure).
com.bigdata.bop.cost This package provides cost models for various things.
com.bigdata.bop.engine  
com.bigdata.bop.fed  
com.bigdata.bop.fed.shards  
com.bigdata.bop.join  
com.bigdata.bop.joinGraph This package provides support for join graphs, query optimization, and generating full query plans from a join graph and the evaluation order identified by a query optimizer.
com.bigdata.bop.joinGraph.fast  
com.bigdata.bop.joinGraph.rto  
com.bigdata.bop.mutation
com.bigdata.bop.rdf.aggregate  
com.bigdata.bop.rdf.filter  
com.bigdata.bop.rdf.join  
com.bigdata.bop.rdf.update This package provides bigdata operators for SPARQL update.
com.bigdata.bop.solutions This package provides distinct, sort, and aggregation operators.
com.bigdata.btree The BTree is a scalable B+-Tree with copy-on-write semantics mapping variable length unsigned byte[] keys to variable length byte[] values (null values are allowed).
com.bigdata.btree.data  
com.bigdata.btree.filter  
com.bigdata.btree.isolation  
com.bigdata.btree.keys  
com.bigdata.btree.proc  
com.bigdata.btree.raba  
com.bigdata.btree.raba.codec  
com.bigdata.btree.view  
com.bigdata.cache A canonicalizing object cache may be constructed from an outer weak reference value hash map backed by an inner hard reference LRU policy.
com.bigdata.concurrent This package supports concurrency control using exclusive locks on resources.
com.bigdata.config  
com.bigdata.counters Package supports declaration, collection, and interchange of performance counters for both hosts and processes on supported platforms.
com.bigdata.counters.ganglia  
com.bigdata.counters.httpd A httpd service for performance counters, including both live counters and post-mortem counters read from an XML interchange format.
com.bigdata.counters.linux Package provides performance counter collection on Linux platforms using the sysstat suite.
com.bigdata.counters.osx Package provides performance counter collection on OSX.
com.bigdata.counters.query  
com.bigdata.counters.render  
com.bigdata.counters.store This package provides a persistence mechanism for performance counters.
com.bigdata.counters.striped  
com.bigdata.counters.win Package provides collection of performance counters on Windows platforms.
com.bigdata.disco  
com.bigdata.gis  
com.bigdata.gom.alchemy  
com.bigdata.gom.gpo  
com.bigdata.gom.om  
com.bigdata.gom.skin  
com.bigdata.ha This package extends com.bigdata.quorum to define local and Remote interfaces for highly available services.
com.bigdata.ha.pipeline This package provides services for sending and receiving low-level write cache blocks among the members of a highly available quorum.
com.bigdata.htree  
com.bigdata.htree.data  
com.bigdata.htree.raba  
com.bigdata.io  
com.bigdata.io.compression  
com.bigdata.io.writecache Low-level write cache service.
com.bigdata.jini.lookup.entry  
com.bigdata.jini.start  
com.bigdata.jini.start.config  
com.bigdata.jini.start.process  
com.bigdata.jini.util  
com.bigdata.jmx  
com.bigdata.journal The journal is an append-only persistence capable data structure supporting atomic commit, named indices, and transactions.
com.bigdata.journal.ha This package provides some classes in support of highly available journals.
com.bigdata.jsr166 Utility classes related derived from JSR 166.
com.bigdata.mdi This package provides a metadata index and range partitioned indices managed by that metadata index.
com.bigdata.net  
com.bigdata.quorum This package defines interfaces and implementations for a quorum of highly available services.
com.bigdata.quorum.zk  
com.bigdata.rawstore A set of interfaces and some simple implementations for a read-write store without atomic commit or transactions.
com.bigdata.rdf The bigdata(R) RDF platform provides temporary, persistent, and distributed native RDF(S) stores using the bigdata(R) architecture.
com.bigdata.rdf.axioms  
com.bigdata.rdf.changesets  
com.bigdata.rdf.error  
com.bigdata.rdf.inf This package provides an eager closure inference engine for most of the RDF and RDFS entailments and can be used to realize entailments for owl:sameAs, owl:equivilentClass, and owl:equivilentProperty.
com.bigdata.rdf.internal This package provides an internal representation of RDF Values.
com.bigdata.rdf.internal.constraints  
com.bigdata.rdf.internal.encoder  
com.bigdata.rdf.internal.impl  
com.bigdata.rdf.internal.impl.bnode  
com.bigdata.rdf.internal.impl.extensions  
com.bigdata.rdf.internal.impl.literal  
com.bigdata.rdf.internal.impl.uri  
com.bigdata.rdf.lexicon  
com.bigdata.rdf.load Support for concurrent loading of RDF data across one or more clients from a variety of input sources.
com.bigdata.rdf.model This package provides a tuned implementation of the Sesame RDF data model for the RDF database.
com.bigdata.rdf.relation.rule  
com.bigdata.rdf.rio This package provides an integration with the openrdf RIO parser that supports fast data loads.
com.bigdata.rdf.rio.nquads  
com.bigdata.rdf.rio.ntriples  
com.bigdata.rdf.rio.rdfxml This package provides an extension of the openrdf RDF/XML parser which supports SIDs mode interchange and the interchange of the {explicit, inferred, axiom} statement metadata.
com.bigdata.rdf.rules  
com.bigdata.rdf.sail This package contains the SAIL that allow bigdata to be used as a backend for the Sesame 2.x platform.
com.bigdata.rdf.sail.bench  
com.bigdata.rdf.sail.config  
com.bigdata.rdf.sail.sparql This package was imported from the org.openrdf.query.parser.sparql package of the openrdf distribution.
com.bigdata.rdf.sail.sparql.ast  
com.bigdata.rdf.sail.webapp  
com.bigdata.rdf.sail.webapp.client  
com.bigdata.rdf.sparql.ast This package contains an Abstract Syntax Tree which provides an intermediate translation target for SPARQL parsers.
com.bigdata.rdf.sparql.ast.cache  
com.bigdata.rdf.sparql.ast.eval  
com.bigdata.rdf.sparql.ast.hints Query hints are specified at the SPARQL layer using magic predicates.
com.bigdata.rdf.sparql.ast.optimizers  
com.bigdata.rdf.sparql.ast.service This package provides support for SPARQL 1.1 Federated Query, including the special case of "service" end points which live within the same JVM and use direct method calls rather than SPARQL Query and remote (HTTP) end points for which we will generate an appropriate SPARQL query.
com.bigdata.rdf.spo This package defines a statement model using long term identifiers rather than RDF Value objects.
com.bigdata.rdf.store This package provides several realizations of an RDF database using the bigdata architecture, including one suitable for temporary data, one suitable for local processing (single host), and one designed for scale-out on commodity hardware.
com.bigdata.rdf.util  
com.bigdata.rdf.vocab This package provides a variety of different pre-compiled Vocabulary classes.
com.bigdata.rdf.vocab.decls This package provides a variety of declarations for different RDF ontologies.
com.bigdata.relation This package includes an abstraction layer for relations.
com.bigdata.relation.accesspath This package includes an abstraction layer for efficient access paths, including chunked iterators, blocking buffers, and an abstraction corresponding to the natural order of an index.
com.bigdata.relation.locator Abstraction layer for identifying relations and relation containers in much the same manner that indices are identified (unique name and timestamp).
com.bigdata.relation.rule This package includes an abstraction layer for rules.
com.bigdata.relation.rule.eval This package supports rule evaluation.
com.bigdata.relation.rule.eval.pipeline This package implements a pipeline join.
com.bigdata.resources This package provides the logic to managed the live journal and the historical journals and index segments for a DataService.
com.bigdata.rwstore  
com.bigdata.rwstore.sector  
com.bigdata.samples  
com.bigdata.samples.btree This package contains some sample code for using the B+Tree package.
com.bigdata.samples.remoting  
com.bigdata.search This package provides full text indexing and search.
com.bigdata.service This package provides implementations of bigdata services (metadata service, data service, transaction manager service.
com.bigdata.service.jini  
com.bigdata.service.jini.benchmark  
com.bigdata.service.jini.lookup  
com.bigdata.service.jini.master  
com.bigdata.service.jini.util  
com.bigdata.service.ndx  
com.bigdata.service.ndx.pipeline  
com.bigdata.service.proxy This package provides implementations of proxy objects for commonly used classes.
com.bigdata.sparse This package provides support for treating normal B+Trees using a "sparse row store" pattern and can be applied to both local B+Trees and scale-out indices.
com.bigdata.stream  
com.bigdata.striterator Streaming iterator patterns based on Martyn Cutcher's striterator design but supporting generics and with extensions for closable, chunked, and ordered streaming iterators.
com.bigdata.util Utility classes.
com.bigdata.util.concurrent Utility classes related to java.util.concurrent.
com.bigdata.util.config  
com.bigdata.util.httpd  
com.bigdata.zookeeper  
cutthecrap.utils.striterators  
org.apache.system This package contains a utility class SystemUtil that is capable of reporting some interesting information about the platform on which it is running, including the number of CPUs.

 

bigdata® is a scale-out data and computing fabric designed for commodity hardware. The bigdata architecture provides named scale-out indices that are transparently and dynamically key-range partitioned and distributed across a cluster or grid of commodity server platforms. The scale-out indices are B+Trees and remain balanced under insert and removal operations. Keys and values for btrees are variable length byte[]s (the keys are interpreted as unsigned byte[]s). Atomic "row" operations are supported for very high concurrency. However, full transactions are also available for applications needing less concurrency but requiring atomic operations that read or write on more than one row, index partition, or index. Writes are absorbed on mutable btree instances in append only "journals" of ~200M capacity. On overflow, data in a journal is evicted onto read-optimized, immutable "index segments". The metadata service manages the index definitions, the index partitions, and the assignment of index partitions to data services. A data service encapsulates a journal and zero or more index partitions assigned to that data service, including the logic to handle overflow.

A deployment is made up of a transaction service, a load balancer service, and a metadata service (with failover redundancy) and many data service instances. bigdata can provide data redundancy internally (by pipelining writes bound for a partition across primary, secondary, ... data service instances for that partition) or it can be deployed over NAS/SAN. bigdata itself is 100% Java and requires a JDK 1.6. There are additional dependencies for the installer (un*x) and for collecting performance counters from the OS (sysstat).

Architecture

The bigdata SOA defines three essential services and some additional services. The essential services are the metadata service (provides a locator service for index partitions on data services), the data service (provides read, write, and concurrency control for index partitions), and the transaction service (provides consistent timestamps for commits, facilitates the release of data associated with older commit points, read locks, and transactions). Full transactions are NOT required, so you can use bigdata as a scale-out row store. The load balancer service guides the dynamic redistribution of data across a bigdata federation. There are also client services, which are containers for executing distributed jobs.

While other service fabric architectures are contemplated, bigdata services today use JINI 2. to advertise themselves and do service discovery. This means that you must be running a JINI registrar in order for services to be able to register themselves or discover other services. The JINI integration is bundled and installed automatically by default.

Zookeeper handles master election, configuration management and global synchronous locks. Zookeeper was developed by Yahoo! as a distributed lock and configuration management service and is now an Apache subproject (part of Hadoop). Among other things, it gets master election protocols right. Zookeeper is bundled and installed automatically by default.

The main building blocks for the bigdata architecture are the journal (both an append-only persistence store and a recently introduced read/write store with the ability to recycle historical commit points), the mutable B+Tree (used to absorb writes), and the read-optimized immutable B+Tree (aka the index segment). Highly efficient bulk index builds are used to transfer data absorbed by a mutable B+Tree on a journal into index segment files. Each for index segment contains data for a single partition of a scale-out index. In order to read from an index partition, a consistent view is created by dynamically fusing data for that index partition, including any recent writes on the current journal, any historical writes that are in the process of being transferred onto index segments, and any historical index segments that also contain data for that view. Periodically, index segments are merged together, at which point deleted tuples are purged from the view.

Bigdata periodically releases data associated with older commit points, freeing up disk resources. The transaction service is configured with a minimum release age in milliseconds. This can be ZERO (0L) milliseconds, in which case historical views may be released if there are no read locks for that commit point. The minimum release age can also be hours or days if you want to keep historical states around for a while. When a data service overflows, it will consult the transaction service to determine the effective release time and release any old journals or index segments no longer required to maintain views GT that release time.

An immortal or temporal database can be realized by specifying Long#MAX_VALUE for the minimum release age. In this case, the old journals and index segments will all be retained and you can query any historical commit point of the database at any time.

An detailed architecture whitepaper for bigdata is posted only and linked from our blog.

Sparse row store

The SparseRowStore provides a column flexible row store similar to Google's bigtable or HBase, including very high read/write concurrency and ACID operations on logical "rows". Internally, a "global row store" instance is used to maintain metadata on relations declared within a bigdata federation. You can use this instance to store your own data, or you can create your own named row store instances. However, there is no REST api for the row store at this time (trivial to be sure, but not something that we have gotten around to yet).

In fact, it is trivial to realize bigtable semantics with bigdata - all you need to do is exercise a specific protocol when forming the keys for your scale-out indices and then you simply choose to NOT use transactions. A bigtable style key-value is formed as:

  
      [columnFamily][primaryKey][columnName][timestamp} : [value]
  
  

By placing the column family identifier up front, all data in the same column family will be clustered together by the index. The next component is the "row" identifier, what you would think of as the primary key in a relational table. The column name comes next - only column names for non-null columns are written into the index. Finally, there is a timestamp column that is used either to record a timestamp specified by the application or a datum write time. The value associated with the key is simply the datum for that column in that row. The use of nul byte separators makes it possible to parse the key, which is required for various operations including index partition splits and filtering key scans based on column names or timestamps. See the KeyBuilder class in com.bigdata.btree.keys for utilities that may be used to construct keys for variety of data types.

Map/reduce, Asynchronous Write API, and Query

Google's map/reduce architecture has received a lot of attention, along with its bigtable architecture. Map/reduce provides a means to transparently decompose processing across a cluster. The "map" process examines a series of key-value pairs, emitting a set of intermediate key-value pairs for each input. Those intermediate key-values are then hashed (module R) onto R reduce processes. The inputs for the reduce processes are pre-sorted. The reduce process then runs some arbitrary operation on the sorted data, such as computing an inverted index file or loading the data into a scale-out index.

bigdata® supports an asynchronous index write API, which delivers extremely high throughput for scattered writes. While map/reduce is tremendously effective when there is good locality in the data, it is not the right tool for processing ordered data. Instead, you execute a master job, which spawns client(s) running in client service(s) associated with the bigdata federation. Those clients process data, writing onto blocking buffers. The writes are automatically split and buffered for each key-range shard. This maximizes the chunk size for ordered writes and provides a tremendous throughput boost. Bigdata can work well in combination with map/reduce. The basic paradigm is you use map/reduce jobs to generate data, which is then bulk loaded into a bigdata federation using bigdata jobs and the asynchronous write API.

bigdata® has built in support for distributed rule execution. This can be used for high-level query or for materializing derived views, including maintaining the RDFS+ closure of a semantic web database. The implementation is highly efficient and propagates binding sets to the data service for each key-range shard touched by the query, so the JOINs happen right up against the data. Unlike using map/reduce for join processing, bigdata query processing is very low latency. Distributed query execution can be substantially faster than local query execution, even for low-latency queries.

Standalone Journals

While bigdata® is targeted at scale-out federations, it can also be deployed as simple persistence store using just the com.bigdata.journal.Journal API.

The read-write (RWStore) version of the journal can scale up to 50 billion triples or quads and is capable of reclaiming storage by releasing historical commit points, aging them out of the backing file in a manner very similar to how the scale-out database releases historical commit points. The read/write store is good for standalone database instances, especially when the data have a lot of skew and when the new data are arriving continually while older data should be periodically released (for example, in a monitoring application where historical events may be released after 30 days).

The read/write store is also used in the scale-out architecture for the transaction manager and the aggregated performance counters. However, the data services use a WORM store to buffer writes, asynchronously migrate the buffered writes onto read-optimized B+Tree files using index segments builds and compacting. One an index partition (aka shard) reaches ~ 200MB on the disk (dynamic sharding). Index partitions are moved from time to time to load balance the cluster.

Status

bigdata® is a petabyte scale database architecture. It has been tested on clusters of up to 16 nodes. We have loaded data sets of 10B+ rows, at rates of over 300,000 rows per second. Overall, 100s of billions of rows have been put down safely on disk.

Getting Started

See the wiki for Getting Started and our blog for what's new. The javadoc is online and you can also build it with the ant script. If you have a question, please post it on the blog or the forum.

Getting Involved

bigdata® is an open source project. Contributors and contributions are welcome. Like most open source project, contributions must be submitted under a contributor agreement, which must be signed by someone with the appropriate authority. This is necessary to ensure that the code base remains open.

If you want to help out, please check out what is going on our blog and on the main project site. Post your questions and we will help you figure out where you can contribute or how to create that new feature that you need.

Licenses and Services

bigdata® is distributed under GPL(v2). SYSTAP, LLC offers commercial licenses for customers who either want the value add (warranty, technical support, additional regression testing), who want to redistribute bigdata with their own commercial products, or who are not "comfortable" with the GPL license. For inquiries or further information, please write licenses@bigdata.com.

Please let us know if you need specific feature development or help in applying bigdata® to your problem. We are especially interested in working directly with people who are trying to handle massive data, especially for the semantic web. Please contact us directly.

Related links

CouchDB
http://couchdb.org/CouchDB/CouchDBWeb.nsf/Home?OpenForm
bigtable
http://labs.google.com/papers/bigtable.html, http://www.techcrunch.com/2008/04/04/source-google-to-launch-bigtable-as-web-service/
map/reduce
http://labs.google.com/papers/mapreduce.html
Hadoop
http://lucene.apache.org/hadoop/
Zookeeper
http://hadoop.apache.org/zookeeper/
Jini/River
http://www.jini.org/wiki/Main_Page, http://incubator.apache.org/river/RIVER/index.html
Pig
http://research.yahoo.com/node/90
Sawzall
http://labs.google.com/papers/sawzall.html
Boxwood
http://research.microsoft.com/research/sv/Boxwood/
Blue Cloud
http://www.techcrunch.com/2007/11/15/ibms-blue-cloud-is-web-computng-by-another-name/
SimpleDB
http://www.techcrunch.com/2007/12/14/amazon-takes-on-oracle-and-ibm-with-simple-db-beta/
mg4j
http://mg4j.dsi.unimi.it/



Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.