Bryan has been frantically benchmarking the scale-out RDF store and chasing down bottlenecks on a cluster to which we have access for a limited time. Here is a snippet of what he had to say about it, taken from an email this morning:

We have been running some experimental trials with bigdata on a cluster of 14 machines, collecting metrics on RDF data load, RDF closure rates, and RDF query using the LUBM synthetic data set. This is really a shakedown period, and I do not believe that these numbers represent the potential of the system. Rather, they should be taken as evidence of the minimum throughput that we might expect to see on a heavily loaded cluster. Our goal is to perform runs on the order of 20B triples. Given the inherent complexity of distributed systems, we are proceeding incrementally and doing 1B triple runs right now.

The cluster is a set of relatively hefty machines. Each has 32G of RAM, 8 cores, and around 100G of local disk available to the application. The application runs 13 “data services” (one per blade). The 14th blade runs a transaction service, zookeeper (a distributed lock management service), a “metadata service” which is used to locate index partitions on the data services, and a load balancer service. We are not using a hypervisor or any similar technology; each JVM runs directly on its machine. We use RMI between the JVMs, using APIs established for the various services. All data (other than log files) is stored on local disk on the machines.
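
To make the wire-level pattern concrete, here is a minimal, self-contained sketch of RMI between JVMs using only the standard java.rmi registry. Note that bigdata actually discovers its services via Jini (per the JiniFederation class in the log below) rather than a fixed registry, and the DataService interface and rangeCount method here are illustrative stand-ins, not the actual bigdata APIs.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSketch {

    /** Illustrative remote interface; NOT the actual bigdata service API. */
    public interface DataService extends Remote {
        long rangeCount(String indexName) throws RemoteException;
    }

    public static void main(String[] args) throws Exception {
        // "Server" JVM: export an implementation and advertise it by name.
        DataService impl = new DataService() {
            public long rangeCount(String indexName) { return 42L; }
        };
        DataService stub = (DataService) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("dataService0", stub);

        // "Client" JVM: look the service up and invoke it over RMI.
        Registry remote = LocateRegistry.getRegistry("localhost", 1099);
        DataService ds = (DataService) remote.lookup("dataService0");
        System.out.println(ds.rangeCount("SPO")); // prints 42

        // Unexport so this demo JVM can exit cleanly.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}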

Bigdata is designed as a key-range partitioned database. This means that it dynamically breaks each of the application's indices down into key ranges and (re-)distributes those index partitions across the cluster in order to balance the load on the data services. The RDF database uses 5 indices: three statement indices, plus two indices for mapping URIs and literals into internal identifiers and back into RDF resources.
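
To illustrate the key-range partitioning scheme, here is a minimal sketch of how a metadata index can route a key to an index partition. It uses String keys and an in-memory TreeMap purely for readability, and the partition boundaries and identifiers are made up; none of this is the actual bigdata implementation.

import java.util.TreeMap;

public class KeyRangeRouter {

    /** Maps the first key of each partition's key range to its partition id. */
    private final TreeMap<String, Integer> metadataIndex = new TreeMap<>();

    public void addPartition(String firstKey, int partitionId) {
        metadataIndex.put(firstKey, partitionId);
    }

    /** Locate the partition whose half-open range [firstKey, nextFirstKey) contains key. */
    public int locate(String key) {
        return metadataIndex.floorEntry(key).getValue();
    }

    public static void main(String[] args) {
        KeyRangeRouter router = new KeyRangeRouter();
        router.addPartition("", 0);  // covers ["", "m")
        router.addPartition("m", 1); // covers ["m", "t")
        router.addPartition("t", 2); // covers ["t", +infinity)
        System.out.println(router.locate("http://example.org/x")); // prints 0
        System.out.println(router.locate("urn:uuid:1234"));        // prints 2
    }
}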

A run begins by pre-partitioning the data based on a sample data set (I have been using LUBM U10, with 1.2M triples). This gives us good initial index partition boundaries, which helps distribute the workload across the cluster from the start. We then start a master on one machine, and it starts data load clients on the various data services. Given N data load clients, each client is responsible for incrementally generating and loading 1/Nth of the data into the distributed database. We took this approach because we lacked sufficient shared disk to store the pre-generated data set; instead, we generate the data dynamically on a “just in time” basis.
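
Here is a sketch of how the 1/Nth split might look, assuming the LUBM generator can be driven one university at a time; generateUniversity() and load() are hypothetical stand-ins for the generator and the loader, not actual bigdata APIs.

public class DataLoadClient {

    /** Client i generates and loads every university u with u % nClients == i. */
    static void runClient(int clientIndex, int nClients, int totalUniversities) {
        for (int u = clientIndex; u < totalUniversities; u += nClients) {
            String slice = generateUniversity(u); // stand-in: generate university u in memory
            load(slice);                          // stand-in: write onto the statement and term indices
        }
    }

    static String generateUniversity(int u) { return "University" + u; }          // stub
    static void load(String slice) { System.out.println("loading " + slice); }    // stub

    public static void main(String[] args) {
        // e.g. 3 clients splitting an 8-university data set among themselves:
        for (int client = 0; client < 3; client++) runClient(client, 3, 8);
    }
}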

Data load is linear out to at least 1B triples. Since the system uses B+Trees, we expect log-linear performance once IOWAIT begins to dominate. In fact, we have identified a bottleneck in the “lock manager” and are working to remove it right now; that should dramatically increase the concurrency and throughput of the system, at which point we will try a 10B triple run. Right now we are observing between 50k and 60k triples per second during data load out to 1B triples, depending on the size of the thread pool used by the data services.
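
Those rates are consistent with the roughly 6-hour load reported in the run summary below, as a little arithmetic shows (this is plain back-of-the-envelope math, not data from the run itself):

public class LoadTimeEstimate {
    public static void main(String[] args) {
        long triples = 1_000_000_000L;
        for (long tps : new long[] { 50_000L, 60_000L }) {
            double hours = (double) triples / tps / 3600.0;
            System.out.printf("%,d triples/s -> %.1f hours for 1B triples%n", tps, hours);
        }
        // prints: 50,000 triples/s -> 5.6 hours; 60,000 triples/s -> 4.6 hours
    }
}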

A sample run summary is inline below. Loading 1B triples takes approximately 6 hours, and we hope to improve on that shortly. You can also see the time required to compute the eager closure of the RDFS entailments, and the net time and triples per second to load and compute the closure of the LUBM U8000 data set. I have not been tracking bytes per triple explicitly, but at one point in the run, with 821M triples loaded, the on-disk requirement was 89G across the cluster, which works out to 108 bytes per triple. That might be a reasonable value during a period of heavy write activity, when a lot of data is buffered on the journals and the index partition views are not compact. However, I expect this to drop to about 50 bytes per triple when the system is at rest (after compacting merges on the index partitions and the release of unnecessary history).
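
The 108 bytes per triple figure checks out if the 89G is read as decimal gigabytes (with 2^30-byte gigabytes it would come to about 116); a two-line verification:

public class BytesPerTriple {
    public static void main(String[] args) {
        long bytesOnDisk = 89_000_000_000L; // 89G across the cluster, read as decimal gigabytes
        long triples = 821_000_000L;        // triples loaded at the sampled point
        System.out.println(bytesOnDisk / triples + " bytes/triple"); // prints "108 bytes/triple"
    }
}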

Thanks,

-bryan

Fri Feb 20 17:12:39 GMT-05:00 2009
Load: tps=51038, ntriples=1154639857, nnew=1154639421, elapsed=22622977ms
namespace U8000b
class com.bigdata.rdf.store.ScaleOutTripleStore
indexManager com.bigdata.service.jini.JiniFederation
statementCount 1154639857
termCount 263124825
uriCount 173797593
literalCount 89327232
bnodeCount 0

Computing closure: now=Fri Feb 20 23:29:46 GMT-05:00 2009
closure: ClosureStats{mutationCount=272469320, elapsed=13235241ms}
Closure: tps=19940, ntriples=1418566071, nnew=263926214, elapsed=13235355ms
namespace U8000b
class com.bigdata.rdf.store.ScaleOutTripleStore
indexManager com.bigdata.service.jini.JiniFederation
statementCount 1418566071
termCount 263124825
uriCount 173797593
literalCount 89327232
bnodeCount 0

Net: tps=39556, ntriples=1418566071, nnew=1418565635, elapsed=35861664ms
Forcing overflow: now=Sat Feb 21 03:10:23 GMT-05:00 2009
Forced overflow: now=Sat Feb 21 03:15:10 GMT-05:00 2009
namespace U8000b
class com.bigdata.rdf.store.ScaleOutTripleStore
indexManager com.bigdata.service.jini.JiniFederation
statementCount 1340505170
termCount 263124825
uriCount 173797593
literalCount 89327232
bnodeCount 0

Very exciting results! Keep up the good work, Bryan!