I am going to lead up to how we get that outrageous score, but first let me provide some background about the most recent official BSBM run and then I will go into how bigdata performed in the official runs and why we are able to report a much higher result here.

We were invited to participate in the recently concluded BSBM V3.0 benchmark. BSBM V3.0 includes a new data set (statistically similar, but different data) and new use cases for the SPARQL 1.1 update and aggregation extensions. Bigdata does not yet support the SPARQL 1.1 features, so we were not able to participate in those aspects of the benchmark.  The results of the benchmark are now online and the report to the lod2 project is here. Many thanks to Chris, Andreas, and the lod2 project for sponsoring this benchmark trial.

The official benchmark reports reasonable results for bigdata. However, those results are not the entire picture and substantially under represent how fast bigdata query performance really is.  Let me explain.

First, we are a bit slow on the data load. This is primarily due to the presence of large literals (kilobytes each) in the BSBM data which wind up clogging the B+Tree leaves in which they are stored. To address this, we are changing how we store RDF literals in order to reduce the IO churn. This is something that we will be fixing over the next few weeks, so stay tuned for an update on that.

In previous versions of BSBM, the benchmark included a reduced query mix.  This query mix excluded two queries, Q5 and Q6, which were problematic for the triples stores under test.  In BSBM V3.0, Q6 has been dropped completely from the query mix (it requires an free text index which supports regular expressions to do well on that query).  However, this time around results were not published for a “reduced query mix” — that is, without Q5.

Q5 makes a tremendous difference in query performance for bigdata.  We are continuing to work on query optimization for Q5, which requires reasoning about the join order in the presence of SPARQL filters and attaching constraints at the earliest possible point at which they can run.  With Q5, we scored 4286 QMpH for 8 clients against the 100M data set (this is the value in the published report).  However, without Q5, bigdata delivers a whalloping 36608 under the same conditions.   That is on par with the best result reported in the benchmark, which was 36269.  The detailed results are below.

This is not a perfect apples to apples comparison since (a) the results below were obtained on a different machine (less RAM, but faster disks); and (b) the “reduced query mix” results were not reported for the other databases so we can not say how their performance would change without Q5.  However, it does show the high performance you should expect from bigdata in our next release.  In the meanwhile, we are going to keep working on query optimization for Q5.   We should have it licked soon.

Scale factor: 284826

Number of warmup runs: 50
Number of clients: 8
Seed: 9175932
Number of query mix runs (without warmups): 500 times min/max Querymix runtime: 0.5236s / 1.2590s
Total runtime (sum): 384.261 seconds
Total actual runtime: 49.168 seconds
QMpH: 36608.83 query mixes per hour
CQET: 0.76852 seconds average runtime of query mix
CQET (geom.): 0.75804 seconds geometric mean runtime of query mix

Metrics for Query: 1
Count: 500 times executed in whole run
AQET: 0.023844 seconds (arithmetic mean)
AQET(geom.): 0.021225 seconds (geometric mean)
QPS: 327.77 Queries per second
minQET/maxQET: 0.00430900s / 0.19842800s
Average result count: 7.88
min/max result count: 0 / 10
Number of timeouts: 0

Metrics for Query: 2
Count: 3000 times executed in whole run
AQET: 0.027569 seconds (arithmetic mean)
AQET(geom.): 0.024954 seconds (geometric mean)
QPS: 283.48 Queries per second
minQET/maxQET: 0.00580500s / 0.22326000s
Average result count: 19.36
min/max result count: 7 / 37
Number of timeouts: 0

Metrics for Query: 3
Count: 500 times executed in whole run
AQET: 0.109546 seconds (arithmetic mean)
AQET(geom.): 0.085314 seconds (geometric mean)
QPS: 71.34 Queries per second
minQET/maxQET: 0.00909600s / 0.50606200s
Average result count: 5.57
min/max result count: 0 / 10
Number of timeouts: 0

Metrics for Query: 4
Count: 500 times executed in whole run
AQET: 0.035182 seconds (arithmetic mean)
AQET(geom.): 0.032202 seconds (geometric mean)
QPS: 222.14 Queries per second
minQET/maxQET: 0.00587100s / 0.22067800s
Average result count: 7.56
min/max result count: 0 / 10
Number of timeouts: 0

Metrics for Query: 5
Count: 0 times executed in whole run
AQET: 0.000000 seconds (arithmetic mean)
AQET(geom.): NaN seconds (geometric mean)
QPS: Infinity Queries per second
minQET/maxQET: 179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00000000s / 0.00000000s
Average result count: 0.00
min/max result count: 2147483647 / -2147483648
Number of timeouts: 0

Metrics for Query: 7
Count: 2000 times executed in whole run
AQET: 0.039152 seconds (arithmetic mean)
AQET(geom.): 0.036275 seconds (geometric mean)
QPS: 199.61 Queries per second
minQET/maxQET: 0.00948900s / 0.22935300s
Average result count: 12.31
min/max result count: 1 / 100
Number of timeouts: 0

Metrics for Query: 8
Count: 1000 times executed in whole run
AQET: 0.028075 seconds (arithmetic mean)
AQET(geom.): 0.025042 seconds (geometric mean)
QPS: 278.37 Queries per second
minQET/maxQET: 0.00416200s / 0.21368300s
Average result count: 4.80
min/max result count: 0 / 19
Number of timeouts: 0

Metrics for Query: 9
Count: 2000 times executed in whole run
AQET: 0.028693 seconds (arithmetic mean)
AQET(geom.): 0.025425 seconds (geometric mean)
QPS: 272.37 Queries per second
minQET/maxQET: 0.00225400s / 0.22867700s
Average result (Bytes): 6762.83
min/max result (Bytes): 1528 / 12831
Number of timeouts: 0

Metrics for Query: 10
Count: 1000 times executed in whole run
AQET: 0.025270 seconds (arithmetic mean)
AQET(geom.): 0.022319 seconds (geometric mean)
QPS: 309.27 Queries per second
minQET/maxQET: 0.00443900s / 0.21323000s
Average result count: 1.87
min/max result count: 0 / 9
Number of timeouts: 0

Metrics for Query: 11
Count: 500 times executed in whole run
AQET: 0.034836 seconds (arithmetic mean)
AQET(geom.): 0.032530 seconds (geometric mean)
QPS: 224.34 Queries per second
minQET/maxQET: 0.01361800s / 0.22590800s
Average result count: 10.00
min/max result count: 10 / 10
Number of timeouts: 0

Metrics for Query: 12
Count: 500 times executed in whole run
AQET: 0.021630 seconds (arithmetic mean)
AQET(geom.): 0.018793 seconds (geometric mean)
QPS: 361.32 Queries per second
minQET/maxQET: 0.00390600s / 0.20190300s
Average result (Bytes): 1470.73
min/max result (Bytes): 1433 / 1507
Number of timeouts: 0

This result is quoted using the same seed as the official benchmark run and 8 clients using 50 warmup trials and 500 presentations of the query mixes. The database was the bigdata RWStore running on a single machine. These results were obtained against the QUADS_QUERY_BRANCH from SVN r4241. The machine is a quad core AMD Phenom II X4 with 8MB Cache @ 3Ghz running Centos with 16G of RAM and a striped RAID array with 6x SAS disks with 15k spindles (Seagate Cheetah with 16MV Cache, 3.5″). IO utilization approximately 50%.  CPU utilization was 50% during the run.  The JVM was Oracle Java 1.6.0_23 using “-server -Xmx4g -XX:+UseParallelOldGC”. The Java process size was approximately 3.6G during the benchmark run.

The machine used by the BSBM V3.0 benchmark has more RAM and slower disks.  However, the benchmark protocol has a ramp up procedure and much of the data winds up cached in RAM.  As a result, BSBM preferentially favors machines with more RAM over machines with faster disks.

© 2006-2010 by SYSTAP, LLC bigdata® is a registered trademark of SYSTAP, LLC. Suffusion WordPress theme by Sayontan Sinha