I will lead up to how we got that outrageous score. First, though, let me provide some background on the most recent official BSBM run. Then I will go into how bigdata performed in the official runs and why we are able to report a much higher result here.

We were invited to participate in the recently concluded BSBM V3.0 benchmark. BSBM V3.0 includes a new data set (statistically similar, but different data) and new use cases for the SPARQL 1.1 update and aggregation extensions. Bigdata does not yet support the SPARQL 1.1 features, so we were not able to participate in those aspects of the benchmark.

We’ve been looking at possible ways to “port” bigdata federations onto Amazon Web Services (AWS [5]). To my mind, the big questions are how to best provide for:

1. low-latency durable writes
2. low-latency reads
3. long-term durable storage
4. blob storage (assuming the proposed lexicon refactor [1])

Unfortunately, things like EBS (Amazon’s Elastic Block Store [2]) are not a great match, since each volume is linked to a single EC2 [4] instance at a time. While we could work around the single-instance limit by exposing the EBS volume as an NFS mount, there would be significant latency if the instance exposing that volume were to go down, since the volume would first have to be reattached to another instance.

However, I think that we can map these requirements onto AWS as follows:

For low-latency durable writes, I propose that we:

  • Assign a quorum (3 or 5) of compute nodes the responsibility for absorbing writes bound for a given shard.
  • Replicate writes across the quorum for durability.
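
To make the quorum idea concrete, here is a minimal sketch in Python. It is not bigdata code: the `Node` and `ShardQuorum` classes are hypothetical names invented for illustration, and the rule that a write counts as durable once a simple majority of the quorum acknowledges it is my assumption about how the replication would be judged. The in-memory list stands in for whatever local write log a real compute node would force to disk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node with an in-memory stand-in for its write log."""
    name: str
    up: bool = True
    log: list = field(default_factory=list)

    def append(self, record: bytes) -> bool:
        # A real node would force the record to local storage before acking.
        if not self.up:
            return False
        self.log.append(record)
        return True


class ShardQuorum:
    """Replicates the writes bound for one shard across an odd-sized node set."""

    def __init__(self, nodes):
        if len(nodes) % 2 == 0:
            raise ValueError("quorum size should be odd (e.g. 3 or 5)")
        self.nodes = list(nodes)
        # Assumption: simple majority, i.e. 2 of 3 or 3 of 5.
        self.majority = len(self.nodes) // 2 + 1

    def write(self, record: bytes) -> bool:
        # Send the record to every member and count the acknowledgements.
        # Note: this sketch does not roll back nodes that accepted a record
        # when the majority is not reached; a real protocol would have to.
        acks = sum(1 for node in self.nodes if node.append(record))
        return acks >= self.majority


nodes = [Node("a"), Node("b"), Node("c")]
quorum = ShardQuorum(nodes)
print(quorum.write(b"stmt-1"))  # True: all three nodes acknowledge
nodes[0].up = False
print(quorum.write(b"stmt-2"))  # True: two of three still form a majority
nodes[1].up = False
print(quorum.write(b"stmt-3"))  # False: one node is not a majority
```

The point of the odd quorum size is visible in the last two writes: with three nodes the shard keeps absorbing writes after losing one node, but not after losing two.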