We’ve been thinking a lot recently about how to specify custom indices to support performant queries. As mentioned previously, custom indices are a key part of supporting domain views of data, where a domain view can be defined by a specific set of queries.
The end game of this process is to identify patterns that combine a mapping of a domain model and its client classes onto a triple-based representation with declarative supporting indices.

Conventionally, to support generic SPARQL queries, three indices are maintained: SPO, OSP and POS, each a different ordering of subject, predicate and object. If we consider triples to be simply small bits of data to be searched and aggregated, then these general indices are pretty much all we can plan for.
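To make the three generic indices concrete, here is a minimal sketch in Python. It assumes each index is simply the full set of triples sorted under a different key order, with a sorted list standing in for a real B+Tree. The sample data and the names `build_indices` and `prefix_scan` are hypothetical and only for illustration.

```python
from bisect import bisect_left

# Hypothetical sample data; the identifiers are illustrative only.
TRIPLES = [
    ("emp1", "works-for", "CompanyX"),
    ("emp1", "surname", "Smith"),
    ("emp2", "works-for", "CompanyX"),
    ("emp2", "surname", "Jones"),
    ("emp3", "works-for", "CompanyY"),
    ("emp3", "surname", "Smith"),
]

# Positions of (subject, predicate, object) that make up each index key.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}


def build_indices(triples):
    """Build one sorted key list per ordering (a stand-in for a B+Tree)."""
    return {
        name: sorted(tuple(t[i] for i in order) for t in triples)
        for name, order in ORDERS.items()
    }


def prefix_scan(index, prefix):
    """Return every key in a sorted index that starts with `prefix`."""
    prefix = tuple(prefix)
    start = bisect_left(index, prefix)  # first key >= prefix
    matches = []
    for key in index[start:]:
        if key[:len(prefix)] != prefix:
            break  # keys are sorted, so the matching run has ended
        matches.append(key)
    return matches


indices = build_indices(TRIPLES)
# Everyone called Smith, in any company, via the POS index.
smiths = prefix_scan(indices["POS"], ("surname", "Smith"))
```

Each lookup here answers a single triple pattern. Anything that combines two patterns has to be assembled by joining the results of separate scans.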
However, consider a requirement for a simple domain query to locate all employees with a specific surname from a company:
SELECT ?Employee WHERE {?Employee works-for CompanyX . ?Employee surname Smith}
If the cardinalities of the clauses are high, then this query could be very expensive using the generic indices, since each clause has to be resolved separately and the two result sets joined. Supporting this in SQL, where all employee properties are stored in a single table, is straightforward: just specify a composite index. For a triple store, however, we would need to specify a constraint that triples must satisfy for the index to be maintained. Let’s call this B+Tree index “EMPLOYEE_SURNAME”. Its key would be Company-Surname-Employee, maintained for triples where {?Employee works-for ?Company . ?Employee surname ?Surname}.
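As a rough sketch of what maintaining such an index might involve, the Python below keeps a sorted list of Company-Surname-Employee keys up to date as triples arrive. This is an illustration under stated assumptions, not how bigdata implements it: the class `EmployeeSurnameIndex` and its methods are hypothetical, a sorted list stands in for the B+Tree, and deletions are left out.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class EmployeeSurnameIndex:
    """Toy index over {?E works-for ?C . ?E surname ?S}, keyed (C, S, E)."""

    def __init__(self):
        self.keys = []                     # sorted (company, surname, employee)
        self.companies = defaultdict(set)  # employee -> companies seen so far
        self.surnames = defaultdict(set)   # employee -> surnames seen so far

    def add(self, s, p, o):
        """Update the index for one incoming triple (s, p, o)."""
        if p == "works-for":
            if o in self.companies[s]:
                return  # duplicate triple, nothing to do
            self.companies[s].add(o)
            # Pair the new company with every surname already known.
            new_keys = [(o, surname, s) for surname in self.surnames[s]]
        elif p == "surname":
            if o in self.surnames[s]:
                return
            self.surnames[s].add(o)
            # Pair the new surname with every company already known.
            new_keys = [(company, o, s) for company in self.companies[s]]
        else:
            return  # the triple is irrelevant to this index's constraint
        for key in new_keys:
            insort(self.keys, key)

    def employees(self, company, surname):
        """Answer the domain query with a single prefix scan."""
        prefix = (company, surname)
        i = bisect_left(self.keys, prefix)
        found = []
        while i < len(self.keys) and self.keys[i][:2] == prefix:
            found.append(self.keys[i][2])
            i += 1
        return found


index = EmployeeSurnameIndex()
index.add("emp1", "works-for", "CompanyX")
index.add("emp1", "surname", "Smith")
index.add("emp2", "surname", "Jones")
index.add("emp2", "works-for", "CompanyX")
smiths_at_x = index.employees("CompanyX", "Smith")
```

Note that a key can only be written once both triples for an employee are present, so the index has to remember partial matches. That bookkeeping is the price of an index that spans more than one triple.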
It is important to recognise just how specific this index is. Yes, we could use it to provide a list of all the surnames of a company’s employees, but it does not help to find all the companies that have employees with a specific surname, although we could create a Surname-Company-Employee based B+Tree if the domain required it.
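The effect of key order can be sketched in a few lines of Python. The rows and function names below are hypothetical, and sorted lists again stand in for B+Trees. The point is only that a lookup is cheap when its known values form a prefix of the key, and degrades to a full scan when they do not.

```python
from bisect import bisect_left

# Hypothetical index entries as (company, surname, employee).
ROWS = [
    ("CompanyX", "Smith", "emp1"),
    ("CompanyX", "Jones", "emp2"),
    ("CompanyY", "Smith", "emp3"),
    ("CompanyZ", "Brown", "emp4"),
]

by_company = sorted(ROWS)                           # Company-Surname-Employee
by_surname = sorted((s, c, e) for c, s, e in ROWS)  # Surname-Company-Employee


def scan_prefix(sorted_keys, prefix):
    """Return the contiguous run of keys starting with `prefix`."""
    start = bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end][:len(prefix)] == prefix:
        end += 1
    return sorted_keys[start:end]


def surnames_in_company(company):
    # The company is the leading key component, so a prefix scan works.
    return sorted({s for _, s, _ in scan_prefix(by_company, (company,))})


def companies_with_surname_full_scan(surname):
    # The surname is not a key prefix here, so every entry must be visited.
    return sorted({c for c, s, _ in by_company if s == surname})


def companies_with_surname(surname):
    # With the reordered key the surname leads, so a prefix scan works again.
    return sorted({c for _, c, _ in scan_prefix(by_surname, (surname,))})
```

Both surname lookups return the same answer. The difference is that the second touches only the matching entries, at the cost of maintaining a second index.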
The main point about this index is that it spans multiple triples and that support for such indices provides opportunities for query optimisations and thus flexible domain views.
Domain-specific interfaces could take explicit advantage of such indices. The real challenge is to maintain and utilize such indices reliably and efficiently even with generic interfaces.
© 2006-2010 by SYSTAP, LLC. bigdata® is a registered trademark of SYSTAP, LLC.