User talk:AndreaWest/Blazegraph Features and Capabilities

From Wikitech-static
Revision as of 11:16, 25 May 2022 by imported>Jheald (→‎Named Subqueries: new section)

wikibase:around and wikibase:box

@AndreaWest: The crucial point about wikibase:around and wikibase:box is that Blazegraph implemented a geo-spatial index for P625 (coordinate location) values. (I believe it is based on a simple Z-order curve representation, which reduces 2D points to 1D strings.) As a result, Blazegraph can very quickly look up items close to a specific geographical point. This index is the key underlying requirement for an effective wikibase:around and wikibase:box capability.
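To illustrate the general idea of a Z-order (Morton) key, here is a minimal Python sketch. It is not Blazegraph's actual implementation, which I have not checked; the function names, the 16-bit grid resolution and the lat/lon quantisation are all assumptions made for the example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return key


def morton_key(lat: float, lon: float, bits: int = 16) -> int:
    """Quantise a coordinate onto a 2^bits x 2^bits grid and return its Z-order key.

    Hypothetical helper: the grid resolution and scaling are illustrative only.
    """
    scale = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * scale)      # longitude -> column index
    y = int((lat + 90.0) / 180.0 * scale)       # latitude  -> row index
    return interleave_bits(x, y, bits)
```

Because the key is a single integer, it can be stored in an ordinary sorted (one-dimensional) index. Points that are close together usually share a long key prefix, so a search around a point or inside a box can be answered by scanning a limited number of key ranges instead of every coordinate in the store.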

Using an index is very different from the first approach selected on the page (retrieve everything and then filter), because the set of 'everything' (e.g. all buildings, everything with a heritage designation, or everything with a Wikidata item) may be very large indeed. Without an index, such queries rapidly become prohibitively expensive. Jheald (talk) 10:30, 25 May 2022 (UTC)

Named Subqueries

Named subqueries have become popular both for query readability (they break a query into intelligible chunks) and as a usefully intuitive way to steer the execution sequence inside a query (to indicate that a particular group of statements needs to be executed first). It may be possible to accommodate them with a preprocessor that replaces each INCLUDE directive with the relevant subquery text as a conventional inline subquery (noting that a subquery can itself INCLUDE further subqueries). This would at least allow existing queries to run on alternative engines that do not recognise the Blazegraph named subquery syntax.
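A rough sketch of such a preprocessor in Python is given below, purely as an illustration. It assumes the usual Blazegraph form, where each named subquery is declared as WITH { ... } AS %name and referenced as INCLUDE %name. The function names are hypothetical, and the brace matching is deliberately naive: it assumes that no braces occur inside string literals or comments.

```python
import re

WITH_RE = re.compile(r"\bWITH\s*\{", re.IGNORECASE)
AS_RE = re.compile(r"\s*AS\s+(%\w+)", re.IGNORECASE)
INCLUDE_RE = re.compile(r"\bINCLUDE\s+(%\w+)", re.IGNORECASE)


def _matching_brace(text: str, open_pos: int) -> int:
    """Return the index of the '}' that closes the '{' at open_pos."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces in query")


def inline_named_subqueries(query: str) -> str:
    """Rewrite WITH { ... } AS %name / INCLUDE %name into inline subqueries."""
    named = {}
    # 1. Cut out every "WITH { ... } AS %name" clause and remember its body.
    while True:
        m = WITH_RE.search(query)
        if m is None:
            break
        open_pos = m.end() - 1
        close_pos = _matching_brace(query, open_pos)
        as_m = AS_RE.match(query, close_pos + 1)
        if as_m is None:
            raise ValueError("WITH block without 'AS %name'")
        named[as_m.group(1)] = query[open_pos + 1:close_pos].strip()
        query = query[:m.start()] + query[as_m.end():]

    # 2. Expand INCLUDE directives; repeat so that INCLUDEs nested inside
    #    a named subquery are expanded as well.
    def expand(match):
        return "{ " + named[match.group(1)] + " }"

    for _ in range(len(named) + 1):
        if INCLUDE_RE.search(query) is None:
            break
        query = INCLUDE_RE.sub(expand, query)
    else:
        raise ValueError("cyclic INCLUDE between named subqueries")
    return query
```

For example, with a made-up query:

```python
q = """SELECT ?item ?n
WITH {
  SELECT ?item (COUNT(?x) AS ?n) WHERE { ?item ?p ?x } GROUP BY ?item
} AS %counts
WHERE {
  INCLUDE %counts
}"""
print(inline_named_subqueries(q))
```

The WITH clause disappears and the INCLUDE line becomes a braced inline SELECT. Note that a subquery included twice is simply pasted in twice.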

Such inlined queries may still be less efficient than on Blazegraph, however, if the alternative engine cannot recognise that the same subquery is being invoked for its results more than once. As a simple example, consider a query where some counts are calculated and then expressed as a percentage of their total: the counts are reused to calculate the total, rather than the total being calculated separately from scratch. (This is only a very simple example. In other cases the subquery results being reused may be rather more involved and time-consuming to determine, and their reuse may be part of a longer, more involved chain of stages.) Jheald (talk) 11:16, 25 May 2022 (UTC)
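The counts-and-percentages case can be mimicked outside SPARQL with a toy Python sketch. It is only an analogy for the cost difference, not a model of any real engine; the data and all names are made up for illustration.

```python
from collections import Counter

stats = {"evaluations": 0}  # how many times the stand-in "subquery" is run


def count_by_class(rows):
    """Stand-in for an expensive subquery: count items per class."""
    stats["evaluations"] += 1
    return Counter(cls for _item, cls in rows)


def percentages_reusing(rows):
    """Named-subquery style: evaluate once, reuse for the counts and the total."""
    counts = count_by_class(rows)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def percentages_inlined(rows):
    """Naively inlined style: each textual copy of the subquery is run again."""
    total = sum(count_by_class(rows).values())
    return {cls: 100.0 * n / total for cls, n in count_by_class(rows).items()}
```

Both functions return the same percentages, but the first runs the stand-in subquery once and the second runs it twice. An engine that does not spot the repeated inline subquery would be in the second situation.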