
User talk:AndreaWest/Blazegraph Features and Capabilities

Revision as of 18:34, 26 May 2022 by imported>AndreaWest

wikibase:around and wikibase:box

@AndreaWest: The crucial feature of wikibase:around and wikibase:box is that Blazegraph implemented a geo-spatial index for P625 coordinate location values (I believe based on a simple w:Z-order curve representation, reducing 2D points to 1D strings). As a result, it is very fast for Blazegraph to look up items close to a specific geographical point. This is the key underlying requirement for an effective wikibase:around and wikibase:box capability.

Using an index is very different to the first approach selected on the page (retrieve everything and then filter) -- because the set of 'everything' (eg all buildings / everything with a heritage designation / everything with a wikidata item) may be very big indeed. Without an index, such queries rapidly become prohibitive. Jheald (talk) 10:30, 25 May 2022 (UTC)
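To illustrate the difference between an index lookup and retrieve-then-filter, here is a minimal Python sketch of a Z-order (Morton) key. It is an assumption-laden illustration, not Blazegraph's actual implementation: the 16-bit quantisation, the function names (`interleave_bits`, `morton_key`, `box_lookup`) and the sample coordinates are all hypothetical. It relies on the fact that a Morton key never decreases when either coordinate increases, so every point inside a box has a key between the keys of the box's south-west and north-east corners.

```python
from bisect import bisect_left, bisect_right


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_key(lat: float, lon: float, bits: int = 16) -> int:
    """Quantise a WGS84-style lat/lon pair onto a grid and return its Z-order key."""
    scale = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * scale)
    y = int((lat + 90.0) / 180.0 * scale)
    return interleave_bits(x, y, bits)


def build_index(points):
    """points: iterable of (label, lat, lon). Returns a list sorted by Z-order key."""
    return sorted((morton_key(lat, lon), lat, lon, label) for label, lat, lon in points)


def box_lookup(index, south, west, north, east):
    """Range-scan the sorted keys, then apply the exact box test to the candidates."""
    keys = [entry[0] for entry in index]
    start = bisect_left(keys, morton_key(south, west))
    end = bisect_right(keys, morton_key(north, east))
    # The key range can contain points outside the box, so an exact check is still needed,
    # but only on the candidates rather than on every stored point.
    return [label for _, lat, lon, label in index[start:end]
            if south <= lat <= north and west <= lon <= east]


# Hypothetical sample data (approximate coordinates).
index = build_index([
    ("London", 51.5, -0.12),
    ("Paris", 48.85, 2.35),
    ("Sydney", -33.87, 151.2),
])
print(box_lookup(index, south=48.0, west=-1.0, north=52.0, east=3.0))
```

The range scan touches only the slice of the sorted keys that can intersect the box, which is the property that keeps lookups cheap when the set of "everything" is large. A production index would also split the key range to skip the parts of the curve that lie outside the box.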

Excellent points! I was thinking of basic GeoSPARQL examples and not performance. Your feedback made me rethink and improve things. I updated the "User page" with information on converting the current Wikidata to be compliant with GeoSPARQL, and have updated the queries to address performance. Performance will be enhanced by geospatial indexes (which endpoints like Jena have). In addition, the queries will be simpler if GeoSPARQL query rewriting is supported. I will make sure to test these aspects in the GeoSPARQL compliance tests.
Hopefully, this addresses your concern. Andrea Westerinen (talk) 15:42, 26 May 2022 (UTC)

Named Subqueries

Named subqueries have become popular for query readability (by breaking a query into intelligible chunks), and as a usefully intuitive way to steer the execution sequence inside a query (to indicate that a particular group of statements needs to be executed first). It may be possible to accommodate these with a preprocessor that replaces each INCLUDE directive with the relevant subquery text as a conventional inline subquery (noting that a subquery can itself INCLUDE further subqueries). This would at least allow existing queries to run if alternate engines did not recognise the Blazegraph named subquery syntax.

They may still be less efficient than in Blazegraph, however, if the alternate engine cannot recognise that the same subquery is being invoked for its results more than once -- e.g. https://w.wiki/H6b as a simple example, where some counts are calculated and then expressed as a percentage of their total, and the counts are reused to calculate the total rather than the total being calculated separately from scratch. (This is only a very simple example. In other cases the subquery results being reused may be rather more involved and time-consuming to determine, and their re-use may be part of a longer, more involved chain of stages.) Jheald (talk) 11:16, 25 May 2022 (UTC)
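The preprocessor idea above could be sketched roughly as follows in Python. This is only a sketch under stated assumptions: it assumes the Blazegraph form `WITH { ... } AS %name` declared at the top level of the query and `INCLUDE %name` at the point of use, it matches braces naively (braces inside string literals or comments would confuse it), and the function name `inline_named_subqueries` is hypothetical.

```python
import re

WITH_RE = re.compile(r"\bWITH\s*\{", re.IGNORECASE)
AS_RE = re.compile(r"\s*AS\s+%(\w+)", re.IGNORECASE)
INCLUDE_RE = re.compile(r"\bINCLUDE\s+%(\w+)", re.IGNORECASE)


def _matching_brace(text: str, open_pos: int) -> int:
    """Return the index of the '}' that closes the '{' at open_pos (naive matching)."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces in query")


def inline_named_subqueries(query: str) -> str:
    """Replace each INCLUDE %name with the body of the matching WITH { ... } AS %name."""
    named = {}
    # Step 1: cut every top-level WITH { ... } AS %name declaration out of the query.
    while True:
        m = WITH_RE.search(query)
        if m is None:
            break
        open_pos = m.end() - 1
        close_pos = _matching_brace(query, open_pos)
        as_m = AS_RE.match(query, close_pos + 1)
        if as_m is None:
            raise ValueError("WITH block is not followed by AS %name")
        named[as_m.group(1)] = query[open_pos + 1:close_pos].strip()
        query = query[:m.start()] + query[as_m.end():]
    # Step 2: expand INCLUDEs; each pass resolves one level of nesting.
    for _ in range(len(named) + 1):
        if INCLUDE_RE.search(query) is None:
            return query
        query = INCLUDE_RE.sub(lambda m: "{ " + named[m.group(1)] + " }", query)
    raise ValueError("INCLUDE directives appear to be cyclic")


# Hypothetical example query, not taken from the page under discussion.
example = """SELECT ?item ?n
WITH {
  SELECT ?item (COUNT(*) AS ?n) WHERE { ?item ?p ?o } GROUP BY ?item
} AS %counts
WHERE {
  INCLUDE %counts
}"""
print(inline_named_subqueries(example))
```

Note that a subquery INCLUDEd twice is simply pasted in twice, so whether its results are computed once or twice is left entirely to the target engine's optimiser.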

I agree that named subqueries improve readability, but they are not a SPARQL-compliant feature, and equivalent functionality is easily achieved (even for sub-queries in sub-queries). Yes, it MAY involve some cut-and-paste ugliness. I will add an example of your H6b query to the user page.
Please send me an example of a named subquery that you consider "more involved", so that I can validate that it can be accomplished without naming but maybe with shortcuts. Thanks! Andrea Westerinen (talk) 15:48, 26 May 2022 (UTC)