User:AndreaWest/WDQS Testing: Difference between revisions


Revision as of 22:56, 6 April 2022

This page gives an overview of a design, along with specific suggestions, for Wikidata SPARQL query testing. The tests are intended to help evaluate alternatives to the Blazegraph backend and, possibly, to establish a Wikidata SPARQL benchmark for the industry.

Goals

  • Define multiple data sets that exercise the SPARQL functions and complexities seen in actual Wikidata queries, as well as extensions, federated queries, and workloads
    • Define specific INSERT, DELETE, CONSTRUCT and SELECT queries for performance and capabilities analysis
    • Define read/write workloads for stress testing
    • Test system characteristics, SPARQL compliance, and behavior in real-world scenarios

Testing Wikidata-Specific Updates and Queries

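The design is based largely on insights from the following papers:

  • An Analytical Study of Large SPARQL Query Logs (https://arxiv.org/abs/1708.00363)
  • Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph (https://iccl.inf.tu-dresden.de/w/images/5/5a/Malyshev-et-al-Wikidata-SPARQL-ISWC-2018.pdf)
  • Navigating the Maze of Wikidata Query Logs (https://hal.inria.fr/hal-02096714/document)

In addition, the following analyses examined more recent data:

  • WDQS Queries Analysis (https://wikitech.wikimedia.org/wiki/User:Joal/WDQS_Queries_Analysis)
  • Subpages linked from https://wikitech.wikimedia.org/wiki/User:AKhatun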

TBD: Address different query and update patterns, including a variety of SPARQL features (such as FILTER, OPTIONAL and GROUP BY), federation, geospatial analysis, and support for the label, GAS, sampling and MediaWiki "services", among others.
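
As an illustration of what such test definitions might look like, the following minimal sketch lists a few test queries, each tagged with the SPARQL features it is meant to exercise, and checks the tags against the query text. The `TestQuery` structure, the test names, the helper functions and the query texts themselves are assumptions made for illustration and are not part of any agreed test suite. The queries rely on the prefixes that WDQS predefines (wd:, wdt:, wikibase:, bd:).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestQuery:
    """Hypothetical container for one test query (illustration only)."""
    name: str
    features: tuple  # SPARQL keywords the query is meant to exercise
    text: str


# Illustrative queries only; a real suite would be derived from query logs.
QUERIES = [
    TestQuery(
        name="optional_filter",
        features=("OPTIONAL", "FILTER"),
        text="""
SELECT ?item ?birth WHERE {
  ?item wdt:P31 wd:Q5 .
  OPTIONAL { ?item wdt:P569 ?birth . }
  FILTER(!BOUND(?birth) || YEAR(?birth) > 1900)
} LIMIT 10
""",
    ),
    TestQuery(
        name="group_by_count",
        features=("GROUP BY", "COUNT"),
        text="""
SELECT ?occupation (COUNT(?item) AS ?n) WHERE {
  ?item wdt:P106 ?occupation .
} GROUP BY ?occupation
ORDER BY DESC(?n) LIMIT 10
""",
    ),
    TestQuery(
        name="label_service",
        features=("SERVICE",),
        text="""
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} LIMIT 10
""",
    ),
]


def missing_features(query):
    """Return the tagged features that do not appear in the query text."""
    upper = query.text.upper()
    return [f for f in query.features if f not in upper]


def coverage(queries):
    """Return the set of features exercised by at least one query."""
    covered = set()
    for q in queries:
        covered.update(q.features)
    return covered
```

A check such as `coverage(QUERIES)` could then be compared against the list of features the test plan intends to cover, to spot gaps before any queries are run against a backend.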

Workload Testing

Combinations of the above (TBD)
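
One possible way to combine individual queries and updates into a read/write workload is weighted random sampling with a fixed seed, so that a run can be repeated exactly. The sketch below is only an assumption about how this could be done: the operation names and the weights are hypothetical placeholders, not measured Wikidata traffic proportions.

```python
import random


def build_workload(operations, weights, length, seed=0):
    """Return a repeatable sequence of operation names.

    Operations are drawn with replacement in proportion to `weights`;
    the same seed always yields the same sequence.
    """
    rng = random.Random(seed)
    return rng.choices(operations, weights=weights, k=length)


# Hypothetical operation names and mix (illustration only).
OPERATIONS = ["select_query", "construct_query", "insert_update", "delete_update"]
WEIGHTS = [80, 10, 5, 5]

workload = build_workload(OPERATIONS, WEIGHTS, length=1000, seed=42)
```

Fixing the seed keeps the sequence identical across candidate backends, so differences in results can be attributed to the backend rather than to the sampled mix.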

Background on SPARQL Benchmarks

See Background on SPARQL Benchmarks (https://wikitech.wikimedia.org/wiki/User:AndreaWest/Background_on_SPARQL_Benchmarks).