User:AndreaWest/WDQS Testing

Revision as of 01:23, 13 April 2022

This page outlines a design and specific suggestions for Wikidata SPARQL query testing. These tests will be useful for evaluating alternatives to the Blazegraph backend and (possibly) for establishing a Wikidata SPARQL benchmark for the industry.

Goals

  • Definition of multiple data sets exercising the SPARQL functions and complexities seen in actual Wikidata queries, as well as extensions, federated query, and workloads
    • Definition of specific INSERT, DELETE, CONSTRUCT and SELECT queries for performance and capabilities analysis
    • Definition of read/write workloads for stress testing
    • Tests of system characteristics and SPARQL compliance, and evaluation of system behavior under load

Test Design

The design is based on insights gathered (largely) from the following papers:

  • An Analytical Study of Large SPARQL Query Logs (https://arxiv.org/abs/1708.00363)
  • Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph (https://iccl.inf.tu-dresden.de/w/images/5/5a/Malyshev-et-al-Wikidata-SPARQL-ISWC-2018.pdf)
  • Navigating the Maze of Wikidata Query Logs (https://hal.inria.fr/hal-02096714/document)

In addition, the following analyses (conducted by members of the WDQS team) examined more recent data:

  • WDQS Queries Analysis (https://wikitech.wikimedia.org/wiki/User:Joal/WDQS_Queries_Analysis)
  • Subpages linked from https://wikitech.wikimedia.org/wiki/User:AKhatun

Testing Wikidata-Specific Updates and Queries

The tests need to address a wide variety of SPARQL language constructs (such as FILTER, OPTIONAL, GROUP BY, ...) as well as query and update patterns. Testing will include federated and geospatial queries, and support for the label, GAS and MediaWiki services (including how those services evolve).

Tests will be defined to exercise:

  • SELECT, ASK and CONSTRUCT queries, as well as INSERT and DELETE updates
  • SPARQL language constructs
    • Solution modifiers - Distinct, Limit, Offset, Order By
    • Algebraic operators - Filter, Union, Optional, Exists, Not Exists, Minus
    • Aggregation operators - Count, Min/Max, Avg, Sum, Group By, Having
  • With combinations of the above language constructs
    • xxx
  • With varying numbers of triples (from 1 to 50+)
  • Utilizing different property path lengths and structures
    • xxx
  • Using different graph patterns
    • xxx
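
As a rough illustration of varying the number of triples and the graph pattern shape, the sketch below generates SELECT queries with chain-shaped and star-shaped basic graph patterns. The two shapes, the function names and the choice of predicates are assumptions made for illustration; the page has not yet settled on specific patterns.

```python
def chain_pattern(predicates):
    """Chain shape: the object of each triple is the subject of the next."""
    return " ".join(
        f"?v{i} {pred} ?v{i + 1} ." for i, pred in enumerate(predicates)
    )


def star_pattern(predicates):
    """Star shape: every triple shares the same subject ?s."""
    return " ".join(f"?s {pred} ?o{i} ." for i, pred in enumerate(predicates))


def select_query(pattern, limit=10):
    """Wrap a graph pattern in a SELECT query with a LIMIT solution modifier."""
    return f"SELECT * WHERE {{ {pattern} }} LIMIT {limit}"


# Illustrative predicates only: wdt:P31 (instance of), wdt:P279 (subclass of).
PREDICATES = ["wdt:P31", "wdt:P279"]


def build_queries(sizes, shape=chain_pattern):
    """Return one query per requested triple count, cycling through PREDICATES."""
    queries = {}
    for n in sizes:
        preds = [PREDICATES[i % len(PREDICATES)] for i in range(n)]
        queries[n] = select_query(shape(preds))
    return queries


if __name__ == "__main__":
    for n, query in build_queries([1, 5, 50], shape=star_pattern).items():
        print(n, len(query))
```

Generating the patterns programmatically keeps the triple count an explicit test parameter, so the same harness can sweep from 1 to 50+ triples without hand-written queries.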

The tests will be defined using query templates with varying entity selections. This avoids pre-defined, static queries that are known in advance and for which a platform can be tuned. The tests will be executed in batches, and the following statistics will be collected:

  • Execution time (longest, shortest, average) or timeout
  • Correctness and completeness of response/update
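
A minimal sketch of the template approach follows, assuming a single hypothetical template, a small list of sample entity IDs, and a caller-supplied `execute` function that sends a query to the system under test and raises `TimeoutError` on a timeout. None of these names come from this page. The sketch covers only the timing statistics; checking correctness and completeness would need expected results and is not shown.

```python
import random
import statistics
import time
from string import Template

# Hypothetical template: count the statements with a given item as subject.
TEMPLATE = Template("SELECT (COUNT(*) AS ?n) WHERE { wd:$entity ?p ?o }")
# Illustrative entity IDs; a real run would sample from a much larger pool.
SAMPLE_ENTITIES = ["Q42", "Q64", "Q146", "Q5"]


def instantiate(template, entities, batch_size, rng=None):
    """Fill the template with randomly selected entities, one query per draw."""
    rng = rng or random.Random()
    return [
        template.substitute(entity=rng.choice(entities))
        for _ in range(batch_size)
    ]


def run_batch(queries, execute, timeout_s):
    """Run each query through `execute` and summarise the timings.

    `execute(query, timeout_s)` is supplied by the caller (hypothetical here)
    and is expected to raise TimeoutError when the query times out.
    """
    times, timeouts = [], 0
    for query in queries:
        start = time.perf_counter()
        try:
            execute(query, timeout_s)
        except TimeoutError:
            timeouts += 1
            continue
        times.append(time.perf_counter() - start)
    summary = {"completed": len(times), "timeouts": timeouts}
    if times:
        summary.update(
            longest=max(times),
            shortest=min(times),
            average=statistics.mean(times),
        )
    return summary
```

Passing a seeded `random.Random` to `instantiate` makes a batch reproducible across candidate backends while still keeping the concrete queries out of any tuning done in advance.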

Workload Testing

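This evaluation will utilize combinations of the above queries/updates (TBD), characterized by the actual workloads captured on the WDQS queries (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=now-6M&to=now&refresh=1m) and Streaming Updater (https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater?orgId=1) dashboards. These workloads reflect both user and bot queries.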

For each test iteration, the following will be reported:

  • Total execution time
  • Mean and geometric mean across the individual queries
  • Number of queries that executed and completed, and their times
  • Number of queries that timed out
  • Number of results for queries that completed
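
The per-iteration report could be computed along the following lines. This is a sketch under stated assumptions: the `QueryOutcome` record and the field names in the report are hypothetical, and the total execution time is passed in as a wall-clock measurement because queries in a workload may run concurrently.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryOutcome:
    """Outcome of one query in a workload iteration (hypothetical record)."""
    seconds: Optional[float]  # None when the query timed out
    result_count: int = 0


def summarise_iteration(outcomes, total_seconds):
    """Build the report for one test iteration from individual outcomes."""
    completed = [o for o in outcomes if o.seconds is not None]
    times = [o.seconds for o in completed]
    report = {
        "total_execution_time": total_seconds,
        "completed": len(completed),
        "completed_times": times,
        "timed_out": len(outcomes) - len(completed),
        "result_counts": [o.result_count for o in completed],
        "mean": None,
        "geometric_mean": None,
    }
    if times:
        report["mean"] = sum(times) / len(times)
        # Geometric mean computed via logarithms; times must be positive.
        report["geometric_mean"] = math.exp(
            sum(math.log(t) for t in times) / len(times)
        )
    return report
```

Reporting the geometric mean alongside the arithmetic mean is useful because a few very slow queries dominate the arithmetic mean, while the geometric mean gives a better sense of typical per-query behavior.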

Test Infrastructure

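TBD ... The test infrastructure will likely utilize one or more of the existing frameworks or tools (https://wikitech.wikimedia.org/wiki/User:AndreaWest/Background_on_SPARQL_Benchmarks#Test_Frameworks).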

Background on SPARQL Benchmarks

See Background on SPARQL Benchmarks (https://wikitech.wikimedia.org/wiki/User:AndreaWest/Background_on_SPARQL_Benchmarks).