
User:AndreaWest/WDQS Testing

Revision as of 01:23, 13 April 2022 by imported>AndreaWest (→‎Workload Testing)

This page gives an overview of a design, and specific suggestions, for Wikidata SPARQL query testing. These tests will be useful for evaluating Blazegraph backend alternatives and (possibly) for establishing a Wikidata SPARQL benchmark for the industry.

Goals

  • Definition of multiple data sets exercising the SPARQL functions and complexities seen in actual Wikidata queries, as well as extensions, federated query, and workloads
    • Definition of specific INSERT, DELETE, CONSTRUCT and SELECT queries for performance and capabilities analysis
    • Definition of read/write workloads for stress testing
    • Tests of system characteristics and SPARQL compliance, and evaluation of system behavior under load

Test Design

The design is based on insights gathered (largely) from the following papers:

Also, the following analyses (conducted by members of the WDQS team) examined more recent data:

Testing Wikidata-Specific Updates and Queries

The tests must address a wide variety of SPARQL language constructs (such as FILTER, OPTIONAL, GROUP BY, ...) as well as query and update patterns. Testing will include federated and geospatial queries, and support for the label, GAS and MediaWiki services (including their evolution).

Tests will be defined to exercise:

  • SELECT, ASK and CONSTRUCT queries, as well as INSERT and DELETE updates
  • SPARQL language constructs
    • Solution modifiers - Distinct, Limit, Offset, Order By
    • Algebraic operators - Filter, Union, Optional, Exists, Not Exists, Minus
    • Aggregation operators - Count, Min/Max, Avg, Sum, Group By, Having
  • With combinations of the above language constructs
    • xxx
  • With varying numbers of triples (from 1 to 50+)
  • Utilizing different property path lengths and structures
    • xxx
  • Using different graph patterns
    • xxx

The tests will be defined using query templates with varying entity selections to avoid pre-defined, static queries (that are known in advance and for which a platform can be tuned). They will be executed in batches and the following statistics collected:

  • Execution time (longest, shortest, average) or time out
  • Correctness and completeness of response/update
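
As an illustration of the template approach described above, the following is a minimal Python sketch, not a proposed implementation. The template text, the entity IDs, and the function names `instantiate` and `summarize` are assumptions made for this example. It fills a query template with randomly selected entities and summarizes a batch of execution times, treating `None` as a timeout.

```python
import random
import statistics
from string import Template

# Hypothetical template: the property path (instance of / subclass of) and
# the LIMIT are illustrative only; real templates are still to be defined.
INSTANCE_TEMPLATE = Template(
    "SELECT ?item WHERE { ?item wdt:P31/wdt:P279* wd:$entity } LIMIT $limit"
)


def instantiate(template, entities, count, limit=100, seed=None):
    """Fill a query template with randomly selected entity IDs."""
    rng = random.Random(seed)  # a seed makes a batch reproducible
    return [
        template.substitute(entity=rng.choice(entities), limit=limit)
        for _ in range(count)
    ]


def summarize(timings):
    """Summarize a batch; a timing of None marks a query that timed out."""
    completed = [t for t in timings if t is not None]
    return {
        "timed_out": len(timings) - len(completed),
        "shortest": min(completed) if completed else None,
        "longest": max(completed) if completed else None,
        "average": statistics.mean(completed) if completed else None,
    }
```

For example, `instantiate(INSTANCE_TEMPLATE, ["Q5", "Q146"], 3, seed=1)` yields three concrete queries whose entity varies between the two IDs. Checking correctness and completeness of the responses is not covered by this sketch.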

Workload Testing

This evaluation will utilize combinations of the above queries/updates (TBD), characterized by the actual workloads captured on the WDQS queries and Streaming Updater dashboards. These workloads reflect both user and bot queries.

For each test iteration, the following will be reported:

  • Total execution time
  • Mean and geometric mean of the execution times across the individual queries
  • Number of queries that executed and completed, and their times
  • Number of queries that timed out
  • Number of results for queries that completed
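
The per-iteration figures listed above could be computed along the following lines. This is a sketch under stated assumptions: the input format (a list of `(elapsed_seconds, result_count)` pairs, with `None` for a timed-out query) and the name `workload_report` are hypothetical, and total execution time is taken here as the sum over completed queries, which may differ from the wall-clock time of an iteration.

```python
import statistics


def workload_report(results):
    """Build a report from (elapsed_seconds, result_count) pairs.

    A pair whose elapsed_seconds is None represents a query that timed out.
    """
    times = [t for t, _ in results if t is not None]
    return {
        # Assumption: total is the sum of completed query times.
        "total_time": sum(times),
        "mean": statistics.mean(times) if times else None,
        # geometric_mean needs Python 3.8+ and strictly positive inputs.
        "geometric_mean": statistics.geometric_mean(times) if times else None,
        "completed": len(times),
        "timed_out": len(results) - len(times),
        "result_counts": [n for t, n in results if t is not None],
    }
```

The geometric mean is reported alongside the arithmetic mean because it is less dominated by a few very slow queries, which is why SPARQL benchmarks commonly include both.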

Test Infrastructure

TBD ... The test infrastructure will likely utilize one or more of the existing frameworks or tools.

Background on SPARQL Benchmarks

See Background on SPARQL Benchmarks.