
User:AndreaWest/WDQS Testing/Running TFT

Revision as of 00:28, 27 June 2022 by imported>AndreaWest

In order to execute BorderCloud's Tests for Triplestore (TFT) codebase on a local installation of a database (without Docker or JMeter), changes were made to the code and test definitions. This page explains the changes and provides references to all the backing code. Also included are the steps to execute the tests, using a Stardog DB for the example, and details on how to extend them.

Testing Overview

The TFT infrastructure was forked from the "master" branch (not the default, "withJMeter" branch) of the BorderCloud repository. The tests were also forked from BorderCloud, from the rdf-tests repository. These tests are the ones defined by the W3C and were originally forked from the W3C RDF Test repository. The new repositories are the TFT repository (AndreaWesterinen/TFT), the rdf-tests repository and the GeoSPARQLBenchmark-Tests repository.

Minor changes were made to the RDF test definitions. Specifically, the manifest*.ttl files in the sub-directories of rdf-tests/sparql11/data-sparql11 were updated. Those files reference SPARQL query, TTL/RDF and other text files (used as inputs and outputs to validate test results) using an IRI reference (enclosed in angle brackets) that specifies only a file name with no explicit namespace (although a default namespace is defined in the Turtle file).

Since the IRI is simply a file name (with no scheme such as http: or file:), some data stores may behave unpredictably when handling the references. For this reason, the triples in the test definitions were changed from (for example) "<some_test_iri> qt:query <query_for_test.rq>" to "<some_test_iri> qt:query :query_for_test.rq", which explicitly uses the default namespace specified in the Turtle.

The code behind these changes can be found in the FixTTL Jupyter notebook in the updated RDF tests repository. Note that the original files are present in each directory, named manifest*.ttl.bak.
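The notebook itself is not reproduced here. As a rough illustration of the kind of rewrite described above, the following sketch replaces bare file-name IRIs with default-namespace names. The function name and the regular expression are assumptions made for this example, not the FixTTL code, and the pattern is deliberately simple: it does not distinguish IRIs from text inside string literals or comments.

```python
import re

# Matches an IRI reference in angle brackets that is only a bare file name:
# no scheme (no ':'), no path separator, no fragment, and a file extension.
BARE_FILE_IRI = re.compile(r"<([A-Za-z0-9_][A-Za-z0-9_.\-]*\.[A-Za-z0-9]+)>")


def use_default_namespace(turtle_text: str) -> str:
    """Rewrite <file.ext> references as :file.ext (default-namespace names)."""
    return BARE_FILE_IRI.sub(r":\1", turtle_text)


line = "<some_test_iri> qt:query <query_for_test.rq> ."
print(use_default_namespace(line))
```

IRIs that contain a scheme or a path, such as the one in an @prefix declaration, are left untouched because they contain ':' or '/'.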

As regards the GeoSPARQL tests, the BorderCloud tests were not used since they were not complete. Instead, the tests from the GeoSPARQL Benchmark repository were utilized. That repository was forked to create the repo noted above. The test data and a subset of the test definitions are included, and are defined using the TFT format. The specific GeoSPARQL tests that are included are specified in the README.md of the repository and are shown when the GitHub page is accessed.

Moving the tests from the original repository's test infrastructure to TFT required defining:

  • A manifest-all.ttl to indicate the GeoSPARQL compliance areas (Core, Topology Vocabulary, Geometry Extension and Geometry Topology Extension) being evaluated
  • A sub-directory for each compliance area, to hold a manifest.ttl file and the test inputs (.rdf files), queries (.rq files) and results (.srx files)
    • Within each directory/GeoSPARQL compliance area, a manifest.ttl file was created to define the tests' details
      • Each test was declared to be dawgt:Approved because this was required by the TFT infrastructure to avoid problems with RDF tests that were defined as "Proposed"
      • Note that each test loads the same data file (dataset.rdf). Loading the RDF could have been done in advance of the testing, but that would have required additional changes to the TFT code.
    • The names of any alternative result files that did not include the text, '-alternative-', were modified to do so
      • Alternative result files are explained in the paper, A GeoSPARQL Compliance Benchmark, in section 3.4.3
      • For example, testing of GeoSPARQL Requirement 9 was defined as using:
        • query-r09-4.rq, the query
        • query-r09-4.srx, the first alternative result file (which was renamed to query-r09-4-alternative-2.srx)
        • query-r09-4-alternative-1.srx, the second alternative results file
      • When alternative outputs were possible, the manifest.ttl defined the "result" triple using the format, query-r##(-<optional#>)-alternative-<number_of_srx_files>.srx
        • For GeoSPARQL Requirement 9, that meant that the following triple was declared: "req09-4 mf:result query-r09-4-alternative-2.srx"
        • The renaming enabled easier result processing in Test.php, which is discussed in more detail in the Code Modifications section below
    • Note that other than the name changes above, the .rq and .srx files are unmodified from the original repository
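To make the naming convention concrete, here is a minimal sketch that expands an mf:result file name into the list of result files it implies. It assumes that the trailing number is the count of .srx files and that numbering starts at 1, as described above. The function and pattern names are hypothetical and are not taken from Test.php.

```python
import re
from typing import List

# Illustrative pattern for names such as query-r09-4-alternative-2.srx.
ALTERNATIVE_NAME = re.compile(r"^(?P<stem>.+)-alternative-(?P<count>\d+)\.srx$")


def alternative_files(result_name: str) -> List[str]:
    """List the result files implied by an mf:result file name."""
    match = ALTERNATIVE_NAME.match(result_name)
    if match is None:
        # No '-alternative-' marker: a single expected result file.
        return [result_name]
    count = int(match.group("count"))
    stem = match.group("stem")
    return [f"{stem}-alternative-{n}.srx" for n in range(1, count + 1)]


print(alternative_files("query-r09-4-alternative-2.srx"))
```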

Incorporating the Tests Using Git Submodules

Both the BorderCloud and updated TFT repositories incorporate tests using git submodules. Therefore, if the tests are updated in either the RDF or GeoSPARQL repositories, the changes have to be incorporated/merged into the TFT repository. This is accomplished by the following instructions:

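cd mysubmoduledir             # the submodule's directory inside the TFT checkout
git fetch
git checkout master
git merge origin/master       # bring the submodule up to date with its upstream
cd TFTtopleveldir             # back to the top level of the TFT repository
git status    # should show changes to the mysubmoduledir
git add mysubmoduledir
git commit -m "Updated submodule"
git push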

Code Modifications

The TFT codebase was modified to not require external databases or Docker, and to allow tests to be pulled from a local file server (for example, a directory published as a simple HTTP server) or from a different test repository. The goal was to make minimal changes to the infrastructure.
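As one possible way to publish a directory as a simple HTTP server, the sketch below uses only the Python standard library. The function name is an assumption for illustration; this is not part of TFT and is not necessarily how the tests were served for this page.

```python
import functools
import http.server
import threading


def serve_directory(directory: str, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Serve `directory` over HTTP on localhost in a background thread.

    Port 0 asks the operating system for any free port; the chosen port
    is available afterwards as server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A caller would keep the returned server object, point the test suite location at http://127.0.0.1:<port>/, and call server.shutdown() when finished.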

The following files were updated and are available in the AndreaWesterinen/TFT repository. This is the directory that is cloned in the instructions below.

  • config.ini
    • Updated to test "standard" SPARQL 1.1, to reference the correct repository and local path for tests, to add a new listTestSuite entry (with the W3C SPARQL test location), and to reference the location of the databases to be used in SERVICE queries
    • The original entries from the file are commented out using a beginning semi-colon (";")
    • Note that without the new listTestSuite entry, when running php ./tft, many of the tests were unable to locate the appropriate input/output files
      • Although not elegant, this was the fastest and easiest solution to the problem
  • AbstractTest.php, Test.php and Tools.php
    • Where RDF test data files were specified in manifest*.ttl and referenced as IRIs with the default namespace, the reference to "manifest#" needed to be removed
    • (For Test.php) Requests to the SERVICE endpoints to load data required the addition of "update" to the SPARQL endpoint addresses
      • These changes were made to the clearAllTriples() and importGraphInput() functions
      • There was no CLI option for php ./tft to specify different update and query endpoints, as was possible for the test suite and test databases (otherwise, that approach would have been taken)
    • (For Test.php) Test evaluation required checking multiple "alternative" result files
      • Changes were made to the checkResult() function
      • The processing involved checking if the text, "-alternative-", occurred in the file name, and if so, cycling through the possible result files (starting with -alternative-1)
      • If the triples in the tested database matched the contents of any of the alternative output files, then the test was deemed successful and no more result files were checked
      • To account for -verbose and -debug output, details of each of the result comparisons are captured in the test "message"
  • tft and tft-testsuite
    • Clarified the 'usage' text and error messages
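The alternative-result handling described for checkResult() above can be outlined as follows. This is a Python sketch of the logic only, not the PHP in Test.php. The `matches` callback stands in for the real comparison against the tested database and is a hypothetical placeholder.

```python
from typing import Callable, Iterable, List, Tuple


def check_alternatives(
    candidates: Iterable[str],
    matches: Callable[[str], bool],
) -> Tuple[bool, str]:
    """Return (passed, message) after trying each candidate result file in order."""
    notes: List[str] = []
    for name in candidates:
        if matches(name):
            notes.append(f"{name}: matched")
            # First match wins; remaining files are not checked.
            return True, "; ".join(notes)
        notes.append(f"{name}: did not match")
    return False, "; ".join(notes)


# Hypothetical comparison: pretend only the second file matches the store's output.
passed, message = check_alternatives(
    ["query-r09-4-alternative-1.srx", "query-r09-4-alternative-2.srx"],
    lambda name: name.endswith("-alternative-2.srx"),
)
print(passed, message)
```

Collecting a note per comparison mirrors the idea of recording each comparison in the test "message" so that verbose and debug runs can show what was tried.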

Executing the Tests

The following execution example uses a local copy of the Stardog server (which was already installed on my laptop) to test the changes and process.

  • Start the triple store with security disabled
    • With security enabled, accessing the SERVICE endpoints resulted in permission errors. The php ./tft code does not allow the specification of the SERVICE endpoints' user names and passwords (as it does for the test details and tested databases). In lieu of addressing this problem, the shortcut of disabling security was taken.
    • Using the command below, Stardog is accessible as localhost at port 5820
stardog-admin server start --bind 127.0.0.1 --disable-security
  • Set up the necessary data stores in the triple store
    • The example* stores represent databases accessed as SERVICEs
    • The tft-tests database holds the test details and results
    • The tft-stardog data store is the database being tested
      • tft-stardog needs to be initialized with the configuration parameters, spatial.enabled and spatial.use.jts, set to "true" so that the geospatial features are correctly loaded
stardog-admin db create -n example
stardog-admin db create -n example1
stardog-admin db create -n example2
stardog-admin db create -n tft-tests
stardog-admin db create -n tft-stardog -o spatial.enabled=true spatial.use.jts=true --
  • Get the TFT codebase and RDF tests
git clone --recursive https://github.com/AndreaWesterinen/TFT
  • Move to the TFT directory just created
cd TFT
  • Install the BorderCloud SPARQL client (which requires composer)
composer install
  • Load the tests into the tft-tests data store
php ./tft-testsuite -a -q 'http://localhost:5820/tft-tests/query' -u 'http://localhost:5820/tft-tests/update'
  • If everything is running correctly, you should see output similar to:
Configuration about tests :
- Endpoint type        : standard
- Endpoint query       : http://localhost:5820/tft-tests/query
- Endpoint update      : http://localhost:5820/tft-tests/update
- Mode install all     : ON
- Test suite : URL     :
- Test suite : folder  :
- Mode verbose         : OFF
- Mode debug           : OFF
============ CLEAN GRAPH <https://bordercloud.github.io/rdf-tests/sparql11/data-sparql11/>
Before to clean : 0 triples
After to clean : 0 triples
=================================================================
Start to init the dataset via URL
......................................
38 new graphs
============ CLEAN GRAPH <https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/>
Before to clean : 7177 triples
After to clean : 7177 triples
=================================================================
Start to init the dataset via URL
......
6 new graphs
  • Execute the tests (note the definition of the tested software name, tag and description)
php ./tft -q 'http://localhost:5820/tft-tests/query' -u 'http://localhost:5820/tft-tests/update' -tq http://localhost:5820/tft-stardog/query -tu http://localhost:5820/tft-stardog/update -o ./junit -r urn:results --softwareName="Stardog" --softwareDescribeTag=v7.9.1 --softwareDescribe=7.9.1-test
  • You should see output similar to what is listed directly below. There are a few items to note:
    • The results use the convention, '.' for success, 'F' for failure, 'E' for some error, 'S' for skipped
    • The large number of tests marked as "skipped" in the QueryEvaluationTest is caused by TFT infrastructure errors related to entailment. These tests are not currently relevant to Wikidata and will not present a problem.
    • All the GeoSPARQL tests are defined as QueryEvaluationTests
    • The tests that reference "http://www.w3.org/2009/sparql/docs/tests/data-sparql11/" (in the latter part of the output) are an artifact of the config.ini file, as noted in the section above. The last set of test results (labelled as "TEST : http://www.w3.org/2009/sparql/docs/tests/data-sparql11/") can be ignored.
Configuration about tests :
- Graph of output EARL : urn:results
- Output of tests      : ./junit
- Endpoint type        : standard
- Endpoint query       : http://localhost:5820/tft-tests/query
- Endpoint update      : http://localhost:5820/tft-tests/update
- TEST : Endpoint type        : standard
- TEST : Endpoint query       : http://localhost:5820/tft-stardog/query
- TEST : Endpoint update      : http://localhost:5820/tft-stardog/update
- Mode verbose         : OFF
- Mode debug           : OFF
==================================================================
TEST : https://andreawesterinen.github.io/rdf-tests/sparql11/data-sparql11/

		TESTS : ProtocolTest
.Nb tests : 0

--------------------------------------------------------------------
TESTS : PositiveSyntaxTest
.Nb tests : 63
F.................................F.FF.........................

--------------------------------------------------------------------
TESTS : NegativeSyntaxTest
.Nb tests : 43
...........................................

--------------------------------------------------------------------
TESTS : QueryEvaluationTest.Nb tests : 252
...........................................................................................FESESESSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....F.......F.......................................................................................................................F.F.................................................................................F.F....ESESESESESES.....F.F......................

		TESTS : CSVResultFormatTest
.Nb tests : 3
ESESES
		TESTS : UpdateEvaluationTest
.Nb tests : 93
.........................................................................F...................
		TESTS : PositiveUpdateSyntaxTest
.Nb tests : 42
.........F..........F..................F..
		TESTS : NegativeUpdateSyntaxTest
.Nb tests : 13
.........FF.F
 END TESTS
==================================================================
TEST : https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/

		TESTS : ProtocolTest
.Nb tests : 0

--------------------------------------------------------------------
TESTS : PositiveSyntaxTest
.Nb tests : 0


--------------------------------------------------------------------
TESTS : NegativeSyntaxTest
.Nb tests : 0


--------------------------------------------------------------------
TESTS : QueryEvaluationTest.Nb tests : 51
.......................F..ESESESES..ESESESESES.FESESESESESESESESESESES.F.F.F.F.F.F.F.F................

		TESTS : CSVResultFormatTest
.Nb tests : 0

		TESTS : UpdateEvaluationTest
.Nb tests : 0

		TESTS : PositiveUpdateSyntaxTest
.Nb tests : 0

		TESTS : NegativeUpdateSyntaxTest
.Nb tests : 0

 END TESTS
==================================================================
TEST : http://www.w3.org/2009/sparql/docs/tests/data-sparql11/

		TESTS : ProtocolTest
.Nb tests : 0

--------------------------------------------------------------------
TESTS : PositiveSyntaxTest
.Nb tests : 0


--------------------------------------------------------------------
TESTS : NegativeSyntaxTest
.Nb tests : 0


--------------------------------------------------------------------
TESTS : QueryEvaluationTest.Nb tests : 0


		TESTS : CSVResultFormatTest
.Nb tests : 0

		TESTS : UpdateEvaluationTest
.Nb tests : 0

		TESTS : PositiveUpdateSyntaxTest
.Nb tests : 0

		TESTS : NegativeUpdateSyntaxTest
.Nb tests : 0

 END TESTS
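For a quick count of the symbols in one progress line of the output above, a small helper along these lines may be useful. It is a convenience sketch and not part of TFT. It should be given only the progress lines themselves, since header lines such as "TESTS : ..." also contain the letters 'S' and 'E'. Note as well that the number of symbols is not always equal to the reported "Nb tests" value (the CSVResultFormatTest line above shows six symbols for three tests).

```python
from collections import Counter

# Symbol meanings as given in the notes on the output above.
OUTCOMES = {".": "success", "F": "failure", "E": "error", "S": "skipped"}


def tally(progress_line: str) -> Counter:
    """Count each outcome symbol in one progress line, ignoring other characters."""
    return Counter(OUTCOMES[ch] for ch in progress_line if ch in OUTCOMES)


# The NegativeUpdateSyntaxTest progress line from the first test suite above.
print(tally(".........FF.F"))
```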
  • To determine the final results, execute the query below
    • Note that these tests do NOT use the tft-score code
    • Also, note that the graph name is the one specified with the -r option in the php ./tft instruction above
stardog query execute tft-tests "prefix earl: <http://www.w3.org/ns/earl#>
SELECT ?out (COUNT(DISTINCT ?assertion) AS ?cnt)
WHERE
{
        GRAPH <urn:results> {
                ?assertion a earl:Assertion.
                ?assertion earl:test ?test.
                ?assertion earl:result ?result.
                ?result earl:outcome ?out .
        }
} GROUP BY ?out"
  • Results will be reported as shown:
+------------------------------------+-------+
|                out                 |  cnt  |
+------------------------------------+-------+
| http://www.w3.org/ns/earl#passed   | 733   |
| http://www.w3.org/ns/earl#failed   | 30    |
| http://www.w3.org/ns/earl#error    | 32    |
| http://www.w3.org/ns/earl#untested | 172   |
+------------------------------------+-------+

Query returned 4 results in 00:00:00.131
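If the same query is run through a SPARQL endpoint that returns the standard SPARQL 1.1 JSON results format, the counts can be turned into a pass rate with a few lines of code. The sketch below assumes the variable names ?out and ?cnt from the query above. Retrieving the JSON is not shown, and the sample input is built from the counts in the table above rather than from a live query.

```python
from typing import Dict

EARL = "http://www.w3.org/ns/earl#"


def outcome_counts(sparql_json: dict) -> Dict[str, int]:
    """Map each EARL outcome's local name to its count from a SELECT ?out ?cnt result."""
    counts: Dict[str, int] = {}
    for row in sparql_json["results"]["bindings"]:
        outcome = row["out"]["value"]
        if outcome.startswith(EARL):
            outcome = outcome[len(EARL):]
        counts[outcome] = int(row["cnt"]["value"])
    return counts


def pass_rate(counts: Dict[str, int]) -> float:
    """Fraction of assertions whose outcome is 'passed' (0.0 when there are none)."""
    total = sum(counts.values())
    return counts.get("passed", 0) / total if total else 0.0


# Sample input built from the counts reported in the table above.
sample = {
    "head": {"vars": ["out", "cnt"]},
    "results": {"bindings": [
        {"out": {"type": "uri", "value": EARL + name},
         "cnt": {"type": "literal", "value": str(n)}}
        for name, n in [("passed", 733), ("failed", 30), ("error", 32), ("untested", 172)]
    ]},
}
counts = outcome_counts(sample)
print(counts, f"{pass_rate(counts):.1%}")
```

Whether "untested" assertions should count toward the denominator is a reporting choice; here every assertion is included.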
  • To see the tests which failed, execute this query:
stardog query tft-tests "prefix earl: <http://www.w3.org/ns/earl#>
select distinct ?s where {
        GRAPH <urn:results> { {?s earl:outcome earl:failed} 
                              UNION {?s earl:outcome earl:error} }
}"
+----------------------------------------------------------------------------------+
|                                        s                                         |
+----------------------------------------------------------------------------------+
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-fed/manifest#test_ |
| 1/Syntax/2022-05-29T02:46:09+00:00                                               |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-query/manifest#tes |
| t_4/Syntax/2022-05-29T02:46:09+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-query/manifest#tes |
| t_41/Syntax/2022-05-29T02:46:09+00:00                                            |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-query/manifest#tes |
| t_42/Syntax/2022-05-29T02:46:09+00:00                                            |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/construct/manifest#constr |
| uctwhere04/Response/2022-05-29T02:46:09+00:00                                    |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#tsv0 |
| 1/Protocol/2022-05-29T02:46:09+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#tsv0 |
| 2/Protocol/2022-05-29T02:46:09+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#tsv0 |
| 3/Protocol/2022-05-29T02:46:09+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/exists/manifest#exists03/ |
| Response/2022-05-29T02:46:09+00:00                                               |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/functions/manifest#bnode0 |
| 1/Response/2022-05-29T02:46:09+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/json-res/manifest#jsonres |
| 01/Response/2022-05-29T02:46:09+00:00                                            |
. . .
| https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/qrw/req28- |
| 1/Response/2022-05-29T02:46:09+00:00                                             |
| https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/qrw/req28- |
| 2/Response/2022-05-29T02:46:09+00:00                                             |
| https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/qrw/req28- |
| 3/Response/2022-05-29T02:46:09+00:00                                             |
| https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/qrw/req28- |
| 4/Response/2022-05-29T02:46:09+00:00                                             |
| https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/qrw/req28- |
| 5/Response/2022-05-29T02:46:09+00:00                                             |
| https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/qrw/req28- |
| 6/Response/2022-05-29T02:46:09+00:00                                             |
| https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/qrw/req28- |
| 7/Response/2022-05-29T02:46:09+00:00                                             |
| https://andreawesterinen.github.io/GeoSPARQLBenchmark-Tests/geosparql/qrw/req28- |
| 8/Response/2022-05-29T02:46:09+00:00                                             |
+----------------------------------------------------------------------------------+

Query returned 62 results in 00:00:00.137

Getting More Information Using Verbose Mode

If you are experiencing errors when loading the test suites or running the tests, use the -v and -d flags when executing the php ./tft-testsuite and/or php ./tft programs.

How to Change or Extend the Tests

Any of the repositories (TFT, rdf-tests or GeoSPARQLBenchmark-Tests) can be updated via a pull request or by forking. If either the RDF or GeoSPARQL test repositories are forked, it is recommended that the TFT repo also be forked and its submodule links reset/redefined. An example is shown in the commit history of the TFT repository.

To add a new set of tests for either RDF or GeoSPARQL, begin by updating the manifest-all.ttl file in the rdf-tests/sparql11/data-sparql11 or geosparql-tests/geosparql directory. The manifest-all file identifies which component directories (with their manifest.ttl files) should be included.

To add a test or to modify any of the existing tests, create or edit the manifest.ttl file in the appropriate subdirectory of rdf-tests/sparql11/data-sparql11 or geosparql-tests/geosparql. The manifest.ttl files contain the information encoded using the conventions defined in the SPARQL 1.1 Test Case Structure document. Note that for GeoSPARQL, all of the tests were defined as QueryEvaluationTests. This may or may not be true for new tests.

When creating a manifest.ttl file, remember to update the default namespace defined in the prefixes.

In addition, an entirely new repository of tests can be added, similar to the approach taken for adding the GeoSPARQL tests (discussed in the Testing Overview section above). If this is done, it is again recommended that the TFT repository be forked and its submodule links reset/redefined.

When defining a new repository, a few additional changes are needed:

  • The config.ini file in the TFT repo should be updated to include the new repository as a new listTestSuite entry
  • Access to the manifest and test files should be provided using an approach like GitHub "Pages". This is because:
    • The TFT infrastructure and accompanying triple stores require a valid IRI/address to perform SPARQL LOADs of data
    • Doing so via a "file://" IRI is problematic since the query is processed, and the IRI dereferenced, at the database server. Even if TFT and the database are running on the same machine, they are not necessarily running as the same user with the same permissions, which would cause problems. It is much easier to just access the files using an "https://" address (e.g., from a GitHub Pages site).
    • For an example, please see the index.html file in GeoSPARQLBenchmark-Tests