Dumps/CategoriesRDF

From Wikitech

This page covers the dumps of categories in RDF format. For information about the XML/SQL dumps, including the category-related tables, see Dumps/XML-SQL Dumps.

Report issues with these dumps in Phabricator under both the Dumps-generation project and the Wikidata-Query-Service project. The dumps are produced because WDQS ingests them.

These dumps are run out of cron.

  • Weekly runs: generate full lists of the categories on each public Wikimedia project, in RDF format
  • Daily runs: generate lists of SPARQL queries that move, delete, and insert the categories that have changed since the previous day
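
To illustrate what a daily list of SPARQL queries might look like, here is a minimal Python sketch that assembles a SPARQL 1.1 Update request from a set of category changes. It is not the dump scripts' actual code. The function names, the example URIs, the ontology prefix, and the choice to model a move as a delete followed by an insert are all assumptions made for illustration.

```python
# Hypothetical sketch: build a SPARQL update for one day's category changes.
# The prefix URI, predicate names, and category URIs are illustrative
# assumptions, not copied from the real dump files.

MEDIAWIKI = "https://www.mediawiki.org/ontology#"  # assumed ontology prefix
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape(literal):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return literal.replace("\\", "\\\\").replace('"', '\\"')


def build_daily_update(deleted, inserted):
    """Return a SPARQL 1.1 Update request as a string.

    deleted:  iterable of category URIs to remove entirely
    inserted: iterable of (category URI, label) pairs to add
    A moved category is modelled here as one deletion plus one insertion.
    """
    statements = []
    for uri in deleted:
        # Remove every triple whose subject is the deleted category.
        statements.append(f"DELETE WHERE {{ <{uri}> ?p ?o }}")
    inserted = list(inserted)
    if inserted:
        triples = "\n".join(
            f'  <{uri}> a mediawiki:Category ; rdfs:label "{escape(label)}" .'
            for uri, label in inserted
        )
        statements.append("INSERT DATA {\n" + triples + "\n}")
    prefixes = f"PREFIX mediawiki: <{MEDIAWIKI}>\nPREFIX rdfs: <{RDFS}>\n"
    # Operations in a single update request are separated by ';'.
    return prefixes + " ;\n".join(statements)


if __name__ == "__main__":
    print(build_daily_update(
        deleted=["https://example.org/wiki/Category:Old_name"],
        inserted=[("https://example.org/wiki/Category:New_name", "New name")],
    ))
```

Running the script prints a request that deletes the old category's triples and inserts the new category with its label.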

These dumps run on a snapshot host dedicated to 'misc' dump generation (everything other than the XML/SQL dumps) and query database servers designated 'vslow, dumps'.

The dump scripts are in our git puppet repo.

As of early 2019, the daily runs take about 15 minutes to complete and the weekly runs take about 2.5 hours.