You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Dumps: Difference between revisions
imported>GTirloni (Update clushmaster) |
imported>ArielGlenn (gonna get these docs to suck less if it kills me) |
Line 1:
These docs are for developers and maintainers of the various dumps. Information for users of the dumps can be found on [[meta:Data dumps|meta]].
=== Dump types ===
We produce several types of dumps. For information about deployment of updates, architecture of the dumps, and troubleshooting each dump type, check the appropriate entry below.
* [[Dumps/XML-SQL Dumps|XML/SQL dumps]], which contain '''revision metadata and content''' for public Wikimedia projects, along with the contents of select '''SQL tables'''
* [[Dumps/Adds-changes_dumps|adds/changes dumps]], which contain a '''daily XML dump of new pages''' or pages with '''new revisions''' since the previous run, for public Wikimedia projects
* [[Dumps/WikidataDumps|Wikidata entity dumps]], which contain ''' 'entities' (Qxxx)''' in various formats, along with a dump of '''lexemes'''; these run once a week
* [[Dumps/CategoriesRDF|category dumps]], which contain weekly full and daily incremental '''category lists''' for public Wikimedia projects, in '''RDF format'''
* [[Dumps/OtherMisc|other miscellaneous dumps]], including '''content translation''' dumps, '''cirrus search''' dumps, and '''global block''' information
Other datasets are also provided for download, such as page view counts; these datasets are managed by other folks and are not documented here.
=== Hardware ===
* [[Dumps/Snapshot hosts|Dumps snapshot hosts]] that run scripts to generate the dumps
* [[Dumps/Dumpsdata hosts|Dumps datastores]] where the snapshot hosts write intermediate and final dump output files, which are later published to our web servers
* [[Dumps/Dump servers|Dumps servers]] that provide the dumps to the public, to our mirrors, and via NFS to Wikimedia Cloud Services and stats host users
=== Adding new dumps ===
If you are interested in adding a new dumpset, please check the [[Dumps/New dumps and datasets|guidelines]] (still in draft form).
=== Mirrors ===
If you are adding a mirror, see [[Dumps/Mirror status|Dumps Mirror setup]].
[[Category: Dumps]]
[[Category:
Revision as of 14:01, 14 March 2019
These docs are for developers and maintainers of the various dumps. Information for users of the dumps can be found on meta.
Dump types
We produce several types of dumps. For information about deployment of updates, architecture of the dumps, and troubleshooting each dump type, check the appropriate entry below.
- XML/SQL dumps, which contain revision metadata and content for public Wikimedia projects, along with the contents of select SQL tables
- adds/changes dumps, which contain a daily XML dump of new pages or pages with new revisions since the previous run, for public Wikimedia projects
- Wikidata entity dumps, which contain 'entities' (Qxxx) in various formats, along with a dump of lexemes; these run once a week
- category dumps, which contain weekly full and daily incremental category lists for public Wikimedia projects, in RDF format
- other miscellaneous dumps, including content translation dumps, cirrus search dumps, and global block information
Other datasets are also provided for download, such as page view counts; these datasets are managed by other folks and are not documented here.
Hardware
- Dumps snapshot hosts that run scripts to generate the dumps
- Dumps datastores where the snapshot hosts write intermediate and final dump output files, which are later published to our web servers
- Dumps servers that provide the dumps to the public, to our mirrors, and via NFS to Wikimedia Cloud Services and stats host users
Adding new dumps
If you are interested in adding a new dumpset, please check the guidelines (still in draft form).
Mirrors
If you are adding a mirror, see Dumps Mirror setup.