This is a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Portal:Data Services

From Wikitech
WMCS data services

Data Services comprises services that provide direct access to databases and dumps, along with web interfaces for querying data stores and accessing them programmatically.

Data services currently include: Wiki Replicas, Wikimedia Dumps, Shared Storage, CirrusSearch Elasticsearch replicas, Wikimedia Enterprise, Quarry, and PAWS.

Data stores

Wiki Replicas are MySQL/MariaDB databases that replicate, in near real time, the production MediaWiki databases of Wikimedia Foundation wikis.
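
As a rough illustration of how a tool might locate a replica before connecting, the sketch below derives a host and database name from a wiki's database name. The `<wiki>.<cluster>.db.svc.wikimedia.cloud` host pattern, the `_p` database suffix, and the `analytics`/`web` cluster names are assumptions that this page does not state; check the Wiki Replicas documentation before relying on them. `ReplicaTarget` and `replica_target` are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaTarget:
    """Where a client would connect for one wiki's replica (illustrative only)."""
    host: str
    database: str


def replica_target(wiki_db: str, cluster: str = "analytics") -> ReplicaTarget:
    """Build the assumed host and database name for a wiki such as 'enwiki'.

    Assumption: hosts follow '<wiki>.<cluster>.db.svc.wikimedia.cloud' and the
    public database is '<wiki>_p'. Neither pattern comes from this page.
    """
    if cluster not in ("analytics", "web"):
        raise ValueError(f"unknown cluster: {cluster!r}")
    # Accept only letters, digits, and underscores so the name is safe to
    # embed in a hostname and a database identifier.
    if not wiki_db or not wiki_db.replace("_", "").isalnum():
        raise ValueError(f"invalid wiki database name: {wiki_db!r}")
    return ReplicaTarget(
        host=f"{wiki_db}.{cluster}.db.svc.wikimedia.cloud",
        database=f"{wiki_db}_p",
    )
```

The resulting host and database would then be passed to whichever MySQL/MariaDB client library the tool already uses, together with the tool's own credentials.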

Wikimedia Dumps offers a range of data downloads, including full-text dumps and other datasets.

Shared Storage is offered via NFS. It includes shared directories available to Cloud VPS and Toolforge users. Wikimedia Dumps are also offered via the Shared Storage service, but are treated as a separate Data Service because of their wide use.

The "Cloud Elastic" servers are a replica of the CirrusSearch OpenSearch indices made available to Wikimedia Cloud Services applications (both Cloud VPS and Toolforge).

Wikimedia Enterprise APIs give high-volume and high-query-rate access to Wikimedia project data. Users of Toolforge, Cloud VPS, and PAWS can call any of the endpoints described in the Wikimedia Enterprise documentation without passing an authorization header.
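
To make the "no authorization header" point concrete, here is a minimal sketch that builds such a request with Python's standard library without sending it. The base URL and the `/example-endpoint` path are placeholders, not real Wikimedia Enterprise addresses, and `build_enterprise_request` is a hypothetical helper; substitute the values given in the Wikimedia Enterprise documentation.

```python
import urllib.request

# Placeholder only: NOT a real Wikimedia Enterprise URL. Take the actual base
# URL and endpoint paths from the Wikimedia Enterprise documentation.
ENTERPRISE_BASE_URL = "https://enterprise.example.invalid/v2"


def build_enterprise_request(
    path: str, base_url: str = ENTERPRISE_BASE_URL
) -> urllib.request.Request:
    """Build a GET request that deliberately carries no Authorization header.

    Per the text above, callers inside Toolforge, Cloud VPS, or PAWS do not
    need to pass one. The request is only constructed here, not sent.
    """
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    headers = {
        "Accept": "application/json",
        # A descriptive User-Agent is good practice; this value is made up.
        "User-Agent": "example-tool/0.1 (hypothetical contact)",
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

From inside one of the listed environments, a request built this way could be passed to `urllib.request.urlopen`. Outside them, the same call would presumably need credentials.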

Web interfaces

Quarry and PAWS require a Wikimedia SUL account to log in.

Quarry is a graphical web interface that allows users to query Wiki Replicas and ToolsDB using SQL.

PAWS is a Jupyter notebook installation hosted by Wikimedia Cloud Services that provides Python notebooks and a terminal through a web browser. From PAWS you can access Wiki Replicas, ToolsDB, and Dumps.

See also