
This is a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Portal:Data Services


Data Services comprises services that provide direct access to databases and dumps, as well as web interfaces for querying data stores and accessing them programmatically.

Data services currently include: Wiki Replicas, Wikimedia Dumps, Shared Storage, CirrusSearch replicas (Cloud Elastic), Quarry and PAWS.

Data stores

Wiki Replicas are MySQL/MariaDB databases that are replicated in near real time from the production MediaWiki databases of Wikimedia Foundation wikis.
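
A minimal Python sketch of assembling connection settings for a Wiki Replica follows. It assumes the host naming convention `<wiki>.analytics.db.svc.wikimedia.cloud` (or `.web.` for short interactive queries), the `<wiki>_p` database name, and a MySQL option file with a `[client]` section, such as the `replica.my.cnf` credentials file in a Toolforge tool's home directory. Check all of these against the current Wiki Replicas documentation. The function name `replica_connection_params` is hypothetical.

```python
import configparser
from pathlib import Path

# Assumed naming convention for Wiki Replicas hosts; verify on Wikitech before use.
REPLICA_DOMAIN = "db.svc.wikimedia.cloud"


def replica_connection_params(wiki: str, cnf_path: Path, cluster: str = "analytics") -> dict:
    """Build keyword arguments for a MySQL/MariaDB client connecting to a Wiki Replica.

    wiki:     database name such as "enwiki" (the "_p" suffix is added here)
    cnf_path: MySQL option file with a [client] section (e.g. replica.my.cnf)
    cluster:  "analytics" or "web" (assumed cluster names)
    """
    if cluster not in ("analytics", "web"):
        raise ValueError(f"unknown cluster: {cluster!r}")
    # Disable interpolation so passwords containing '%' are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so check its return value explicitly.
    if not parser.read(cnf_path):
        raise FileNotFoundError(cnf_path)
    client = parser["client"]
    return {
        "host": f"{wiki}.{cluster}.{REPLICA_DOMAIN}",
        "database": f"{wiki}_p",
        # Option files may quote values, e.g. user='u1234'; strip the quotes.
        "user": client["user"].strip("'\""),
        "password": client["password"].strip("'\""),
    }
```

With a client library such as PyMySQL installed, the result can be passed straight through, for example `pymysql.connect(**replica_connection_params("enwiki", Path.home() / "replica.my.cnf"))`.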

Wikimedia Dumps offers a range of data downloads, including full-text dumps and other datasets.
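
Dumps are large compressed XML files, so they are usually processed as a stream rather than loaded whole. The sketch below uses only the Python standard library to walk a `pages-articles`-style `.xml.bz2` dump and yield each page's title and wikitext. The function name `iter_pages` is hypothetical, and the sketch assumes one revision per page; on a full-history dump it would keep only the last revision's text.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def _local(tag: str) -> str:
    """Drop the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(path: str) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs from a bz2-compressed MediaWiki XML dump."""
    with bz2.open(path, "rb") as fh:
        context = ET.iterparse(fh, events=("start", "end"))
        _, root = next(context)  # first "start" event is the <mediawiki> root
        title, text = "", ""
        for event, elem in context:
            if event != "end":
                continue
            name = _local(elem.tag)
            if name == "title":
                title = elem.text or ""
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                # Drop finished pages from the tree so memory use stays flat.
                root.clear()
                title, text = "", ""
```

On Toolforge and PAWS the dumps are assumed to be mounted under `/public/dumps/public/<wiki>/<date>/`; elsewhere the same files can be downloaded from dumps.wikimedia.org.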

Shared Storage is provided over NFS and includes shared directories available to Cloud VPS and Toolforge users. Wikimedia Dumps are also served through Shared Storage, but they are treated as a separate Data Service because they are so widely used.

The "Cloud Elastic" servers are a replica of the CirrusSearch OpenSearch indices made available to Wikimedia Cloud Services applications (both Cloud VPS and Toolforge).
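
A query against these replicas is an ordinary OpenSearch `_search` request. The Python sketch below only builds the request with the standard library and does not send it. The endpoint URL and the `<wiki>_content` index name are assumptions to confirm on Wikitech, and `build_title_search` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint; confirm the Cloud Elastic host and port on Wikitech before use.
CLOUD_ELASTIC_URL = "https://cloudelastic.wikimedia.org:8243"


def build_title_search(wiki: str, terms: str, size: int = 10) -> urllib.request.Request:
    """Build (but do not send) an OpenSearch _search request on a wiki's content index."""
    index = f"{wiki}_content"  # assumed CirrusSearch index name, e.g. "enwiki_content"
    body = {
        "query": {"match": {"title": terms}},  # standard full-text match query
        "size": size,                          # maximum number of hits to return
        "_source": ["title"],                  # return only each hit's title field
    }
    return urllib.request.Request(
        f"{CLOUD_ELASTIC_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` from a Cloud VPS or Toolforge host should return the usual OpenSearch response, with matches under `hits.hits`.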

Wikimedia Enterprise is a set of APIs aimed at large-scale users. These APIs are maintained and developed by the Commercial Partnerships division. Users of Toolforge, Cloud VPS, or PAWS have access to the On-demand and Snapshot APIs.
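
As an illustration of the On-demand API, the Python sketch below builds (without sending) an authenticated request for a single article. The base URL `https://api.enterprise.wikimedia.com/v2` and the `/articles/<title>` path are assumptions to check against the Wikimedia Enterprise documentation, obtaining the bearer token is out of scope, and `build_article_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request

# Assumed On-demand API base URL; check the Wikimedia Enterprise documentation.
ENTERPRISE_API = "https://api.enterprise.wikimedia.com/v2"


def build_article_request(title: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an On-demand API request for one article by title."""
    # Percent-encode the whole title, including '/', so it forms a single path segment.
    path = urllib.parse.quote(title, safe="")
    return urllib.request.Request(
        f"{ENTERPRISE_API}/articles/{path}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```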

Web interfaces

Quarry and PAWS require a Wikimedia SUL (single user login) account to log in.

Quarry is a graphical web interface that allows users to query Wiki Replicas and ToolsDB using SQL.
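
For example, the SQL below, which could be pasted into Quarry, lists the longest main-namespace pages using the `page_namespace`, `page_title` and `page_len` columns of MediaWiki's `page` table. To keep the sketch runnable offline, the Python wrapper runs the same query against an in-memory SQLite stand-in for that table; the helper `run_on_toy_table` and its rows are made up for illustration and are not replica data.

```python
import sqlite3

# Quarry-style query: the five longest pages in the main (article) namespace.
LONGEST_PAGES_SQL = """
SELECT page_title, page_len
FROM page
WHERE page_namespace = 0
ORDER BY page_len DESC
LIMIT 5
"""


def run_on_toy_table(rows):
    """Run the query on an in-memory SQLite table shaped like a subset of `page`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, "
            "page_title TEXT, page_len INTEGER)"
        )
        conn.executemany(
            "INSERT INTO page (page_namespace, page_title, page_len) VALUES (?, ?, ?)",
            rows,
        )
        return conn.execute(LONGEST_PAGES_SQL).fetchall()
    finally:
        conn.close()
```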

PAWS is a Jupyter notebook installation, hosted by Wikimedia Cloud Services, that provides Python notebooks and a terminal accessible through a web browser. You can access Wiki Replicas, ToolsDB and Dumps from PAWS.
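
Inside a PAWS notebook, replica credentials are assumed to be exposed as the environment variables `MYSQL_USERNAME` and `MYSQL_PASSWORD`; the sketch below reads them and pairs them with the assumed `<wiki>.analytics.db.svc.wikimedia.cloud` host and `<wiki>_p` database naming. The helper name `paws_replica_params` is hypothetical, and the optional `env` argument exists only so the function can be exercised outside PAWS.

```python
import os
from typing import Mapping, Optional


def paws_replica_params(wiki: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Connection keyword arguments for a Wiki Replica from inside a PAWS notebook."""
    env = os.environ if env is None else env
    try:
        # Assumed variable names; check the PAWS documentation.
        user, password = env["MYSQL_USERNAME"], env["MYSQL_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"{missing} is not set; is this running inside PAWS?") from None
    return {
        "host": f"{wiki}.analytics.db.svc.wikimedia.cloud",  # assumed host convention
        "database": f"{wiki}_p",
        "user": user,
        "password": password,
    }
```

The returned dictionary can be passed to a MySQL client such as `pymysql.connect(**paws_replica_params("enwiki"))`.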

See also