
Portal:Data Services: Difference between revisions

From Wikitech-static
imported>Arturo Borrero Gonzalez
(mention OSM database)
imported>Bstorm
(→‎Wikilabels Postgres: update location)
Line 5: Line 5:


== Wiki Replicas ==
Wiki Replicas are the sanitized public replicas of the production Wikimedia MediaWiki wiki databases. Access to the Wiki Replicas is granted for users with a Toolforge account automatically. See [[Help:Toolforge/Database]] for how to access the [[Wiki replicas|Wiki Replicas]].
Wiki Replicas are MySQL/MariaDB databases that replicate in near real time from the production MediaWiki databases of Wikimedia Foundation wikis. The database tables are sanitized for public use.
 
Access to the Wiki Replicas is automatically granted to all users of Toolforge. See [[Help:Toolforge/Database]] for how to access the [[Wiki replicas|Wiki Replicas]].


== ToolsDB ==
ToolsDB is a service that allows a Tool shared user to create and maintain a Tool specific database. See [[Help:Toolforge/Database#User databases]] for help on ToolsDB.
It's accessible at the following addresses:
* tools.db.svc.eqiad.wmflabs (preferred)
* tools-db.tools.eqiad.wmflabs
It used to run on [https://netbox.wikimedia.org/dcim/devices/1915/ labsdb1005] and was migrated to a Cloud VPS VM called [https://tools.wmflabs.org/openstack-browser/server/clouddb1001.clouddb-services.eqiad.wmflabs clouddb1001] in the [https://tools.wmflabs.org/openstack-browser/project/clouddb-services clouddb-services] project (more details about the migration are available in [[phab:T208754]] and [[phab:T193264]]).
You can verify the [https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=checker.tools.wmflabs.org&service=toolschecker%3A+toolsdb service status] and the [https://icinga.wikimedia.org/cgi-bin/icinga/avail.cgi?host=checker.tools.wmflabs.org&service=toolschecker%3A+toolsdb&show_log_entries availability report] in Icinga. Active checks are carried out by [[Help:Toolforge/Monitoring#Toolschecker|Toolschecker]] upon request by Icinga.
== Wikilabels Postgres ==
The Postgres database used by [[Wikilabels]] (used by ORES) is on a replicated cluster of VMs: [https://tools.wmflabs.org/openstack-browser/server/clouddb-wikilabels-01.clouddb-services.eqiad.wmflabs clouddb-wikilabels-01] is the primary, with [https://tools.wmflabs.org/openstack-browser/server/clouddb-wikilabels-02.clouddb-services.eqiad.wmflabs clouddb-wikilabels-02] as the usual replica. Changes that affect the PostgreSQL service, including upgrades and reboots, should be coordinated with Aaron Halfaker.


== Wikimedia Dumps ==
Line 19: Line 32:


== Quarry ==
[https://quarry.wmflabs.org/ Quarry] is a graphical web interface that allows users to write SQL to query the Wiki Replicas. It only needs a Wikimedia (Meta) account to log in, and is extensively used by analysts, researchers, and people of all experience levels to easily access the databases. See [[m:Research:Quarry]] for help.
[https://quarry.wmflabs.org/ Quarry] is a graphical web interface that allows users to query the Wiki Replicas with SQL. It only needs a Wikipedia or Meta-Wiki account to log in, and is extensively used by analysts, researchers, and people of all experience levels to easily access the databases. See [[m:Research:Quarry]] for help.


== PAWS ==
Line 26: Line 39:
== OSM Database ==


We provide a clone of the OSM (OpenStreetMap) database for usage inside Toolforge and Cloud VPS. See [[Help:Toolforge/Database#Connecting_to_OSM_via_the_official_CLI_PostgreSQL]] and [[Openstreetmap_Databases]] for more information.
We provide a clone of the OSM (OpenStreetMap) database for usage inside Toolforge and Cloud VPS. See [[Help:Toolforge/Database#Connecting to OSM via the official CLI PostgreSQL]] and [[Openstreetmap Databases]] for more information.


== See also ==

Revision as of 23:10, 24 May 2019

WMCS data services

Data Services include services that allow for direct access to databases and dumps, as well as web interfaces for querying and programmatic access to data stores. Services currently offered are: Wiki Replicas, ToolsDB, Wikimedia Dumps, Shared Storage, Quarry and PAWS.

Wiki Replicas

Wiki Replicas are MySQL/MariaDB databases that replicate in near real time from the production MediaWiki databases of Wikimedia Foundation wikis. The database tables are sanitized for public use.

Access to the Wiki Replicas is automatically granted to all users of Toolforge. See Help:Toolforge/Database for how to access the Wiki Replicas.

ToolsDB

ToolsDB is a service that allows a Tool's shared user to create and maintain a Tool-specific database. See Help:Toolforge/Database#User databases for help on ToolsDB.

It's accessible at the following addresses:

  • tools.db.svc.eqiad.wmflabs (preferred)
  • tools-db.tools.eqiad.wmflabs

It used to run on labsdb1005 and was migrated to a Cloud VPS VM called clouddb1001 in the clouddb-services project (more details about the migration are available in phab:T208754 and phab:T193264).

You can verify the service status and the availability report in Icinga. Active checks are carried out by Toolschecker upon request by Icinga.
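
As a rough illustration of how the two ToolsDB addresses listed above might be used from a tool, the following Python sketch builds connection settings and tries the preferred address first. It is a minimal sketch under assumptions, not the documented access method: it assumes the third-party PyMySQL client is installed, that the tool's credentials live in a replica.my.cnf file in the tool's home directory, and the database name s12345__example is purely hypothetical. See Help:Toolforge/Database#User databases for the authoritative instructions.

```python
from pathlib import Path

# Addresses taken from the list above, in order of preference.
TOOLSDB_HOSTS = ("tools.db.svc.eqiad.wmflabs", "tools-db.tools.eqiad.wmflabs")


def toolsdb_connect_kwargs(database, host=TOOLSDB_HOSTS[0], credentials_file=None):
    """Build keyword arguments for a MySQL client connection to ToolsDB."""
    if credentials_file is None:
        # Assumption: the tool's credentials are stored in ~/replica.my.cnf.
        credentials_file = Path.home() / "replica.my.cnf"
    return {
        "host": host,
        "database": database,
        "read_default_file": str(credentials_file),
        "charset": "utf8mb4",
    }


def connect(database):
    """Try each ToolsDB address in order and return the first connection."""
    import pymysql  # third-party client; assumed to be installed

    last_error = None
    for host in TOOLSDB_HOSTS:
        try:
            return pymysql.connect(**toolsdb_connect_kwargs(database, host=host))
        except pymysql.err.OperationalError as exc:
            last_error = exc  # remember the failure and try the next address
    raise last_error


if __name__ == "__main__":
    # "s12345__example" is a hypothetical database name used for illustration.
    print(toolsdb_connect_kwargs("s12345__example"))
```

Keeping the settings in a separate function makes it possible to inspect or log the chosen host without opening a connection; connect() only works from an environment that can reach these addresses.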

Wikilabels Postgres

The Postgres database used by Wikilabels (used by ORES) is on a replicated cluster of VMs: clouddb-wikilabels-01 is the primary, with clouddb-wikilabels-02 as the usual replica. Changes that affect the PostgreSQL service, including upgrades and reboots, should be coordinated with Aaron Halfaker.

Wikimedia Dumps

Wikimedia Dumps offers a range of data downloads, including full text dumps and other datasets. Toolforge users can directly access dumps data through their Tool account; see Help:Toolforge#Dumps. Cloud VPS users can request to have the share made available; see Help:Shared storage#/public/dumps.

Shared Storage

Shared Storage is offered via NFS for Toolforge and Cloud VPS users. Shares currently offered are described at Help:Shared storage. The Toolforge environment is set up for access by default, and other Cloud VPS projects can access some resources on special request.

Wikimedia Dumps are also offered via the Shared Storage service, but are treated as a separate Data Service because of their wide use.

Quarry

Quarry is a graphical web interface that allows users to query the Wiki Replicas with SQL. It only needs a Wikipedia or Meta-Wiki account to log in, and is extensively used by analysts, researchers, and people of all experience levels to easily access the databases. See m:Research:Quarry for help.

PAWS

PAWS is a cloud-hosted Jupyter notebooks service that provides Python notebooks and a terminal, both accessible through a web browser. It also only requires a Wikimedia (Meta) account to log in, and allows access to the Wiki Replicas, ToolsDB and Dumps. See PAWS for help.

OSM Database

We provide a clone of the OSM (OpenStreetMap) database for usage inside Toolforge and Cloud VPS. See Help:Toolforge/Database#Connecting to OSM via the official CLI PostgreSQL and Openstreetmap Databases for more information.

See also