You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Portal:Data Services: Difference between revisions

From Wikitech-static
imported>BryanDavis
m (Reverted edits by a hidden user to last revision by Jforrester)
imported>Razzi
m (→‎ToolsDB: Fix broken icinga links)
(11 intermediate revisions by 8 users not shown)
Line 2: Line 2:
[[File:WMCS data services.svg|right|120px|alt=WMCS data services]]
[[File:WMCS data services.svg|right|120px|alt=WMCS data services]]


'''Data Services''' include services that allow for direct access to databases and dumps, as well as web interfaces for querying and programmatic access to data stores. Services currently offered are: Wiki Replicas, ToolsDB, Wikimedia Dumps, Shared Storage, Quarry and PAWS.
'''Data Services''' includes services that allow for direct access to databases and dumps, as well as web interfaces for querying and programmatic access to data stores.  
 
Data services currently include: Wiki Replicas, ToolsDB, Wikilabels Postgres, Wikimedia Dumps, Shared Storage, CirrusSearch Elasticsearch replicas, Quarry, PAWS, and the OSM Database.


== Wiki Replicas ==
== Wiki Replicas ==
Wiki Replicas are MySQL/MariaDB databases that near-realtime replicate from the production MediaWiki databases of Wikimedia Foundation wikis. The database tables are sanitized for public use.


Access to the Wiki Replicas is automatically granted to all users of Toolforge. See [[Help:Toolforge/Database]] for how to access the [[Wiki replicas|Wiki Replicas]].
''' About '''
 
Wiki Replicas are MySQL/MariaDB databases that replicate near-realtime from the production MediaWiki databases of Wikimedia Foundation wikis. The database tables are sanitized for public use.
 
''' How to access '''
 
Access to the Wiki Replicas is automatically granted to all users of Toolforge. See [[Help:Toolforge/Database]] to learn how to access the [[Wiki replicas|Wiki Replicas]].


== ToolsDB ==
== ToolsDB ==
ToolsDB is a service that allows a Tool shared user to create and maintain a Tool specific database. See [[Help:Toolforge/Database#User databases]] for help on ToolsDB.


It's acessible on the following addresses:
''' About '''
* tools.db.svc.eqiad.wmflabs (preferred)
* tools-db.tools.eqiad.wmflabs


It used to run on [https://netbox.wikimedia.org/dcim/devices/1915/ labsdb1005] and got migrated into a Cloud VPS VM called [https://tools.wmflabs.org/openstack-browser/server/clouddb1001.clouddb-services.eqiad.wmflabs clouddb1001] in the [https://tools.wmflabs.org/openstack-browser/project/clouddb-services clouddb-services] project (more details about the migration are available in [[phab:T208754]] [[phab:T193264]]).
ToolsDB is a service that allows a Tool's shared user to create and maintain a Tool-specific database.


You can verify the [https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=checker.tools.wmflabs.org&service=toolschecker%3A+toolsdb service status] and the [https://icinga.wikimedia.org/cgi-bin/icinga/avail.cgi?host=checker.tools.wmflabs.org&service=toolschecker%3A+toolsdb&show_log_entries availability report] in Icinga. Active checks are carried out by [[Help:Toolforge/Monitoring#Toolschecker|Toolschecker]] upon request by Icinga.
See [[Help:Toolforge/Database#User databases]] for help on ToolsDB.
 
''' How to access '''
 
ToolsDB is accessible at the following addresses:
* tools.db.svc.eqiad1.wikimedia.cloud (preferred)
* tools-db.tools.eqiad1.wikimedia.cloud
 
ToolsDB previously ran on [https://netbox.wikimedia.org/dcim/devices/1915/ labsdb1005] and was migrated to a Cloud VPS VM called [https://openstack-browser.toolforge.org/server/clouddb1001.clouddb-services.eqiad1.wikimedia.cloud clouddb1001] in the [https://openstack-browser.toolforge.org/project/clouddb-services clouddb-services] project (details of the migration are available in [[phab:T208754]] and [[phab:T193264]]).
 
You can verify the [https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=checker.tools.wmflabs.org&service=toolschecker%3A+toolsdb+read%2Fwrite service status] and the [https://icinga.wikimedia.org/cgi-bin/icinga/avail.cgi?host=checker.tools.wmflabs.org&service=toolschecker%3A+toolsdb+read%2Fwrite&show_log_entries availability report] in Icinga. Active checks are carried out by [[Help:Toolforge/Monitoring#Toolschecker|Toolschecker]] upon request by Icinga.


== Wikilabels Postgres ==
== Wikilabels Postgres ==
The Postgres database used by [[Wikilabels]] (used by Ores) is on a replicated VM cluster: [https://tools.wmflabs.org/openstack-browser/server/clouddb-wikilabels-01.clouddb-services.eqiad.wmflabs clouddb-wikilabels-01] is the primary with [https://tools.wmflabs.org/openstack-browser/server/clouddb-wikilabels-02.clouddb-services.eqiad.wmflabs clouddb-wikilabels-02] as the usual replica. Changes that affect the postgresql service, including upgrades/reboots, should be coordinated with Aaron Halfaker.
 
''' About '''
 
The [[Wikilabels]] Postgres database, used by [[mw:ORES|ORES]], is on a replicated VM cluster: [https://openstack-browser.toolforge.org/server/clouddb-wikilabels-01.clouddb-services.eqiad1.wikimedia.cloud clouddb-wikilabels-01] is the primary with [https://openstack-browser.toolforge.org/server/clouddb-wikilabels-02.clouddb-services.eqiad1.wikimedia.cloud clouddb-wikilabels-02] as the usual replica.


== Wikimedia Dumps ==
== Wikimedia Dumps ==
[https://dumps.wikimedia.org/ Wikimedia Dumps] offers a range of data downloads including full text dumps, and other datasets. Toolforge users can directly access dumps data through their Tool account, see [[Help:Toolforge/Dumps]]. Cloud VPS users can request to have the share available, see [[Help:Shared storage#.2Fpublic.2Fdumps|Help:Shared storage#/public/dumps]]. More documentation about dumps can be found at https://meta.wikimedia.org/wiki/Data_dumps
 
''' About '''
 
[https://dumps.wikimedia.org/ Wikimedia Dumps] offers a range of data downloads, including full-text dumps and other datasets. More documentation about dumps can be found at [[m:Data dumps|Data dumps]].
 
''' How to access '''
 
* Toolforge users can directly access dumps data through their Tool account. See [[Help:Toolforge/Dumps]].  
* Cloud VPS users can request to have the share available. See [[Help:Shared storage#.2Fpublic.2Fdumps|Help:Shared storage#/public/dumps]].


== Shared Storage ==
== Shared Storage ==
Shared Storage is offered via [[w:Network_File_System|NFS]] for Toolforge and Cloud VPS users. Shares currently offered are described at [[Help:Shared storage]]. The Toolforge environment is setup for access by default, and other Cloud VPS projects can access some resources on special request.


[https://dumps.wikimedia.org/ Wikimedia Dumps] are also offered via the Shared Storage services, but treated as a Data Service because of their wide use.
''' About '''
 
Shared Storage is offered via [[w:Network_File_System|NFS]]. It includes shared directories offered to Cloud VPS and Toolforge users. Currently offered shares are described at [[Help:Shared storage]]. [https://dumps.wikimedia.org/ Wikimedia Dumps] are also offered via the Shared Storage services, but are treated as a separate Data Service because of their wide use.
 
''' How to access '''
 
The Toolforge environment is set up for access by default. Other Cloud VPS projects can request access to the listed shares by filing a task on Phabricator under the Data-Services and VPS-Projects projects.
 
== CirrusSearch Elasticsearch replicas ==
 
''' About '''
 
The "Cloud Elastic" servers are a replica of the [[mw:Extension:CirrusSearch|CirrusSearch]] Elasticsearch indices made available to Wikimedia Cloud Services applications (both Cloud VPS and Toolforge). Applications can use the full power of the [https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl.html Elasticsearch search APIs] to query the search indices in ways that CirrusSearch does not expose directly on the wikis themselves. See [[Help:CirrusSearch elasticsearch replicas]] for more details.
 
''' How to access '''
 
These servers are not accessible from the internet at large; they can be reached only by applications running inside Wikimedia Cloud Services.


== Quarry ==
== Quarry ==
[https://quarry.wmflabs.org/ Quarry] is a graphical web interface that allows users to query the Wiki Replicas with SQL. It only needs a Wikimedia account to login, and is extensively used by analysts, researchers, and people of all experience levels to easily access the databases. See [[m:Research:Quarry]] for help.
 
''' About '''
 
[https://quarry.wmflabs.org/ Quarry] is a graphical web interface that allows users to query the Wiki Replicas with SQL. Quarry is extensively used by analysts, researchers, and people of all experience levels to easily access the databases. See [[m:Research:Quarry]] for help.
 
''' How to access '''
 
Quarry requires a Wikimedia SUL account to log in.


== PAWS ==
== PAWS ==
[https://paws.wmflabs.org PAWS] is a [https://jupyter.org Jupyter] notebooks on the cloud service that hosts python notebooks and a terminal accessible through a web browser. It also only requires a Wikimedia account to login, and allows for access to the Wiki Replicas, ToolsDB and Dumps. See [[PAWS]] for help.
 
''' About '''
 
[[PAWS]] is a [https://jupyter.org Jupyter] notebooks installation hosted by Wikimedia Cloud Services that provides Python notebooks and a terminal accessible through a web browser. You can access Wiki Replicas, ToolsDB and Dumps with PAWS.
 
''' How to access '''
 
PAWS requires a Wikimedia SUL account to log in.


== OSM Database ==
== OSM Database ==


We provide a clone of the OSM (OpenStreetMap) database for usage inside Toolforge and Cloud VPS. See [[Help:Toolforge/Database#Connecting to OSM via the official CLI PostgreSQL]] and [[Openstreetmap Databases]] for more information.
''' About '''
 
Wikimedia Cloud Services provides a clone of the OSM (OpenStreetMap) database for usage inside Toolforge and Cloud VPS. See [[Help:Toolforge/Database#Connecting to OSM via the official CLI PostgreSQL]] and [[Openstreetmap Databases]] for more information.
 
== Wikimedia Enterprise ==
 
''' About '''
 
[[meta:Wikimedia Enterprise|Wikimedia Enterprise]] is a set of APIs targeting large-scale user needs. For more information on the APIs, see [[mw:Wikimedia Enterprise/Documentation|the service's documentation on mediawiki.org]].
 
''' How to access '''
 
Users of Toolforge, Cloud VPS, or [[PAWS]] have access to the Misc and Bulk APIs (Daily and Hourly Exports).


== See also ==
== See also ==

Revision as of 21:48, 4 January 2022

Data Services includes services that allow for direct access to databases and dumps, as well as web interfaces for querying and programmatic access to data stores.

Data services currently include: Wiki Replicas, ToolsDB, Wikilabels Postgres, Wikimedia Dumps, Shared Storage, CirrusSearch Elasticsearch replicas, Quarry, PAWS, and the OSM Database.

Wiki Replicas

About

Wiki Replicas are MySQL/MariaDB databases that replicate near-realtime from the production MediaWiki databases of Wikimedia Foundation wikis. The database tables are sanitized for public use.

How to access

Access to the Wiki Replicas is automatically granted to all users of Toolforge. See Help:Toolforge/Database to learn how to access the Wiki Replicas.

ToolsDB

About

ToolsDB is a service that allows a Tool's shared user to create and maintain a Tool-specific database.

See Help:Toolforge/Database#User databases for help on ToolsDB.

How to access

ToolsDB is accessible at the following addresses:

  • tools.db.svc.eqiad1.wikimedia.cloud (preferred)
  • tools-db.tools.eqiad1.wikimedia.cloud

ToolsDB previously ran on labsdb1005 and was migrated to a Cloud VPS VM called clouddb1001 in the clouddb-services project (details of the migration are available in phab:T208754 and phab:T193264).

You can verify the service status and the availability report in Icinga. Active checks are carried out by Toolschecker upon request by Icinga.
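
As a rough illustration of how a tool might assemble connection settings for the preferred ToolsDB address, here is a minimal Python sketch. It assumes credentials live in a my.cnf-style file with a [client] section, and that user databases are named "<credential user>__<name>"; both are assumptions here, so check Help:Toolforge/Database for the authoritative details. The function name toolsdb_params and the sample credentials are hypothetical.

```python
import configparser

# Preferred ToolsDB address, as listed above.
TOOLSDB_HOST = "tools.db.svc.eqiad1.wikimedia.cloud"


def toolsdb_params(cnf_text, db_suffix):
    """Build connection parameters from the text of a my.cnf-style file."""
    # Disable interpolation so a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    client = parser["client"]
    # Values in my.cnf-style files may be wrapped in quotes; strip them.
    user = client["user"].strip("'\"")
    password = client["password"].strip("'\"")
    return {
        "host": TOOLSDB_HOST,
        "user": user,
        "password": password,
        # Assumed naming convention: "<credential user>__<name>".
        "database": f"{user}__{db_suffix}",
    }


# Made-up credentials, used only so the sketch runs anywhere.
SAMPLE_CNF = """
[client]
user = s12345
password = not-a-real-password
"""

params = toolsdb_params(SAMPLE_CNF, "mydata")
print(params["host"], params["database"])
```

The resulting dictionary holds only plain settings; it could then be handed to whichever MySQL/MariaDB client library the tool already uses.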

Wikilabels Postgres

About

The Wikilabels Postgres database, used by ORES, is on a replicated VM cluster: clouddb-wikilabels-01 is the primary with clouddb-wikilabels-02 as the usual replica.

Wikimedia Dumps

About

Wikimedia Dumps offers a range of data downloads, including full-text dumps and other datasets. More documentation about dumps can be found at Data dumps.

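How to access

  • Toolforge users can directly access dumps data through their Tool account. See Help:Toolforge/Dumps.
  • Cloud VPS users can request to have the share available. See Help:Shared storage#/public/dumps.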
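
Once the dumps share is available, a dump file can be read like any other local file. The Python sketch below streams a bzip2-compressed XML file and counts its page elements without loading the whole file into memory. It assumes the dump is bzip2-compressed XML with each opening <page> tag on its own line; the function name count_pages is hypothetical, and the demo runs against a tiny stand-in file rather than a real dump.

```python
import bz2
import os
import tempfile


def count_pages(dump_path):
    """Count <page> elements in a bzip2-compressed XML dump, line by line."""
    pages = 0
    with bz2.open(dump_path, "rt", encoding="utf-8") as stream:
        for line in stream:
            if "<page>" in line:
                pages += 1
    return pages


# Demo on a tiny stand-in file so the sketch runs anywhere.
sample_xml = (
    "<mediawiki>\n"
    "  <page>\n"
    "  </page>\n"
    "  <page>\n"
    "  </page>\n"
    "</mediawiki>\n"
)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.xml.bz2")
    with bz2.open(sample_path, "wt", encoding="utf-8") as out:
        out.write(sample_xml)
    page_count = count_pages(sample_path)

print(page_count)
```

Reading line by line keeps memory use flat, which matters because full-text dumps can be very large.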

Shared Storage

About

Shared Storage is offered via NFS. It includes shared directories offered to Cloud VPS and Toolforge users. Currently offered shares are described at Help:Shared storage. Wikimedia Dumps are also offered via the Shared Storage services, but are treated as a separate Data Service because of their wide use.

How to access

The Toolforge environment is set up for access by default. Other Cloud VPS projects can request access to the listed shares by filing a task on Phabricator under the Data-Services and VPS-Projects projects.

CirrusSearch Elasticsearch replicas

About

The "Cloud Elastic" servers are a replica of the CirrusSearch Elasticsearch indices made available to Wikimedia Cloud Services applications (both Cloud VPS and Toolforge). Applications can use the full power of the Elasticsearch search APIs to query the search indices in ways that CirrusSearch does not expose directly on the wikis themselves. See Help:CirrusSearch elasticsearch replicas for more details.

How to access

These servers are not accessible from the internet at large; they can be reached only by applications running inside Wikimedia Cloud Services.
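
To give a sense of what a query against the replica indices might look like, the Python sketch below builds a request body in the Elasticsearch query DSL using a simple match query. The field name "title" and the helper build_match_query are assumptions for illustration; the actual index names, fields, and server address are documented at Help:CirrusSearch elasticsearch replicas and are not shown here.

```python
import json


def build_match_query(field, text, size=10):
    """Return an Elasticsearch query DSL body for a full-text match query."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {"match": {field: text}},
    }


# "title" is an assumed field name, used only for illustration.
body = build_match_query("title", "data services", size=5)
payload = json.dumps(body, sort_keys=True)
print(payload)
```

A body like this would typically be sent as JSON to an index's _search endpoint from an application running inside Wikimedia Cloud Services.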

Quarry

About

Quarry is a graphical web interface that allows users to query the Wiki Replicas with SQL. Quarry is extensively used by analysts, researchers, and people of all experience levels to easily access the databases. See m:Research:Quarry for help.

How to access

Quarry requires a Wikimedia SUL account to log in.

PAWS

About

PAWS is a Jupyter notebooks installation hosted by Wikimedia Cloud Services that provides Python notebooks and a terminal accessible through a web browser. You can access Wiki Replicas, ToolsDB and Dumps with PAWS.

How to access

PAWS requires a Wikimedia SUL account to log in.

OSM Database

About

Wikimedia Cloud Services provides a clone of the OSM (OpenStreetMap) database for usage inside Toolforge and Cloud VPS. See Help:Toolforge/Database#Connecting to OSM via the official CLI PostgreSQL and Openstreetmap Databases for more information.

Wikimedia Enterprise

About

Wikimedia Enterprise is a set of APIs targeting large-scale user needs. For more information on the APIs, see the service's documentation on mediawiki.org.

How to access

Users of Toolforge, Cloud VPS, or PAWS have access to the Misc and Bulk APIs (Daily and Hourly Exports).

See also