<noinclude>{{Template:Toolforge nav}}{{See also|Help:MySQL queries|m:Research:Quarry}}</noinclude>

Tool and Tools users are granted access to [[Wiki replicas|replicas]] of the production databases. Private user data has been redacted from these replicas (some rows are elided and/or some columns are made NULL depending on the table). For most practical purposes this is identical to the production databases and sharded into clusters in much the same way.

Database credentials are generated on account creation and placed in a replica.my.cnf file in the home directory of both a Tool and a Tools user account. This file cannot be modified or removed by users.

Symlinking the access file can be practical:

<syntaxhighlight lang="shell-session">
$ ln -s $HOME/replica.my.cnf $HOME/.my.cnf
</syntaxhighlight>

<noinclude>
== Connecting to the database replicas ==
You can connect to the database replicas (and/or the cluster where a database replica is hosted) by specifying your access credentials and the alias of the cluster and replicated database. For example:
</noinclude>
To connect to the English Wikipedia replica, specify the alias of the hosting cluster (enwiki.analytics.db.svc.wikimedia.cloud) and the alias of the database replica (enwiki_p):
<syntaxhighlight lang="shell-session">
$ mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.analytics.db.svc.wikimedia.cloud enwiki_p
</syntaxhighlight>

To connect to the Wikidata cluster:
<syntaxhighlight lang="shell-session">
$ mysql --defaults-file=$HOME/replica.my.cnf -h wikidatawiki.analytics.db.svc.wikimedia.cloud
</syntaxhighlight>

To connect to the Commons cluster:
<syntaxhighlight lang="shell-session">
$ mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.analytics.db.svc.wikimedia.cloud
</syntaxhighlight>

There is also a shortcut for connecting to the replicas: <code>sql &lt;dbname&gt;[_p]</code>. The <code>_p</code> suffix is optional but implicit (i.e. the sql tool will add it if absent).
To connect to the English Wikipedia database replica using the shortcut, simply type:

<syntaxhighlight lang="shell-session">
$ sql enwiki
</syntaxhighlight>

To connect to ToolsDB where you can create and write to tables, type:
<syntaxhighlight lang="shell-session">
$ sql tools
</syntaxhighlight>
This sets the server to "tools.db.svc.wikimedia.cloud" and the database to "". It's equivalent to typing:
<syntaxhighlight lang="shell-session">
$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud
</syntaxhighlight>

<noinclude>
=== Naming conventions ===
As a convenience, each MediaWiki project database (enwiki, bgwiki, etc.) has an alias to the cluster it is hosted on. The alias has the form:

:'''${PROJECT}'''.{analytics,web}.db.svc.wikimedia.cloud

where '''${PROJECT}''' is the internal database name of a hosted Wikimedia project. The choice of "analytics" or "web" is up to you. The analytics service name connects to Wiki Replica servers where SQL queries will be allowed to run for a longer duration, but at the cost of all queries being potentially slower. Use of the web service name should be reserved for webservices which are running queries that display to users.
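
For example, the same wiki can be reached through either service name (illustrative commands):
<syntaxhighlight lang="shell-session">
$ # longer-running analysis queries: use the analytics service name
$ mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.analytics.db.svc.wikimedia.cloud enwiki_p
$ # queries backing a user-facing webservice: use the web service name
$ mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.web.db.svc.wikimedia.cloud enwiki_p
</syntaxhighlight>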
Wikipedia project database names generally follow the format '''${LANGUAGE_CODE}${PROJECT_FAMILY}'''. ${LANGUAGE_CODE} is the ISO 639 two-letter code for the primary content language (e.g. en for English, es for Spanish, bg for Bulgarian, ...). ${PROJECT_FAMILY} is an internal label for the wiki's project family (e.g. wiki for Wikipedia, wiktionary for Wiktionary, ...). Some wikis such as Meta-Wiki have database names that do not follow this pattern (metawiki).

The replica database names themselves consist of the Wikimedia project name, suffixed with _p (an underscore, and a p), for example:
: <code>enwiki_p</code> for the English Wikipedia replica

In addition each cluster can be accessed by the name of its [https://noc.wikimedia.org/dbtree/ Wikimedia production shard] which follows the format '''s${SHARD_NUMBER}.{analytics,web}.db.svc.wikimedia.cloud''' (for example, <code>s1.analytics.db.svc.wikimedia.cloud</code> hosts the <code>enwiki_p</code> database). The shard where a particular database is located can change over time. You should only use the shard name for opening a database connection if your application requires it for specific performance reasons such as for heavily crosswiki tools which would otherwise open hundreds of database connections.
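
For example, connecting to the shard that currently hosts <code>enwiki_p</code> (a sketch; remember that the shard-to-wiki mapping can change):
<syntaxhighlight lang="shell-session">
$ mysql --defaults-file=$HOME/replica.my.cnf -h s1.analytics.db.svc.wikimedia.cloud enwiki_p
</syntaxhighlight>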
{{Warning|You may find outdated documentation that uses <code>${project}.{analytics,web}.db.svc.eqiad.wmflabs</code> aliases (for example ''enwiki.web.db.svc.eqiad.wmflabs'') to refer to the Wiki Replica databases. These service names are deprecated. Please update the docs or code that you find these references in to use the ''${PROJECT}.{analytics,web}.db.svc.wikimedia.cloud'' naming convention.}}

=== Connection handling policy ===
Usage of connection pools (maintaining open connections without them being in use), persistent connections, or any kind of connection pattern that maintains several connections open even if they are unused is not permitted on shared MySQL instances (Wiki Replicas and ToolsDB).

The memory and processing power available to the database servers is a finite resource. Each open connection to a database, even if inactive, consumes some of these resources. Given the number of potential users for the Wiki Replicas and ToolsDB, if even a relatively small percentage of users held open idle connections, the server would quickly run out of resources to allow new connections. Please close your connections as soon as you stop using them. Note that connecting interactively and being idle for a few minutes is not an issue; opening dozens of connections and maintaining them automatically open is.

Idle connections can and will be killed by database and system administrators when discovered. If you (for example, by connector configuration or application policy) then reopen those connections automatically and keep them idle, you will be warned to stop.

=== {{anchor|Connecting to the database replicas from other Labs instances}}Connecting to the wiki replicas from other Cloud VPS projects ===
The <code>*.{analytics,web}.db.svc.wikimedia.cloud</code> servers should be directly accessible from other Cloud VPS projects as well as Toolforge (these are provided in DNS), but there is no automatic creation of database credential files. The easiest way to get user credentials for use in another project is to [[Help:Toolforge#Creating_a_new_Tool_account|create a Toolforge tool account]] and copy its credentials to your Cloud VPS instance.

=== Connecting to the database replicas from your own computer ===
You can access the database replicas from your own computer by setting up an SSH tunnel. If you use MySQL Workbench, you can find a detailed description for that application [[#MySQL_Workbench|below]].

Tunneling is a built-in capability of ssh. It allows creating a listening TCP port on your local computer that will transparently forward all connections to a given host and port on the remote side of the ssh connection. The destination host and port do not need to be the host that you are connecting to with your ssh session, but they do need to be reachable from the remote host.

In the general case, you need to add a port forwarding in your ssh tool. Windows 10 includes OpenSSH, so the <code>ssh</code> command can be used. On older versions of Windows, you can use [[w:PuTTY|PuTTY]] by adding the following settings in ''Connection → SSH → Tunnels''.

In Linux or Windows 10, you can add the option <code>-L $LOCAL_PORT:$REMOTE_HOST:$REMOTE_PORT</code> to your <code>ssh</code> call, e.g.:

<syntaxhighlight lang="shell-session">
$ ssh -L 4711:enwiki.analytics.db.svc.wikimedia.cloud:3306 login.toolforge.org
</syntaxhighlight>

This will set up a tunnel so that connections to port '''4711''' on your own computer will be relayed to the '''enwiki.analytics.db.svc.wikimedia.cloud''' database replica's MySQL server on port '''3306'''. This tunnel will continue to work as long as the SSH session is open.

The <code>mysql</code> command line to connect using the tunnel from the example above would look something like:

<syntaxhighlight lang="shell-session">
$ mysql --user=$USER_FROM_REPLICA.MY.CNF --host=127.0.0.1 --port=4711 --password enwiki_p
</syntaxhighlight>

The '''user''' and '''password''' values needed can be found in the <code>$HOME/replica.my.cnf</code> credentials file for your Toolforge user account or a tool that you have access to.

Note that you need to explicitly use the <code>127.0.0.1</code> IP address; using <code>localhost</code> instead will give an "Access denied" error. <!-- TODO: If someone MySQL-savvy has a short explanation why that is treated differently in our setup … -->

==== SSH tunneling for local testing which makes use of Wiki Replica databases ====
{{anchor|Steps to setup SSH tunneling for testing a tool labs application which makes use of tool labs databases on your own computer}}
# Set up SSH tunnels: <code>ssh -N yourusername@dev.toolforge.org -L 3306:enwiki.analytics.db.svc.wikimedia.cloud:3306</code>
#* <code>-N</code> prevents ssh from opening an interactive shell. This connection will only be useful for port forwarding.
#* The first port is the listening port on your machine and the second one is on the remote server. 3306 is the default port for MySQL.
#* For multiple database connections, add additional <code>-L $LOCAL_PORT:$REMOTE_HOST:$REMOTE_PORT</code> sections to the same command or open additional ssh connections.
#* If you need to connect to more than one Wiki Replica database server, each database will need a different listening port on your machine (e.g. 3307, 3308, 3309, ...). Change the associated php/python connect command to send requests to that port instead of the default 3306.
# (optional) Edit your <code>/etc/hosts</code> file to add something like <code>127.0.0.1 enwiki.analytics.db.svc.wikimedia.cloud</code> for each of the databases you're connecting to.
# You might need to copy over the replica.my.cnf file to your local machine for this to work; a sample command is shown below.
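
A minimal sketch of copying the credentials file to your local machine (assumes your Toolforge shell username; adjust the destination path as needed):
<syntaxhighlight lang="shell-session">
$ scp yourusername@login.toolforge.org:replica.my.cnf ~/replica.my.cnf
</syntaxhighlight>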

=== TLS connection failures ===

== Databases ==
{{See also|Help:MySQL queries}}

=== Replica database schema (tables and indexes) ===
The database replicas for the various Wikimedia projects follow the standard MediaWiki database schema [[Mw:Manual:Database layout|described on mediawiki.org]] and in [[Phab:diffusion/MW/browse/master/maintenance/tables-generated.sql|the MediaWiki git repository]].

Many of the indexes on these tables are actually compound indexes designed to optimize the runtime performance of the MediaWiki software rather than to be convenient for ad hoc queries. For example, a naive query by <code>page_title</code> such as <code>SELECT * FROM page WHERE page_title = 'NOFX';</code> will be slow, because the index which includes <code>page_title</code> is a compound index with <code>page_namespace</code>. Adding <code>page_namespace</code> to the <code>WHERE</code> clause will improve the query speed dramatically: <code>SELECT * FROM page WHERE page_namespace = 0 AND page_title = 'NOFX';</code>
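
The two variants side by side:
<syntaxhighlight lang="sql">
-- Slow: page_title alone cannot use the compound (page_namespace, page_title) index
SELECT * FROM page WHERE page_title = 'NOFX';
-- Fast: the leading page_namespace column makes the compound index usable
SELECT * FROM page WHERE page_namespace = 0 AND page_title = 'NOFX';
</syntaxhighlight>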
==== Stability of the mediawiki database schema ====
[[Phab:diffusion/MW/browse/master/maintenance/tables-generated.sql|maintenance/tables-generated.sql]] tracks the development HEAD of the MediaWiki schema. Extra tables may be available due to additional extensions set up in production. Some tables may also have been redacted or filtered because they contain private data, such as user passwords or private IP addresses. Aside from that, while we try to synchronize production with the development HEAD, changes to the database structure may be applied in advance or (more commonly) lag behind its publication. The reason for this is that schema changes are continuously applied to production databases, and due to the amount of data, they may take from a few hours to a few months (for the more complex cases) to be finalized.

Core tables such as revision, page, user and recentchanges rarely change, but '''cloud maintainers cannot guarantee they will never change''', as they have to follow the production changes. While we are happy for people to set up scripts and tools on top of the database copies (wikireplicas), expect the schema to change every now and then. If you cannot make small tweaks from time to time to adapt to the latest schema changes, using the API instead of the database internals is suggested, as '''API changes have more guarantees of stability and a proper lifecycle and deprecation policy. That is not true for MediaWiki database internals''', although compatibility views can sometimes be set up to require only minimal changes.

==== Tables for revision or logging queries involving user names and IDs ====
The <code>revision</code> and <code>logging</code> tables do not have indexes on user columns. [http://lists.wikimedia.org/pipermail/labs-l/2013-September/001607.html In an email], one of the system administrators pointed out that this is because "those values are conditionally nulled when supressed" (see also [[phab:T68786]] for some more detail). One has to instead use the corresponding <code>revision_userindex</code> or <code>logging_userindex</code> for these types of queries. On those views, rows where the column would have otherwise been nulled are elided; this allows the indexes to be usable.

Example query that will use the appropriate index (in this case on the <code>rev_actor</code> column):

<syntaxhighlight lang="sql">
SELECT rev_id, rev_timestamp FROM revision_userindex WHERE rev_actor=1234;
</syntaxhighlight>

Example query that ''fails'' to use an index because the table doesn't have them:

<syntaxhighlight lang="sql">
SELECT rev_id, rev_timestamp FROM revision WHERE rev_actor=1234;
</syntaxhighlight>

You should use the <code>_userindex</code> views so that your queries can use the indexes and run faster.

==== Redacted tables ====
The majority of the [[mw:Manual:User_properties_table|user_properties table]] has been deemed sensitive and removed from the Wiki Replica databases. Only the <code>disableemail</code>, <code>fancysig</code>, <code>gender</code>, and <code>nickname</code> properties are available.
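
For example, listing the properties that remain visible for a user (hypothetical user ID):
<syntaxhighlight lang="sql">
SELECT up_property, up_value FROM user_properties WHERE up_user = 1234;
</syntaxhighlight>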

==== Unused tables ====
Some of the standard MediaWiki tables that are in use on Wikimedia wikis are not available. The following tables are missing or empty:
* <code>interwiki</code> ([[phab:T103589|T103589]])
* [[mw:Manual:text table|<code>text</code> table]] ([[mw:Manual:MediaWiki architecture#Database and text storage]]): Users can use the [[mw:API:Query#Sample query|API]] (as sketched below), or [[Help:Toolforge#Dumps|dumps]] to access page contents. Access to search indexes of page content is tracked at [[phab:T109715]].
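
A minimal sketch of fetching page text through the API instead (English Wikipedia endpoint, example title):
<syntaxhighlight lang="shell-session">
$ curl 'https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvslots=main&titles=MediaWiki&format=json'
</syntaxhighlight>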

=== Metadata database ===
{{tracked|T50626|resolved}}
There is a table with automatically maintained meta information about the replicated databases: <code>meta_p.wiki</code>. See [http://quarry.wmflabs.org/query/4031 Quarry #4031] for an up-to-date version.

The database host containing the <code>meta_p</code> database is: <code>meta.analytics.db.svc.wikimedia.cloud</code>.
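
For example (a sketch):
<syntaxhighlight lang="shell-session">
$ mysql --defaults-file=$HOME/replica.my.cnf -h meta.analytics.db.svc.wikimedia.cloud meta_p
</syntaxhighlight>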
<syntaxhighlight lang="sql">
MariaDB [meta_p]> DESCRIBE wiki;
</syntaxhighlight>

Example data:

<syntaxhighlight lang="sql">
MariaDB [meta_p]> select * from wiki limit 1 \G
</syntaxhighlight>

<pre>
*************************** 1. row ***************************
           dbname: aawiki
             lang: aa
             name: Wikipedia
           family: wikipedia
              url: https://aa.wikipedia.org
             size: 1
            slice: s3.labsdb
...
</pre>
=== Identifying lag ===

If there is a network/Wiki Replica db infrastructure problem, production problem, maintenance (scheduled or unscheduled), excessive load or production or user's queries blocking the replication process, the Wiki Replicas can "lag" behind the production databases.

To identify lag, see [[toolforge:replag/|the replag tool]] or run the following yourself on the database host you are connected to:
 
<syntaxhighlight lang="sql">
(u3518@enwiki.analytics.db.svc.wikimedia.cloud) [heartbeat_p]> SELECT * FROM heartbeat;
</syntaxhighlight>

<pre>
+-------+----------------------------+--------+
| shard | last_updated              | lag    |
...
</pre>
To directly query the replication lag for a particular wiki, use requests like:

<syntaxhighlight lang="sql">
MariaDB [fawiki_p]> SELECT lag FROM heartbeat_p.heartbeat JOIN meta_p.wiki ON shard = SUBSTRING_INDEX(slice, ".", 1) WHERE dbname = 'fawiki';
+------+
| lag  |
+------+
|    0 |
+------+
1 row in set (0.09 sec)
</syntaxhighlight>

Please note that a few seconds or even a few minutes of lag is considered normal, due to the filtering process and the hops made before reaching the public hosts.
=== Replica drift ===
{{Help section|/Replica drift}}

=== User databases ===
User-created databases can be created on a shared server: <code>tools.db.svc.wikimedia.cloud</code>. Database names must start with the name of the credential user followed by ''two underscores'' and then the name of the database: '''<credentialUser>__<DBName>''' (e.g. "s51234__mydb").

The credential user is not your user name. It can be found in your $HOME/replica.my.cnf file. The name of the credential user looks something like 'u1234' for a user and 's51234' for a tool account. You can also find the name of the credential user using a live database connection:
<syntaxhighlight lang="sql">
SELECT SUBSTRING_INDEX(CURRENT_USER(), '@', 1);
</syntaxhighlight>
Users have all privileges and have access to all grant options on their databases. Database names ending with <code>_p</code> are granted read access for everyone. Please create a [https://phabricator.wikimedia.org/maniphest/task/create/ ticket] if you need more fine-grained permissions, like sharing a database only between 2 users, or other special permissions.
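
For example, to create a tool database that everyone can read, include the <code>_p</code> suffix (hypothetical credential user and name):
<syntaxhighlight lang="sql">
CREATE DATABASE s51234__mydata_p;
</syntaxhighlight>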

==== Steps to create a user database on tools.db.svc.wikimedia.cloud ====
To create a database on <code>tools.db.svc.wikimedia.cloud</code>:

# Become your tool account. <pre>maintainer@tools-login:~$ become toolaccount</pre>
# Connect to tools.db.svc.wikimedia.cloud with the replica.my.cnf credentials: <pre>mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud</pre> You could also just type: <pre>sql tools</pre>
# In the mysql console, create a new database (where CREDENTIALUSER is your credentials user, which can be found in your ~/replica.my.cnf file, and DBNAME the name you want to give to your database. Note that there are 2 underscores between CREDENTIALUSER and DBNAME): <pre>MariaDB [(none)]> CREATE DATABASE CREDENTIALUSER__DBNAME;</pre>

You can then connect to your database using:
<syntaxhighlight lang="shell-session">
$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud CREDENTIALUSER__DBNAME
</syntaxhighlight>
 
Or:
<syntaxhighlight lang="shell-session">
$ sql tools
</syntaxhighlight>
<syntaxhighlight lang="sql">
MariaDB [(none)]> USE CREDENTIALUSER__DBNAME;
</syntaxhighlight>

'''Example'''

Assuming that your tool account is called "mytool", this is what it would look like:
<syntaxhighlight lang="shell-session">
maintainer@tools-login:~$ become mytool
tools.mytool@tools-login:~$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud
</syntaxhighlight>
<syntaxhighlight lang="sql">
MariaDB [(none)]> select substring_index(current_user(), '@', 1) as uname;
+---------------+
| uname         |
+---------------+
| u123something |
+---------------+
1 row in set (0.00 sec)
MariaDB [(none)]> create database u123something__wiki;
</syntaxhighlight>

{{caution|The legacy <code>tools-db</code> service name was deprecated in September 2017 and removed in May 2019. Use ''tools.db.svc.wikimedia.cloud'' instead.}}

'''Note''': Some frameworks like Django can throw an exception like <code>MySQLdb._exceptions.OperationalError: (1709, 'Index column size too large. The maximum column size is 767 bytes.')</code> when migrated using the setup above. In most cases this can be fixed by altering the database charset to <code>utf8</code>. To avoid the problem entirely, create the database with an explicit charset:
<syntaxhighlight lang="sql">
MariaDB [(none)]> CREATE DATABASE CREDENTIALUSER__DBNAME CHARACTER SET utf8;
</syntaxhighlight>

==== ToolsDB Backups and Replication ====
We maintain two copies of the user and tool databases in ToolsDB, with a MySQL primary-replica setup.
However, we don't do offline backups of any of the databases in ToolsDB.
ToolsDB users can back up their data using [https://mariadb.com/kb/en/mariadb/mysqldump/ mysqldump] if necessary:
 
<syntaxhighlight lang="shell-session">
$ umask o-r # dump should not be public (unless the database is)
$ mysqldump --defaults-file=~/replica.my.cnf --host=tools.db.svc.wikimedia.cloud credentialUser__DBName > ~/DBname-$(date -I).sql
$ umask 0022 # restore default umask
</syntaxhighlight>
 
Note that we don't recommend storing backups permanently on NFS (/data/project, /home, or /data/scratch on Toolforge) or on any other Cloud VPS hosted drive. True backups should be kept offsite.
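
To restore such a dump later, feed it back through the mysql client (same hypothetical names; the date suffix comes from the dump command above):
<syntaxhighlight lang="shell-session">
$ mysql --defaults-file=~/replica.my.cnf --host=tools.db.svc.wikimedia.cloud credentialUser__DBName < ~/DBname-YYYY-MM-DD.sql
</syntaxhighlight>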

There are some ToolsDB databases which are '''not''' replicated for various reasons including size and access patterns:
* s51071__templatetiger_p
* s52721__pagecount_stats_p
* s51290__dpl_p

Users were warned about this particularity and details can be seen at {{phabT|127164}}.
Since MariaDB 10.1 (the version used on Toolforge and the new wikireplicas), one can use [https://mariadb.com/kb/en/library/server-system-variables/#max_statement_time max_statement_time] (unit is seconds, decimals allowed):

<syntaxhighlight lang="sql">
SET max_statement_time = 300;
</syntaxhighlight>

And all subsequent queries on the same connection will be killed if they run for longer than the given time.
For example:

<syntaxhighlight lang="sql">
mariadb[(none)]> SET max_statement_time = 10;
Query OK, 0 rows affected (0.00 sec)

mariadb[(none)]> SELECT sleep(20);
+-----------+
| sleep(20) |
+-----------+
|         1 |
+-----------+
1 row in set (10.00 sec)
</syntaxhighlight>

It works on [https://quarry.wmflabs.org/query/22003 Quarry], too!
You can also set limits with a single SQL query. For example:
<syntaxhighlight lang="sql">
SET STATEMENT max_statement_time = 300 FOR
SELECT COUNT(rev_id) FROM revision_userindex
INNER JOIN actor
ON rev_actor = actor_id
WHERE actor_name = 'Jimbo Wales'
</syntaxhighlight>

== Example queries ==

== Connecting with... ==

=== MySQL Workbench ===
[[File:Toolforge DB example.png|thumb|240px|Example configuration of MySQL Workbench for Toolforge]]
You can connect to databases on Toolforge with [http://www.mysql.com/products/workbench/ MySQL Workbench] (or similar client applications) via an SSH tunnel.

Instructions for connecting via MySQL Workbench are as follows:
# Launch MySQL Workbench on your local machine.
# Click the plus icon next to "MySQL Connections" in the Workbench window (or choose "Manage Connections..." from the Database menu and click the "new" button).
# Set Connection Method to "Standard TCP/IP over SSH"
# Set the following connection parameters:
#* SSH Hostname: login.toolforge.org
#* SSH Username: <your Toolforge shell username>
#* SSH Key File: <your Toolforge SSH private key file><ref>If your private key is in a RFC4716 format, you will have to convert it to a PEM key.</ref>
#* SSH Password: password/passphrase of your private key (if set) - <u>not</u> your wiki login password.
#* MySQL Hostname: enwiki.analytics.db.svc.wikimedia.cloud (or whatever server your database lives on)
#* MySQL Server Port: 3306
#* Username: <your Toolforge MySQL user name (from $HOME/replica.my.cnf)>
Replica-db hostnames can be found in /etc/hosts. Bear in mind to add the <code>_p</code> suffix if setting a default schema for replica databases, e.g. enwiki_p.

[[File:Ambox notice.png|20px]] If you are using SSH keys generated with PuTTYgen (Windows users), you need to convert your private key to the 'OpenSSH' format. Load your private key in PuTTYgen, then click Conversions » Export OpenSSH key. Use this file as SSH Key File above.
 
[[File:Ambox notice.png|20px]] If you are getting errors with SSL, you can try disabling it. From the menu bar: Database -> select your connection -> SSL -> Change "Use SSL" to "No".

=== PGAdmin (for OpenStreetMap databases) ===
=== Connecting to OSM via the official CLI PostgreSQL ===

# SSH to login.toolforge.org or dev.toolforge.org
# psql -h osmdb.eqiad.wmnet -U osm gis
# Start querying
=== Connecting from a Servlet in Tomcat ===


# create directory "lib" in directory "public_tomcat"
# copy "mysql-connector-java-bin.jar" to "public_tomcat/lib"
# <syntaxhighlight lang="Java">
import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;

String DBURL = "jdbc:mysql://tools.db.svc.wikimedia.cloud:3306/";
String DBDRIVER = "com.mysql.jdbc.Driver";
String DATABASE = DBUSER + "__" + PROJECT;

PoolProperties p = new PoolProperties();
p.setUrl (DBURL + DATABASE);
p.setDriverClassName(DBDRIVER );
p.setUsername (DBUSER );
p.setPassword (DBPASSWORD );
p.setJdbcInterceptors(
"org.apache.tomcat.jdbc.pool.interceptor.ConnectionState;" +
"org.apache.tomcat.jdbc.pool.interceptor.StatementFinalizer");
DataSource datasource = new DataSource();
datasource.setPoolProperties(p);
Connection connection = datasource.getConnection();
Statement statement = connection.createStatement();
</syntaxhighlight>
# <code>javac -classpath javax.servlet.jar:tomcat-jdbc.jar myhttpservlet.java</code>

== Code samples for common languages ==
:''Copied with edits from [[:mw:Toolserver:Database access#Program access]]'' (not all tested, use with caution!)

In most programming languages, it will be sufficient to tell MySQL to use the database credentials found in <tt>$HOME/.my.cnf</tt> assuming that you have created a symlink from <tt>$HOME/.my.cnf</tt> to <tt>$HOME/replica.my.cnf</tt>.
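
With that symlink in place, the mysql client picks up the credentials automatically and <code>--defaults-file</code> can be omitted (a sketch):
<syntaxhighlight lang="shell-session">
$ mysql -h enwiki.analytics.db.svc.wikimedia.cloud enwiki_p
</syntaxhighlight>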
Below are various examples in a few common programming languages.

=== Bash ===
<syntaxhighlight lang="bash">
-- 2> /dev/null; date; echo '
/* Bash/SQL compatible test structure
  */
SELECT 1
;-- ' | mysql -ch tools.db.svc.wikimedia.cloud enwiki_p > ~/query_results-enwiki; date;
</syntaxhighlight>

=== C ===
<syntaxhighlight lang="c">
#include <my_global.h>
#include <mysql.h>
...

  char *host = "tools.db.svc.wikimedia.cloud";
  MYSQL *conn = mysql_init(NULL);

     ...
  }
</syntaxhighlight>

=== Perl ===
<syntaxhighlight lang="perl">
use User::pwent;
use DBI;

my $database = "enwiki_p";
my $host = "tools.db.svc.wikimedia.cloud";

my $dbh = DBI->connect(
    "DBI:mysql:database=$database;host=$host;"
     . "mysql_read_default_file=" . getpwuid($<)->dir . "/replica.my.cnf",
     undef, undef) or die "Error: $DBI::err, $DBI::errstr";
</syntaxhighlight>
 
=== {{anchor|Python_2}}Python ===
Without installing the toolforge library, this will work:
<syntaxhighlight lang="python">
import pymysql

# user, password and query are placeholders; the credentials live in $HOME/replica.my.cnf
conn = pymysql.connections.Connection(
    user=user, password=password,
    database="enwiki_p",
    host="enwiki.analytics.db.svc.wikimedia.cloud",
)
with conn.cursor() as cur:
    cur.execute(query)  # Or something....
</syntaxhighlight>
 
Using [[User:Legoktm/toolforge library]], however, is probably the easiest way. This wrapper library supports both Python 3 and legacy Python 2 applications and provides convenience functions for connecting to the Wiki Replica databases.
 
<syntaxhighlight lang="python">
import toolforge
conn = toolforge.connect('enwiki') # You can also use "enwiki_p"
# conn is a pymysql.connection object.
with conn.cursor() as cur:
    cur.execute(query)  # Or something....
</syntaxhighlight>

We used to recommend [http://packages.python.org/oursql/ oursql] as well, but [[phab:T215963|as of 2019-02-20]] it seems to be abandoned or at least not actively maintained and failing to compile against MariaDB client libraries.

=== Python: Django ===
If you are using Django, first install mysqlclient:
<syntaxhighlight lang="shell-session">
$ pip install mysqlclient
</syntaxhighlight>

Then add the database to your settings.py file as follows, where s12345 is your credential user:
<syntaxhighlight lang="python">
import configparser
import os

HOME = os.environ.get('HOME')  # get environment variable $HOME

replica_path = HOME + '/replica.my.cnf'
if os.path.exists(replica_path):  # check that the file is found
    config = configparser.ConfigParser()
    config.read(replica_path)
else:
    print('replica.my.cnf file not found')

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 's12345__mydbname',
        'USER': config['client']['user'],  # for instance "s12345"
        'PASSWORD': config['client']['password'],
        'HOST': 'tools.db.svc.wikimedia.cloud',
        'PORT': '',
    }
}
</syntaxhighlight>

=== PHP (using [http://php.net/manual/book.pdo.php PDO]) ===
<syntaxhighlight lang="php">
<?php
$ts_pw = posix_getpwuid(posix_getuid());
$ts_mycnf = parse_ini_file($ts_pw['dir'] . "/replica.my.cnf");
$db = new PDO("mysql:host=enwiki.analytics.db.svc.wikimedia.cloud;dbname=enwiki_p", $ts_mycnf['user'], $ts_mycnf['password']);
unset($ts_mycnf, $ts_pw);

$q = $db->prepare('select * from page where page_id = :id');
$q->execute(array(':id' => 843020));
print_r($q->fetchAll());
?>
</syntaxhighlight>

=== PHP (using [http://php.net/manual/book.mysqli.php MySQLi]) ===
<syntaxhighlight lang="php">
<?php
$ts_pw = posix_getpwuid(posix_getuid());
$ts_mycnf = parse_ini_file($ts_pw['dir'] . "/replica.my.cnf");
$mysqli = new mysqli('enwiki.analytics.db.svc.wikimedia.cloud', $ts_mycnf['user'], $ts_mycnf['password'], 'enwiki_p');
unset($ts_mycnf, $ts_pw);

$stmt = $mysqli->prepare('select * from page where page_id = ?');
 
$id = 843020;
$stmt->bind_param('i', $id);
 
$stmt->execute();
$result = $stmt->get_result();
print_r($result->fetch_all(MYSQLI_BOTH));
?>
</syntaxhighlight>

=== Java ===
<syntaxhighlight lang="java">
Class.forName("com.mysql.jdbc.Driver").newInstance();
Properties mycnf = new Properties();
mycnf.put("characterEncoding", "UTF-8");
mycnf.put("connectionCollation", "utf8_general_ci");
String url = "jdbc:mysql://tools.db.svc.wikimedia.cloud:3306/enwiki_p";
Connection conn = DriverManager.getConnection(url, mycnf);
</syntaxhighlight>
 
=== Node.js ===
The [https://www.npmjs.com/package/mysql2 mysql2] client provides a promise-based interface.
<syntaxhighlight lang="javascript">
const mysql = require('mysql2/promise');
async function sample() {
  const connection = await mysql.createConnection({
    host: 'tools.db.svc.wikimedia.cloud',
    port: 3306,
    database: 's12345__mydbname',
    user: 's12345',
    password: ''
  });
  const [rows, fields] = await connection.execute('SELECT * FROM table WHERE name = ? AND age > ?', ['Morty', 14]);
  for (const row of rows) console.log(row);
}
</syntaxhighlight>
 
<noinclude>

== See also ==
* [[phab:phame/post/view/70/new_wiki_replica_servers_ready_for_use/|"New Wiki Replica servers ready for use"]] (announcement post, September 25, 2017)
* [[Portal:Data Services/Admin/Wiki Replicas]]
[[Category:Toolforge|Database]]
 
== Note ==
<references/>
</noinclude>

Revision as of 00:43, 26 December 2021

Tool and Tools users are granted access to replicas of the production databases. Private user data has been redacted from these replicas (some rows are elided and/or some columns are made NULL depending on the table). For most practical purposes this is identical to the production databases and sharded into clusters in much the same way.

Database credentials are generated on account creation and placed in a replica.my.cnf file in the home directory of both a Tool and a Tools user account. This file cannot be modified or removed by users.

Symlinking the access file can be practical:

$ ln -s $HOME/replica.my.cnf $HOME/.my.cnf


Connecting to the database replicas

You can connect to the database replicas (and/or the cluster where a database replica is hosted) by specifying your access credentials and the alias of the cluster and replicated database. For example:

To connect to the English Wikipedia replica, specify the alias of the hosting cluster (enwiki.analytics.db.svc.wikimedia.cloud) and the alias of the database replica (enwiki_p):

$ mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.analytics.db.svc.wikimedia.cloud enwiki_p

To connect to the Wikidata cluster:

$ mysql --defaults-file=$HOME/replica.my.cnf -h wikidatawiki.analytics.db.svc.wikimedia.cloud

To connect to Commons cluster:

$ mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.analytics.db.svc.wikimedia.cloud

There is also a shortcut for connecting to the replicas: sql <dbname>[_p] The _p is optional, but implicit (i.e. the sql tool will add it if absent).

To connect to the English Wikipedia database replica using the shortcut, simply type:

$ sql enwiki

To connect to ToolsDB where you can create and write to tables, type:

$ sql tools

This sets server to "tools.db.svc.wikimedia.cloud" and db to "". It's equivalent to typing-

$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud


Naming conventions

As a convenience, each mediawiki project database (enwiki, bgwiki, etc) has an alias to the cluster it is hosted on. The alias has the form:

${PROJECT}.{analytics,web}.db.svc.wikimedia.cloud

where ${PROJECT} is the internal database name of a hosted Wikimedia project. The choice of "analytics" or "web" is up to you. The analytics service name connects to Wiki Replica servers where SQL queries will be allowed to run for a longer duration, but at the cost of all queries being potentially slower. Use of the web service name should be reserved for webservices which are running queries that display to users.

Wikipedia project database names generally follow the format ${LANGUAGE_CODE}${PROJECT_FAMILY}. ${LANGUAGE_CODE} is the ISO 639 two-letter code for the primary content language (e.g. en for English, es for Spanish, bg for Bulgarian, ...). ${PROJECT_FAMILY} is an internal label for the wiki's project family (e.g. wiki for Wikipedia, wiktionary for Wiktionary, ...). Some wikis such as Meta-Wiki have database names that do not follow this pattern (metawiki).

The replica database names themselves consist of the Wikimedia project name, suffixed with _p (an underscore, and a p), for example:

enwiki_p for the English Wikipedia replica

In addition each cluster can be accessed by the name of its Wikimedia production shard which follows the format s${SHARD_NUMBER}.{analytics,web}.db.svc.wikimedia.cloud (for example, s1.analytics.db.svc.wikimedia.cloud hosts the enwiki_p database). The shard where a particular database is located can change over time. You should only use the shard name for opening a database connection if your application requires it for specific performance reasons such as for heavily crosswiki tools which would otherwise open hundreds of database connections.

Connection handling policy

Usage of connection pools (maintaining open connections without them being in use), persistent connections, or any kind of connection pattern that maintains several connections open even if they are unused is not permitted on shared MySQL instances (Wiki Replicas and ToolsDB).

The memory and processing power available to the database servers is a finite resource. Each open connection to a database, even if inactive, consumes some of these resources. Given the number of potential users for the Wiki Replicas and ToolsDB, if even a relatively small percentage of users held open idle connections, the server would quickly run out of resources to allow new connections. Please close your connections as soon as you stop using them. Note that connecting interactively and being idle for a few minutes is not an issue—opening dozens of connections and maintaining them automatically open is.

Idle connections can and will be killed by database and system administrators when discovered. If you (for example, by connector configuration or application policy) then reopen those connections automatically and keep them idle, you will be warned to stop.
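
In practice this means opening the connection right before the query and closing it as soon as the result is read. A minimal Python sketch with pymysql (assuming the credentials file described above):

import contextlib
import os
import pymysql

with contextlib.closing(pymysql.connect(
    host="enwiki.analytics.db.svc.wikimedia.cloud",
    database="enwiki_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM page WHERE page_namespace = 0")
        print(cur.fetchone())
# The connection is closed here; nothing stays idle on the server.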

Connecting to the wiki replicas from other Cloud VPS projects

The *.{analytics,web}.db.svc.wikimedia.cloud servers should be directly accessible from other Cloud VPS projects as well as Toolforge (these are provided in DNS), but there is no automatic creation of database credential files. The easiest way to get user credentials for use in another project is to create a Toolforge tool account and copy its credentials to your Cloud VPS instance.

Connecting to the database replicas from your own computer

You can access the database replicas from your own computer by setting up an SSH tunnel. If you use MySQL Workbench, you can find a detailed description for that application below.

Tunneling is a built-in capability of ssh. It allows creating a listening TCP port on your local computer that will transparently forward all connections to a given host and port on the remote side of the ssh connection. The destination host and port do not need to be the host that you are connecting to with your ssh session, but they do need to be reachable from the remote host.

In the general case, you need to add a port forwarding in your SSH tool. Windows 10 includes OpenSSH, so the ssh command can be used. On older versions of Windows, you can use PuTTY and add the equivalent forwarding under Connection → SSH → Tunnels.

On Linux or Windows 10, you can add the option -L $LOCAL_PORT:$REMOTE_HOST:$REMOTE_PORT to your ssh call, e.g.:

$ ssh -L 4711:enwiki.analytics.db.svc.wikimedia.cloud:3306 login.toolforge.org

This will set up a tunnel so that connections to port 4711 on your own computer will be relayed to the enwiki.analytics.db.svc.wikimedia.cloud database replica's MySQL server on port 3306. This tunnel will continue to work as long as the SSH session is open.

The mysql command line to connect using the tunnel from the example above would look something like:

$ mysql --user=$USER_FROM_REPLICA.MY.CNF --host=127.0.0.1 --port=4711 --password enwiki_p

The user and password values needed can be found in the $HOME/replica.my.cnf credentials file for your Toolforge user account or a tool that you have access to.

Note that you need to explicitly use the 127.0.0.1 IP address; using localhost instead will give an "Access denied" error.
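
A script on your own computer can then connect through the tunnel like any local MySQL server. A minimal pymysql sketch (the user and password values are placeholders; copy them from your replica.my.cnf, and use the local port chosen above):

import pymysql

conn = pymysql.connect(
    host="127.0.0.1",  # must be the IP, not "localhost" (see the note above)
    port=4711,         # local end of the SSH tunnel
    user="u1234",      # placeholder; copy from $HOME/replica.my.cnf
    password="...",    # placeholder; copy from $HOME/replica.my.cnf
    database="enwiki_p",
)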

SSH tunneling for local testing which makes use of Wiki Replica databases

  1. Set up SSH tunnels: ssh -N yourusername@dev.toolforge.org -L 3306:enwiki.analytics.db.svc.wikimedia.cloud:3306
    • -N prevents ssh from opening an interactive shell. This connection will only be useful for port forwarding.
    • The first port is the listening port on your machine and the second one is on the remote server. 3306 is the default port for MySQL.
    • For multiple database connections, add additional -L $LOCAL_PORT:$REMOTE_HOST:$REMOTE_PORT sections to the same command or open additional ssh connections.
    • If you need to connect to more than one Wiki Replica database server, each database will need a different listening port on your machine (e.g. 3307, 3308, 3309, ...). Change the associated php/python connect command to send requests to that port instead of the default 3306.
  2. (optional) Edit your /etc/hosts file to add something like 127.0.0.1 enwiki.analytics.db.svc.wikimedia.cloud for each of the databases you're connecting to.
  3. You might need to copy over the replica.my.cnf file to your local machine for this to work.

TLS connection failures

Some client libraries may attempt to enable TLS encryption when connecting to the Wiki Replica or ToolsDB databases. Depending on the backing server's configuration, this may either fail silently because TLS is not supported at all, or it may fail with authentication or decryption errors because TLS is partially enabled. In this second case, the problem is caused by MariaDB servers which do support TLS encryption but are using self-signed certificates which are not available to the client and do not match the service names used for connections from Cloud Services hosts.

The "fix" for these failures is to configure your client to avoid TLS encryption. How to do this will vary based on the client libraries in use, but should be something that you can find an answer for by searching the Internet/Stack Overflow/library documentation.

Databases

Replica database schema (tables and indexes)

The database replicas for the various Wikimedia projects follow the standard MediaWiki database schema described on mediawiki.org and in the MediaWiki git repository.

Many of the indexes on these tables are actually compound indexes designed to optimize the runtime performance of the MediaWiki software rather than to be convenient for ad hoc queries. For example, a naive query by page_title such as SELECT * FROM page WHERE page_title = 'NOFX'; will be slow because the index which includes page_title is a compound index with page_namespace. Adding page_namespace to the WHERE clause will improve the query speed dramatically: SELECT * FROM page WHERE page_namespace = 0 AND page_title = 'NOFX';

Stability of the MediaWiki database schema

maintenance/tables-generated.sql shows the HEAD of the MediaWiki schema. Extra tables may be available due to additional extensions set up in production. Also, some tables may have been redacted or filtered because they contain private data, such as user passwords or private IP addresses. Aside from that, while we try to synchronize production with development HEAD, changes to the database structure may be applied in advance or (more commonly) lag behind their publication. The reason for this is that schema changes are applied continuously to production databases, and due to the amount of data, they can take anywhere from a few hours to a few months (in the more complex cases) to be finalized.

Core tables such as revision, page, user, and recentchanges rarely change, but cloud maintainers cannot guarantee they will never change, as they have to follow the production changes. While we are happy for people to set up scripts and tools on top of the database copies (Wiki Replicas), expect the schema to change every now and then. If you cannot make small tweaks from time to time to adapt to the latest schema changes, consider using the API instead of the database internals: API changes have stronger guarantees of stability and a proper lifecycle and deprecation policy. That is not true for MediaWiki database internals, although compatibility views can sometimes be set up to require only minimal changes.

Tables for revision or logging queries involving user names and IDs

The revision and logging tables do not have indexes on user columns. In an email, one of the system administrators pointed out that this is because "those values are conditionally nulled when suppressed" (see also phab:T68786 for more detail). Instead, use the corresponding revision_userindex or logging_userindex views for these types of queries. On those views, rows where the column would otherwise have been nulled are elided, which allows the indexes to be used.

Example query that will use the appropriate index (in this case on the rev_actor column):

SELECT rev_id, rev_timestamp FROM revision_userindex WHERE rev_actor=1234;

Example query that fails to use an index because the table doesn't have them:

SELECT rev_id, rev_timestamp FROM revision WHERE rev_actor=1234;

Use the _userindex views so that your queries can take advantage of the indexes and run much faster.

Redacted tables

The majority of the user_properties table has been deemed sensitive and removed from the Wiki Replica databases. Only the disableemail, fancysig, gender, and nickname properties are available.

Unused tables

Some of the standard MediaWiki tables that are in use on Wikimedia wikis are not available. The following tables are missing or empty:

Metadata database

There is a table with automatically maintained meta information about the replicated databases: meta_p.wiki. See Quarry #4031 for an up-to-date version.

The database host containing the meta_p database is: meta.analytics.db.svc.wikimedia.cloud.

MariaDB [meta_p]> DESCRIBE wiki;
+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| dbname           | varchar(32)  | NO   | PRI | NULL    |       |
| lang             | varchar(12)  | NO   |     | en      |       |
| name             | text         | YES  |     | NULL    |       |
| family           | text         | YES  |     | NULL    |       |
| url              | text         | YES  |     | NULL    |       |
| size             | decimal(1,0) | NO   |     | 1       |       |
| slice            | text         | NO   |     | NULL    |       |
| is_closed        | decimal(1,0) | NO   |     | 0       |       |
| has_echo         | decimal(1,0) | NO   |     | 0       |       |
| has_flaggedrevs  | decimal(1,0) | NO   |     | 0       |       |
| has_visualeditor | decimal(1,0) | NO   |     | 0       |       |
| has_wikidata     | decimal(1,0) | NO   |     | 0       |       |
| is_sensitive     | decimal(1,0) | NO   |     | 0       |       |
+------------------+--------------+------+-----+---------+-------+

Example data:

MariaDB [meta_p]> select * from wiki limit 1 \G
*************************** 1. row ***************************
          dbname: aawiki
            lang: aa
            name: Wikipedia
          family: wikipedia
             url: https://aa.wikipedia.org
            size: 1
           slice: s3.labsdb
       is_closed: 1
        has_echo: 1
 has_flaggedrevs: 0
has_visualeditor: 1
    has_wikidata: 1
    is_sensitive: 0
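
For example, to iterate over all open wikis of one family from Python (a minimal pymysql sketch; the host is the meta.analytics service named above):

import os
import pymysql

conn = pymysql.connect(
    host="meta.analytics.db.svc.wikimedia.cloud",
    database="meta_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
with conn.cursor() as cur:
    cur.execute("SELECT dbname, url FROM wiki WHERE family = 'wiktionary' AND is_closed = 0")
    for dbname, url in cur.fetchall():
        print(dbname, url)
conn.close()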

Identifying lag

If there is a network or Wiki Replica infrastructure problem, a production problem, maintenance (scheduled or unscheduled), excessive load, or production or user queries blocking the replication process, the Wiki Replicas can "lag" behind the production databases.

To identify lag, see the replag tool or run the following yourself on the database host you are connected to:

(u3518@enwiki.analytics.db.svc.wikimedia.cloud) [heartbeat_p]> SELECT * FROM heartbeat;
+-------+----------------------------+--------+
| shard | last_updated               | lag    |
+-------+----------------------------+--------+
| s1    | 2018-01-09T22:47:05.001180 | 0.0000 |
| s2    | 2018-01-09T22:47:05.001190 | 0.0000 |
| s3    | 2018-01-09T22:47:05.001290 | 0.0000 |
| s4    | 2018-01-09T22:47:05.000570 | 0.0000 |
| s5    | 2018-01-09T22:47:05.000670 | 0.0000 |
| s6    | 2018-01-09T22:47:05.000760 | 0.0000 |
| s7    | 2018-01-09T22:47:05.000690 | 0.0000 |
| s8    | 2018-01-09T22:47:05.000600 | 0.0000 |
+-------+----------------------------+--------+
8 rows in set (0.00 sec)

This table is based on the pt-heartbeat tool rather than SHOW MASTER STATUS, so it produces very accurate results even if replication is broken, and it compares directly against the original master, not the replica's direct master.

  • shard: s1–s8, each of the production masters. The wiki distribution can be seen at https://noc.wikimedia.org/db.php
  • last_updated: every second, a row is written on the master with the master's local date. Here you see its value once it has replicated. As it is updated only every second, it has a measurement error of [0, 1+] seconds.
  • lag: the difference between the current date and the last_updated column (timestampdiff(MICROSECOND,`heartbeat`.`heartbeat`.`ts`,utc_timestamp())/1000000.0). Again, updates to this table happen only every second (this can vary in production), so most decimals are meaningless.

To directly query the replication lag for a particular wiki, use requests like:

MariaDB [fawiki_p]> SELECT lag FROM heartbeat_p.heartbeat JOIN meta_p.wiki ON shard = SUBSTRING_INDEX(slice, ".", 1) WHERE dbname = 'fawiki';

+------+
| lag  |
+------+
|    0 |
+------+
1 row in set (0.09 sec)

Please note that a few seconds or even a few minutes of lag is considered normal, due to the filtering process and the hops taken before reaching the public hosts.
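
A tool that wants to back off when replication falls behind can wrap the query above in a helper. A sketch in Python with pymysql (replica_lag is a hypothetical helper; the threshold is up to you):

import os
import pymysql

def replica_lag(dbname):
    """Return the replication lag, in seconds, for one wiki."""
    conn = pymysql.connect(
        host=dbname + ".analytics.db.svc.wikimedia.cloud",
        read_default_file=os.path.expanduser("~/replica.my.cnf"),
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT lag FROM heartbeat_p.heartbeat "
                "JOIN meta_p.wiki ON shard = SUBSTRING_INDEX(slice, '.', 1) "
                "WHERE dbname = %s", (dbname,))
            return float(cur.fetchone()[0])
    finally:
        conn.close()

# e.g. sleep while replica_lag("fawiki") > 60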

User databases

User-created databases can be created on a shared server: tools.db.svc.wikimedia.cloud. Database names must start with the name of the credential user followed by two underscores and then the name of the database: <credentialUser>__<DBName> (e.g. "s51234__mydb").

The credential user is not your user name. It can be found in your $HOME/replica.my.cnf file. The name of the credential user looks something like 'u1234' for a user and 's51234' for a tool account. You can also find the name of the credential user using a live database connection:

SELECT SUBSTRING_INDEX(CURRENT_USER(), '@', 1);

Privileges on the database

Users have all privileges and grant options on their own databases. Databases whose names end with _p are readable by everyone. Please create a ticket if you need more fine-grained permissions, such as sharing a database between only two users, or other special permissions.

Steps to create a user database on tools.db.svc.wikimedia.cloud

To create a database on tools.db.svc.wikimedia.cloud:

  1. Become your tool account.
    maintainer@tools-login:~$ become toolaccount
  2. Connect to tools.db.svc.wikimedia.cloud with the replica.my.cnf credentials:
    mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud
    You could also just type:
    sql tools
  3. In the mysql console, create a new database (where CREDENTIALUSER is your credentials user, which can be found in your ~/replica.my.cnf file, and DBNAME the name you want to give to your database. Note that there are 2 underscores between CREDENTIALUSER and DBNAME):
    MariaDB [(none)]> CREATE DATABASE CREDENTIALUSER__DBNAME;

You can then connect to your database using:

$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud CREDENTIALUSER__DBNAME

Or:

$ sql tools
MariaDB [(none)]> USE CREDENTIALUSER__DBNAME;

Example

Assuming that your tool account is called "mytool", this is what it would look like:

maintainer@tools-login:~$ become mytool
tools.mytool@tools-login:~$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud
MariaDB [(none)]> select substring_index(current_user(), '@', 1) as uname;
+---------------+
| uname         |
+---------------+
| u123something |
+---------------+
1 row in set (0.00 sec)
MariaDB [(none)]> create database u123something__wiki;
Caution: The legacy tools-db service name was deprecated in September 2017 and removed in May 2019. Use tools.db.svc.wikimedia.cloud instead.

Note: Some projects, like Django, can throw an exception like MySQLdb._exceptions.OperationalError: (1709, 'Index column size too large. The maximum column size is 767 bytes.') when migrated using the setup above. In most cases this can be fixed by altering the database charset to utf8. To avoid it in the first place, create the database with the charset specified:

MariaDB [(none)]> CREATE DATABASE CREDENTIALUSER__DBNAME CHARACTER SET utf8;

ToolsDB Backups and Replication

We maintain two copies of the user and tool databases in ToolsDB, with a MySQL primary-replica setup. However, we don't take offline backups of any of the databases in ToolsDB. ToolsDB users can back up their data using mysqldump if necessary:

$ umask o-r # dump should not be public (unless the database is)
$ mysqldump --defaults-file=~/replica.my.cnf --host=tools.db.svc.wikimedia.cloud credentialUser__DBName > ~/DBname-$(date -I).sql
$ umask 0022 # restore default umask

Note that we don't recommend storing backups permanently on NFS (/data/project, /home, or /data/scratch on Toolforge) or on any other Cloud VPS hosted drive. True backups should be kept offsite.

There are some ToolsDB databases which are not replicated for various reasons including size and access patterns:

  • s51412__data
  • s51071__templatetiger_p
  • s52721__pagecount_stats_p
  • s51290__dpl_p

Users were warned about this particularity and details can be seen at task T127164.

Query Limits

This information applies to the new Wiki Replica services.

Since MariaDB 10.1 (the version used on ToolsDB and the new Wiki Replicas), one can use max_statement_time (the unit is seconds; decimals are allowed):

SET max_statement_time = 300;

All subsequent queries on the same connection will then be killed if they run for longer than the given time.

For example:

mariadb[(none)]> SET max_statement_time = 10;
Query OK, 0 rows affected (0.00 sec)

mariadb[(none)]> SELECT sleep(20);
+-----------+
| sleep(20) |
+-----------+
|         1 |
+-----------+
1 row in set (10.00 sec)

It works on Quarry, too!

You can also set limits with a single SQL query. For example:

SET STATEMENT max_statement_time = 300 FOR
SELECT COUNT(rev_id) FROM revision_userindex
INNER JOIN actor
 ON rev_actor = actor_id
WHERE actor_name = 'Jimbo Wales';
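
From code, the same limit can be set once per connection before issuing queries, sketched here in Python with pymysql:

import os
import pymysql

conn = pymysql.connect(
    host="enwiki.analytics.db.svc.wikimedia.cloud",
    database="enwiki_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
with conn.cursor() as cur:
    cur.execute("SET max_statement_time = 300")
    # This and every later statement on this connection is killed
    # if it runs for longer than 300 seconds.
    cur.execute("SELECT COUNT(*) FROM page WHERE page_namespace = 0")
    print(cur.fetchone())
conn.close()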

Example queries

See Help:MySQL queries. Add yours!

Connecting with...

MySQL Workbench

Example configuration of MySQL Workbench for Toolforge

You can connect to databases on Toolforge with MySQL Workbench (or similar client applications) via an SSH tunnel.

Instructions for connecting via MySQL Workbench are as follows:

  1. Launch MySQL Workbench on your local machine.
  2. Click the plus icon next to "MySQL Connections" in the Workbench window (or choose "Manage Connections..." from the Database menu and click the "new" button).
  3. Set Connection Method to "Standard TCP/IP over SSH"
  4. Set the following connection parameters:
    • SSH Hostname: login.toolforge.org
    • SSH Username: <your Toolforge shell username>
    • SSH Key File: <your Toolforge SSH private key file>[1]
    • SSH Password: password/passphrase of your private key (if set) - not your wiki login password.
    • MySQL Hostname: enwiki.analytics.db.svc.wikimedia.cloud (or whatever server your database lives on)
    • MySQL Server Port: 3306
    • Username: <your Toolforge MySQL user name (from $HOME/replica.my.cnf)>
    • Password: <your Toolforge MySQL password (from $HOME/replica.my.cnf)>
    • Default Schema: <name of your Toolforge MySQL database, e.g. enwiki_p>
  5. Click "OK"

Replica database hostnames can be found in /etc/hosts. Remember to add the _p suffix when setting a default schema for replica databases, e.g. enwiki_p.

Note: If you are using SSH keys generated with PuTTYgen (Windows users), you need to convert your private key to the OpenSSH format. Load your private key in PuTTYgen, then click Conversions » Export OpenSSH key. Use this file as the SSH Key File above.

Note: If you are getting errors with SSL, you can try disabling it. From the menu bar: Database → select your connection → SSL → change "Use SSL" to "No".

PGAdmin (for OpenStreetMap databases)

This guide assumes you already know what PGAdmin is and how to use it.

Connecting to OSM via psql, the official PostgreSQL CLI

  1. SSH to login.toolforge.org or dev.toolforge.org
  2. psql -h osmdb.eqiad.wmnet -U osm gis
  3. Start querying


Connecting from a Servlet in Tomcat

  1. create directory "lib" in directory "public_tomcat"
  2. copy "mysql-connector-java-bin.jar" to "public_tomcat/lib"
  3. import java.sql.Connection;
    import java.sql.Statement;
    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;
    
    // DBUSER and DBPASSWORD come from $HOME/replica.my.cnf;
    // PROJECT is the database name without the credential-user prefix.
    String DBURL = "jdbc:mysql://tools.db.svc.wikimedia.cloud:3306/";
    String DBDRIVER = "com.mysql.jdbc.Driver";
    String DATABASE = DBUSER + "__" + PROJECT;
    
    PoolProperties p = new PoolProperties();
    p.setUrl(DBURL + DATABASE);
    p.setDriverClassName(DBDRIVER);
    p.setUsername(DBUSER);
    p.setPassword(DBPASSWORD);
    p.setJdbcInterceptors(
    	"org.apache.tomcat.jdbc.pool.interceptor.ConnectionState;" +
    	"org.apache.tomcat.jdbc.pool.interceptor.StatementFinalizer");
    DataSource datasource = new DataSource();
    datasource.setPoolProperties(p);
    Connection connection = datasource.getConnection();
    Statement statement = connection.createStatement();
    
  4. javac -classpath javax.servlet.jar:tomcat-jdbc.jar myhttpservlet.java

Code samples for common languages

Copied with edits from mw:Toolserver:Database access#Program access (not all tested, use with caution!)

In most programming languages, it will be sufficient to tell MySQL to use the database credentials found in $HOME/.my.cnf assuming that you have created a symlink from $HOME/.my.cnf to $HOME/replica.my.cnf.

Below are various examples in a few common programming languages.

Bash

-- 2> /dev/null; date; echo '
/* Bash/SQL compatible test structure
 *
 * Run time: ? <SLOW_OK>
 */
SELECT 1
;-- ' | mysql -ch enwiki.analytics.db.svc.wikimedia.cloud enwiki_p > ~/query_results-enwiki; date;

C

#include <my_global.h>
#include <mysql.h>

...

 char *host = "tools.db.svc.wikimedia.cloud";
 MYSQL *conn = mysql_init(NULL);

 mysql_options(conn, MYSQL_READ_DEFAULT_GROUP, "client");
 if (mysql_real_connect(conn, host, NULL, NULL, NULL, 0, NULL, 0) == NULL) {
    printf("Error %u: %s\n", mysql_errno(conn), mysql_error(conn));
    ...
 }

Perl

use User::pwent;
use DBI;

my $database = "enwiki_p";
my $host = "enwiki.analytics.db.svc.wikimedia.cloud";

my $dbh = DBI->connect(
    "DBI:mysql:database=$database;host=$host;"
    . "mysql_read_default_file=" . getpwuid($<)->dir . "/replica.my.cnf",
    undef, undef) or die "Error: $DBI::err, $DBI::errstr";

Python

Without installing the toolforge library, this will work:

import os
import pymysql

conn = pymysql.connect(
    database="enwiki_p",
    host="enwiki.analytics.db.svc.wikimedia.cloud",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),  # credentials
)
with conn.cursor() as cur:
    cur.execute(query)  # Or something....

Using User:Legoktm/toolforge library, however, is probably the easiest way. This wrapper library supports both Python 3 and legacy Python 2 applications and provides convenience functions for connecting to the Wiki Replica databases.

import toolforge
conn = toolforge.connect('enwiki') # You can also use "enwiki_p"
# conn is a pymysql.connection object.
with conn.cursor() as cur:
    cur.execute(query)  # Or something....

We used to recommend oursql as well, but as of 2019-02-20 it seems to be abandoned or at least not actively maintained and failing to compile against MariaDB client libraries.

Python: Django

If you are using Django, first install mysqlclient:

pip install mysqlclient

Then add the database to the settings.py file as follows, with s12345 being your credential user:

import configparser
import os

HOME = os.environ.get('HOME')  # get environment variable $HOME

replica_path = HOME + '/replica.my.cnf'
if os.path.exists(replica_path):  # check that the file is found
    config = configparser.ConfigParser()
    config.read(replica_path)
else:
    print('replica.my.cnf file not found')

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 's12345__mydbname',
        'USER': config['client']['user'],  # for instance "s12345"
        'PASSWORD': config['client']['password'],
        'HOST': 'tools.db.svc.wikimedia.cloud',
        'PORT': '',
    }
}

PHP (using PDO)

<?php
$ts_pw = posix_getpwuid(posix_getuid());
$ts_mycnf = parse_ini_file($ts_pw['dir'] . "/replica.my.cnf");
$db = new PDO("mysql:host=enwiki.analytics.db.svc.wikimedia.cloud;dbname=enwiki_p", $ts_mycnf['user'], $ts_mycnf['password']);
unset($ts_mycnf, $ts_pw);

$q = $db->prepare('select * from page where page_id = :id');
$q->execute(array(':id' => 843020));
print_r($q->fetchAll());
?>

PHP (using MySQLi)

<?php
$ts_pw = posix_getpwuid(posix_getuid());
$ts_mycnf = parse_ini_file($ts_pw['dir'] . "/replica.my.cnf");
$mysqli = new mysqli('enwiki.analytics.db.svc.wikimedia.cloud', $ts_mycnf['user'], $ts_mycnf['password'], 'enwiki_p');
unset($ts_mycnf, $ts_pw);

$stmt = $mysqli->prepare('select * from page where page_id = ?');
$id = 843020;
$stmt->bind_param('i', $id);
$stmt->execute();
$result = $stmt->get_result();
print_r($result->fetch_all(MYSQLI_BOTH));
?>

Java

import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

Class.forName("com.mysql.jdbc.Driver").newInstance();
Properties mycnf = new Properties();
mycnf.load(new FileInputStream(System.getProperty("user.home") + "/replica.my.cnf"));
// The password value in replica.my.cnf may be wrapped in double quotes; strip them.
String password = mycnf.getProperty("password");
password = password.substring(password.startsWith("\"") ? 1 : 0, password.length() - (password.startsWith("\"") ? 1 : 0));
mycnf.put("password", password);
mycnf.put("useOldUTF8Behavior", "true");
mycnf.put("useUnicode", "true");
mycnf.put("characterEncoding", "UTF-8");
mycnf.put("connectionCollation", "utf8_general_ci");
String url = "jdbc:mysql://enwiki.analytics.db.svc.wikimedia.cloud:3306/enwiki_p";
Connection conn = DriverManager.getConnection(url, mycnf);

Node.js

The mysql2 client provides a promise-based interface.

const mysql = require('mysql2/promise');

async function sample() {
  const connection = await mysql.createConnection({
    host: 'tools.db.svc.wikimedia.cloud',
    port: 3306,
    database: 's12345__mydbname',
    user: 's12345',   // from $HOME/replica.my.cnf
    password: '',     // from $HOME/replica.my.cnf
  });
  const [rows, fields] = await connection.execute('SELECT * FROM table WHERE name = ? AND age > ?', ['Morty', 14]);
  for (const row of rows) console.log(row);
  await connection.end();
}


See also

Note

  1. If your private key is in RFC4716 format, you will have to convert it to a PEM key.