You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
MariaDB/troubleshooting
< MariaDB
Jump to navigation
Jump to search
Revision as of 18:14, 28 October 2015 by imported>Jcrespo
This guide is a work in progress.
Depooling a slave
Quick depooling, for emergencies only
Connect to the offending server. Log in to mysql (sudo mysql --defaults-file=/root/.my.cnf
), and execute the following command:
STOP SLAVE;
If you stop replication (assuming replication is working normally for other hosts), mediawiki will depool the slave automatically after X seconds. This is not the preferred method for several reasons:
- The depool is not immediate, during that time, the server keeps receiving queries, which may be undesirable in many cases
- The depool is not total, the server may receive queries still if all servers are lagged
The only advantage is that it does not require a deploy to depool the server. Use the regular depooling method in most cases.
Normal depooling
Comment its server name from db-eqiad.php
or db-codfw.php
.
- Hopefully, you will have the operations/mediawiki-config repository already configured
- Edit the file wmf-config/db-eqiad.php (or wmf-config/db-codfw.php, if it is a server in dallas) and comment the line of the normal weight, in the 'sectionLoads' array, like this:
's8' => array( 'db1045' => 0, # 1.4TB 64GB # depooled because reasons 'db2094' => 100, 'db2095' => 100, 'db2096' => 100, 'db2097' => 100, ),
- Do not just set the weight to 0, that will not fully depool the server
- If the server has a particular role (normally commented on the same line), like vslow, dump. recentchanges, etc., you must take away the roles and assign them to another node in the 'groupLoadsBySection' array:
's8' => array( 'watchlist' => array( # 'db2094' => 1, 'db2055' => 1, ), 'recentchanges' => array( # 'db2094' => 1, 'db2055' => 1, ), 'recentchangeslinked' => array( # 'db2094' => 1, 'db2055' => 1, ),
- Most roles can be taken by any slave (assuming the have enough resources and not overloaded), although there are some cases where there are only a couple of "special slaves" optimized for a particular role.
- commit the change and deploy on tin:
/srg/mediawiki-staging $ sync-file wmf-config/db-eqiad.php "Depooled because reasons"
- Monitor that mysql connections to that host slowly drop as soon as the queries finish. For that, connect to that host and use mysql's SHOW PROCESSLIST; and check there are no wikiuser or wikiadmin connctions. You can also monitor connections with regular linux tools like netstat/ss on port 3306. Monitoring tools regularly check the host, but they use separate users.
Example:
MariaDB PRODUCTION x1 localhost (none) > SHOW PROCESSLIST; +---------+-----------------+-------------------+--------------------+---------+- | Id | User | Host | db | Command | +---------+-----------------+-------------------+--------------------+---------+- # internal process, ignore | 2 | event_scheduler | localhost | NULL | Daemon | # replication users, ignore | 3192579 | system user | | NULL | Connect | | 3192580 | system user | | NULL | Connect | # monitoring users, ignore | 6284249 | watchdog | 10.XX.XX.XX:34525 | information_schema | Sleep | | 6284250 | watchdog | 10.XX.XX.XX:34716 | information_schema | Sleep | | 6284253 | watchdog | 10.XX.XX.XX:34890 | mysql | Sleep | # this is your own connection | 6311084 | root | localhost | NULL | Query | +---------+-----------------+-------------------+--------------------+---------+-
(no wikiuser or wikiadmin processes, ok to do maintenance, kill the machine, etc.)
- Except in the case of a problem or the dump slave, while creating the dumps, connections should go away in seconds/very few minutes. If there is an emergency- killing the process (
KILL <#ID>
) is the way to go. Selects are ok to kill, writes and alters can create worse issues due to rollback process kicking in- be sure what you kill.
- Except in the case of a problem or the dump slave, while creating the dumps, connections should go away in seconds/very few minutes. If there is an emergency- killing the process (