
Difference between revisions of "Query killer"

imported>Nemo bis
(copy from bugzilla:58157#c5)
 
imported>Legoktm
(may not work properly under high load, link to db-kill for emergency manual intervention)
 
Line 1: Line 1:
The '''query killer''' on production databases is a 60s pt-kill job on slaves after query spikes causing outages in November 2013. When the number of slow queries on a [[Slave database server|slave]] grows beyond a threshold the slowest one above 60s is sniped to keep the box alive. This only affects wikiuser, not wikiadmin.
The '''query killer''' on production databases is supposed to stop queries on replicas that take more than 60 seconds. It is implemented in native MySQL, see [[git:operations/software/+/refs/heads/master/dbtools/events_coredb_slave.sql|events_coredb_slave.sql]]. Under high load, it may not function properly, see [[db-kill]] for how to kill slow queries manually in emergencies.  


Historically this was implemented as a 60s pt-kill job, when the number of slow queries on a replica grows beyond a threshold the slowest one above 60s is sniped to keep the box alive. It was introduced after query spikes caused outages in November 2013.
== Statistics ==
Each killed query is recorded in the <code>ops.event_log</code> table. Events are removed after 24 hours.<pre>
MariaDB [(none)]> use ops
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [ops]> select * from event_log where event like "wmf_slave_wikiuser_slow%" limit 5;
+-----------+---------------------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| server_id | stamp               | event                         | content                                                                                                               |
+-----------+---------------------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 171974878 | 2021-10-14 20:49:03 | wmf_slave_wikiuser_slow (>60) | kill 2459844921; SELECT /* SpecialRecentChangesLinked::doMainQuery  */  rc_id,rc_timestamp,rc_namespace,rc_title,rc_m |
| 171974878 | 2021-10-14 20:49:03 | wmf_slave_wikiuser_slow (>60) | kill 2459844921; SELECT /* SpecialRecentChangesLinked::doMainQuery  */  rc_id,rc_timestamp,rc_namespace,rc_title,rc_m |
| 171974878 | 2021-10-14 20:50:03 | wmf_slave_wikiuser_slow (>60) | kill 2459862622; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac |
| 171974878 | 2021-10-14 20:51:03 | wmf_slave_wikiuser_slow (>60) | kill 2459882167; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac |
| 171974878 | 2021-10-14 20:52:03 | wmf_slave_wikiuser_slow (>60) | kill 2459901127; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac |
+-----------+---------------------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.000 sec)
</pre>
[[Category:MySQL]]

Latest revision as of 20:11, 15 October 2021

The query killer on production databases is meant to stop queries on replicas that run for more than 60 seconds. It is implemented in native MySQL; see events_coredb_slave.sql. Under high load it may not function properly; see db-kill for how to kill slow queries manually in an emergency.

Historically this was implemented as a 60s pt-kill job: when the number of slow queries on a replica grew beyond a threshold, the slowest one running for more than 60s was killed to keep the host alive. It was introduced after query spikes caused outages in November 2013.
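
The historical policy described above can be sketched as a small selection function. This is only an illustration of the rule, not the actual pt-kill configuration: the `Query` record, the `pick_victim` name and the default `max_slow` value are hypothetical, because this page does not state the real threshold. Restricting the selection to `wikiuser` follows the earlier revision of this page, which says the killer does not affect `wikiadmin`. Treating "slow" as "running for more than 60 seconds" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Query:
    thread_id: int  # connection/thread id, as passed to KILL
    user: str       # database user running the query
    seconds: int    # how long the query has been running


def pick_victim(
    queries: Sequence[Query],
    max_slow: int = 10,      # hypothetical threshold; the real value is not given
    slow_after: int = 60,    # mirrors the 60s limit mentioned on this page
    user: str = "wikiuser",  # only this user's queries are candidates
) -> Optional[Query]:
    """Return the query to kill, or None if the replica is not over the threshold."""
    slow = [q for q in queries if q.user == user and q.seconds > slow_after]
    if len(slow) <= max_slow:
        # Not enough slow queries to act on.
        return None
    # Over the threshold: pick the longest-running slow query.
    return max(slow, key=lambda q: q.seconds)
```

Only one query is selected per call, which matches the description of killing the slowest query rather than every query over the limit.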

Statistics

Each killed query is recorded in the ops.event_log table. Events are removed after 24 hours.

MariaDB [(none)]> use ops
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [ops]> select * from event_log where event like "wmf_slave_wikiuser_slow%" limit 5;
+-----------+---------------------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| server_id | stamp               | event                         | content                                                                                                               |
+-----------+---------------------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 171974878 | 2021-10-14 20:49:03 | wmf_slave_wikiuser_slow (>60) | kill 2459844921; SELECT /* SpecialRecentChangesLinked::doMainQuery  */  rc_id,rc_timestamp,rc_namespace,rc_title,rc_m |
| 171974878 | 2021-10-14 20:49:03 | wmf_slave_wikiuser_slow (>60) | kill 2459844921; SELECT /* SpecialRecentChangesLinked::doMainQuery  */  rc_id,rc_timestamp,rc_namespace,rc_title,rc_m |
| 171974878 | 2021-10-14 20:50:03 | wmf_slave_wikiuser_slow (>60) | kill 2459862622; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac |
| 171974878 | 2021-10-14 20:51:03 | wmf_slave_wikiuser_slow (>60) | kill 2459882167; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac |
| 171974878 | 2021-10-14 20:52:03 | wmf_slave_wikiuser_slow (>60) | kill 2459901127; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac |
+-----------+---------------------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.000 sec)
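
To get a rough per-caller breakdown from rows like these, the `content` column can be parsed once it has been exported. A minimal sketch, assuming the content always has the shape `kill <thread id>; <query text>` and that the query carries a `/* Caller::method */` comment as in the sample above; `tally_callers` is a hypothetical helper, not part of any existing tooling. The sample strings are the truncated values shown in the output.

```python
import re
from collections import Counter
from typing import Iterable

# "kill <thread id>; ... /* Caller::method */"; the lazy ".*?" stops at the
# first comment, so a later "/*! STRAIGHT_JOIN */" hint is ignored.
CONTENT_RE = re.compile(r"^kill (\d+); .*?/\*\s*(\S+)\s*\*/")


def tally_callers(contents: Iterable[str]) -> Counter:
    """Count event_log rows per caller comment; unparseable rows go under 'unknown'."""
    counts: Counter = Counter()
    for content in contents:
        match = CONTENT_RE.match(content)
        counts[match.group(2) if match else "unknown"] += 1
    return counts


sample = [
    "kill 2459844921; SELECT /* SpecialRecentChangesLinked::doMainQuery  */  rc_id,rc_timestamp,rc_namespace,rc_title,rc_m",
    "kill 2459844921; SELECT /* SpecialRecentChangesLinked::doMainQuery  */  rc_id,rc_timestamp,rc_namespace,rc_title,rc_m",
    "kill 2459862622; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac",
    "kill 2459882167; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac",
    "kill 2459901127; SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespac",
]

for caller, rows in tally_callers(sample).most_common():
    print(f"{rows:3d}  {caller}")
```

Rows are counted as logged, so the thread id that appears twice in the sample is counted twice.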