Difference between revisions of "Db-kill"
(fix display title)
m (how it complements query killer)
|Line 1:||Line 1:|
<code>db-kill</code> is an emergency-only tool, that simplifies the killing of long running queries on a host. It kills (
<code>db-kill</code> is an emergency-only tool, that simplifies the killing of long running queries on a host. It kills () queries running for over 10 seconds or sleeping over 10 seconds on the given mysql instance. It is intended to complement the automatic [[query killer]] that kicks in at 60seconds .
== Details ==
== Details ==
Latest revision as of 08:53, 16 October 2021
db-kill is an emergency-only tool, that simplifies the killing of long running queries on a host. It kills (until manually stopped) queries running for over 10 seconds or sleeping over 10 seconds on the given mysql instance. It is intended to complement the automatic query killer that kicks in at 60+ seconds and only kills queries longer than 60 seconds, when that is not enough to contain a query overload.
db-kill is intended as an easy-to-use wrapper for pt-kill, with minimal options, for operators that are not too familiar with MariaDB administration.
Rather than connecting to a host, running SHOW PROCESSLIST and doing a bunch of complicated database commands, it tries to simplify by having some good defaults -when an emergency kill run is needed.
- kill only webrequests/jobqueue, with the use of a specific user (doesn't kill admin requests, dumps, root, etc)
- set sane defaults for interval, busy-time and idle-time, and killing running or sleeping queries in pt-kill
- make easy to run against any instance, allowing using hostname:section (e.g. db1099:s1)
- print the full commandline for pt-kill, in case manual adjustments are needed
- log all actions to a file for later reviewing
- have a
--dry-runoption to monitor effects without actual kills
- use automatically the "extra port", in case the default one is overloaded by other queries.
db-kill must be run as root from one of the cumin hosts, ideally from a local datacenter (e.g.
cumin1001.eqiad.wment for eqiad databases):
To kill long running queries on the s2 standalone mariadb instance db2059.codfw.wmnet:
If one wants to kill queries from the enwiki instance on the multi-instance host db1099.eqiad.wmnet:
Alternatively, the following will be equivalent to the previous execution (s1 runs on port 3311):
To test what will be killed on the standalone host db1117.eqiad.wmnet, without actually killing anything:
db-kill --dry-run db1117
Running: /usr/bin/pt-kill --print --victims all --interval 5 --busy-time 10 --idle-time 10 --match-command 'Query|Execute|Sleep' --match-user wikiuser --log /var/log/db-kill.log h=db1117.eqiad.wmnet,P=3307 # 2021-10-13T10:27:42 KILL 2816313971 (Query 10 sec) SELECT /* MediaWiki\Revision\RevisionStore::loadSlotRecords */ slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model FROM `slots` JOIN `content` ON ((slot_content_id = content_id)) WHERE slot_revision_id = '4219499' # 2021-10-13T10:27:42 KILL 2816313958 (Query 10 sec) SELECT /* ProofreadPage\Page\DatabaseIndexForPageLookup::findIndexesWhichLinkTo */ page_namespace,page_title FROM `page`,`pagelinks` WHERE pl_namespace = 250 AND pl_title = 'Ксьондзівські_найми.pdf/8' AND (pl_from=page_id) AND pl_from_namespace = 252 # 2021-10-13T10:27:42 KILL 2816312509 (Query 10 sec) SELECT /* MediaWiki\Permissions\RestrictionStore::loadRestrictions */ pr_type,pr_expiry,pr_level,pr_cascade FROM `page_restrictions` WHERE pr_page = 3823 ...
Ctrl-C to finish the monitoring and killing process- otherwise it will continue forever.
db-kill --help usage: db-kill [-h] [--dry-run] instance Execute pt-kill on the given instance. positional arguments: instance Instance to connect to, in hostname, host:port or host:section format optional arguments: -h, --help show this help message and exit --dry-run It prints the queries that would be killed, but does not send the kill commands