You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Db-kill

From Wikitech-static
Revision as of 10:37, 13 October 2021 by imported>Jcrespo (→‎Usage: where it is installed)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

db-kill is an emergency-only tool, that simplifies the killing of long running queries on a host. It kills (forever) queries running for over 10 seconds or sleeping over 10 seconds on the given mysql instance.

Details

db-kill is intended as an easy-to-use wrapper for pt-kill, with minimal options, for operators that are not too familiar with MariaDB administration.

Rather than connecting to a host, running SHOW PROCESSLIST and doing a bunch of complicated database commands, it tries to simplify by having some good defaults -when an emergency kill run is needed.

db-kill does:

  • kill only webrequests/jobqueue, with the use of a specific user (doesn't kill admin requests, dumps, root, etc)
  • set sane defaults for interval, busy-time and idle-time, and killing running or sleeping queries in pt-kill
  • make easy to run against any instance, allowing using hostname:section (e.g. db1099:s1)
  • print the full commandline for pt-kill, in case manual adjustments are needed
  • log all actions to a file for later reviewing
  • have a --dry-run option to monitor effects without actual kills
  • use automatically the "extra port", in case the default one is overloaded by other queries.

Usage

db-kill must be run as root from one of the cumin hosts, ideally from a local datacenter (e.g. cumin1001.eqiad.wment for eqiad databases):

To kill long running queries on the s2 standalone mariadb instance db2059.codfw.wmnet:

db-kill db2059

If one wants to kill queries from the enwiki instance on the multi-instance host db1099.eqiad.wmnet:

db-kill db1099:s1

Alternatively, the following will be equivalent to the previous execution (s1 runs on port 3311):

db-kill db1099:3311

To test what will be killed on the standalone host db1117.eqiad.wmnet, without actually killing anything:

db-kill --dry-run db1117

Output:

Running:
/usr/bin/pt-kill --print --victims all --interval 5 --busy-time 10 --idle-time 10 --match-command 'Query|Execute|Sleep' --match-user wikiuser --log /var/log/db-kill.log h=db1117.eqiad.wmnet,P=3307
# 2021-10-13T10:27:42 KILL 2816313971 (Query 10 sec) SELECT /* MediaWiki\Revision\RevisionStore::loadSlotRecords  */  slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `slots` JOIN `content` ON ((slot_content_id = content_id))   WHERE slot_revision_id = '4219499'
# 2021-10-13T10:27:42 KILL 2816313958 (Query 10 sec) SELECT /* ProofreadPage\Page\DatabaseIndexForPageLookup::findIndexesWhichLinkTo  */  page_namespace,page_title  FROM `page`,`pagelinks`    WHERE pl_namespace = 250 AND pl_title = 'Ксьондзівські_найми.pdf/8' AND (pl_from=page_id) AND pl_from_namespace = 252
# 2021-10-13T10:27:42 KILL 2816312509 (Query 10 sec) SELECT /* MediaWiki\Permissions\RestrictionStore::loadRestrictions  */  pr_type,pr_expiry,pr_level,pr_cascade  FROM `page_restrictions`    WHERE pr_page = 3823
...

Hit Ctrl-C to finish the monitoring and killing process- otherwise it will continue forever.

Arguments

db-kill --help
usage: db-kill [-h] [--dry-run] instance

Execute pt-kill on the given instance.

positional arguments:
  instance    Instance to connect to, in hostname, host:port or host:section
              format

optional arguments:
  -h, --help  show this help message and exit
  --dry-run   It prints the queries that would be killed, but does not send
              the kill commands

Code