Difference between revisions of "Server Admin Log"

== 2021-06-17 ==
* 21:49 legoktm: regenerating pipermail redirects to skip those with duplicate message-ids ([[phab:T280731|T280731]])
* 18:24 ryankemper: [[phab:T285106|T285106]] [WDQS] `ryankemper@wdqs2001:~$ sudo depool`
* 18:01 dancy: Deployed latest scap code to beta cluster
* 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase/client/includes/ClientHooks.php: Backport: [[gerrit:700036{{!}}client: Bring back using the client setting for langlink group (T284854)]] (duration: 00m 58s)
* 13:28 jbond: add prometheus-jmx-exporter to bullseye-wikimedia
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16604 and previous config saved to /var/cache/conftool/dbconfig/20210617-121146-root.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16603 and previous config saved to /var/cache/conftool/dbconfig/20210617-120109-root.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16602 and previous config saved to /var/cache/conftool/dbconfig/20210617-115643-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16601 and previous config saved to /var/cache/conftool/dbconfig/20210617-115319-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16600 and previous config saved to /var/cache/conftool/dbconfig/20210617-114605-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16599 and previous config saved to /var/cache/conftool/dbconfig/20210617-114139-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16598 and previous config saved to /var/cache/conftool/dbconfig/20210617-113816-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16597 and previous config saved to /var/cache/conftool/dbconfig/20210617-113101-root.json
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16596 and previous config saved to /var/cache/conftool/dbconfig/20210617-112635-root.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16595 and previous config saved to /var/cache/conftool/dbconfig/20210617-112431-marostegui.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16594 and previous config saved to /var/cache/conftool/dbconfig/20210617-112312-root.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16593 and previous config saved to /var/cache/conftool/dbconfig/20210617-111558-root.json
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16592 and previous config saved to /var/cache/conftool/dbconfig/20210617-111026-marostegui.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16591 and previous config saved to /var/cache/conftool/dbconfig/20210617-110808-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16590 and previous config saved to /var/cache/conftool/dbconfig/20210617-110656-root.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16589 and previous config saved to /var/cache/conftool/dbconfig/20210617-110200-marostegui.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16588 and previous config saved to /var/cache/conftool/dbconfig/20210617-105153-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16587 and previous config saved to /var/cache/conftool/dbconfig/20210617-103649-root.json
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16586 and previous config saved to /var/cache/conftool/dbconfig/20210617-102145-root.json
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P16585 and previous config saved to /var/cache/conftool/dbconfig/20210617-101827-marostegui.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16584 and previous config saved to /var/cache/conftool/dbconfig/20210617-100445-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16583 and previous config saved to /var/cache/conftool/dbconfig/20210617-094942-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16582 and previous config saved to /var/cache/conftool/dbconfig/20210617-093438-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16581 and previous config saved to /var/cache/conftool/dbconfig/20210617-092056-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16580 and previous config saved to /var/cache/conftool/dbconfig/20210617-091934-root.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P16579 and previous config saved to /var/cache/conftool/dbconfig/20210617-090947-marostegui.json
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16578 and previous config saved to /var/cache/conftool/dbconfig/20210617-090552-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16577 and previous config saved to /var/cache/conftool/dbconfig/20210617-085048-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16576 and previous config saved to /var/cache/conftool/dbconfig/20210617-084941-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16575 and previous config saved to /var/cache/conftool/dbconfig/20210617-083545-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16574 and previous config saved to /var/cache/conftool/dbconfig/20210617-083438-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16573 and previous config saved to /var/cache/conftool/dbconfig/20210617-083005-marostegui.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16572 and previous config saved to /var/cache/conftool/dbconfig/20210617-082939-marostegui.json
* 08:28 elukey: upload istioctl 1.6.14-1 to buster-wikimedia
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16571 and previous config saved to /var/cache/conftool/dbconfig/20210617-082437-root.json
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16570 and previous config saved to /var/cache/conftool/dbconfig/20210617-082409-marostegui.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16569 and previous config saved to /var/cache/conftool/dbconfig/20210617-081934-root.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16568 and previous config saved to /var/cache/conftool/dbconfig/20210617-080933-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16567 and previous config saved to /var/cache/conftool/dbconfig/20210617-080430-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16566 and previous config saved to /var/cache/conftool/dbconfig/20210617-075825-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16565 and previous config saved to /var/cache/conftool/dbconfig/20210617-075429-root.json
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16564 and previous config saved to /var/cache/conftool/dbconfig/20210617-073926-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16563 and previous config saved to /var/cache/conftool/dbconfig/20210617-073305-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16562 and previous config saved to /var/cache/conftool/dbconfig/20210617-073229-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16561 and previous config saved to /var/cache/conftool/dbconfig/20210617-071726-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16560 and previous config saved to /var/cache/conftool/dbconfig/20210617-070222-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16559 and previous config saved to /var/cache/conftool/dbconfig/20210617-064717-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16558 and previous config saved to /var/cache/conftool/dbconfig/20210617-063135-marostegui.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16557 and previous config saved to /var/cache/conftool/dbconfig/20210617-062514-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16556 and previous config saved to /var/cache/conftool/dbconfig/20210617-061010-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16555 and previous config saved to /var/cache/conftool/dbconfig/20210617-055507-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16554 and previous config saved to /var/cache/conftool/dbconfig/20210617-054003-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16553 and previous config saved to /var/cache/conftool/dbconfig/20210617-053455-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16552 and previous config saved to /var/cache/conftool/dbconfig/20210617-053105-root.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16551 and previous config saved to /var/cache/conftool/dbconfig/20210617-051601-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16550 and previous config saved to /var/cache/conftool/dbconfig/20210617-050057-root.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16549 and previous config saved to /var/cache/conftool/dbconfig/20210617-044554-root.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16548 and previous config saved to /var/cache/conftool/dbconfig/20210617-044146-marostegui.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16547 and previous config saved to /var/cache/conftool/dbconfig/20210617-044132-marostegui.json
* 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16546 and previous config saved to /var/cache/conftool/dbconfig/20210617-043130-marostegui.json
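The schema-change entries above follow a repeating pattern: an instance is depooled, then repooled in steps (25%, 50%, 75%, 100%) roughly fifteen minutes apart. A minimal Python sketch for extracting those steps from the log text follows. The regular expression and the function names `parse_repool` and `repool_steps` are assumptions derived from the entries on this page, not an official dbctl log format or tool.

```python
import re

# Assumed pattern, inferred from the "(re)pooling @ N%" entries in this log.
REPOOL_RE = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) (?P<who>\S+): dbctl commit \(dc=all\): "
    r"'(?P<instance>\S+) \(re\)pooling @ (?P<pct>\d+)%: "
)


def parse_repool(line):
    """Return (time, instance, percent) for a staged-repool entry, else None."""
    m = REPOOL_RE.match(line)
    if m is None:
        return None
    return m.group("time"), m.group("instance"), int(m.group("pct"))


def repool_steps(lines):
    """Group (time, percent) steps by instance, oldest step first."""
    steps = {}
    # The log lists the newest entry first, so walk it in reverse.
    for line in reversed(list(lines)):
        parsed = parse_repool(line)
        if parsed is not None:
            time, instance, pct = parsed
            steps.setdefault(instance, []).append((time, pct))
    return steps
```

An instance that is depooled more than once in a day (db1180 above, for example) ends up with both cycles in the same list; splitting cycles on each `Depool` entry would be a natural extension.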
== 2021-06-16 ==
* 21:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:32 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 17:41 dancy: Reverted Scap release on beta
* 16:18 topranks: Resetting metric on Telia CCT IC-331929, cr1-codfw and cr3-eqsin.
* 15:22 dancy: testing upcoming Scap release on beta
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16545 and previous config saved to /var/cache/conftool/dbconfig/20210616-125329-root.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16544 and previous config saved to /var/cache/conftool/dbconfig/20210616-123826-root.json
* 12:34 kormat: deploying heartbeat service puppet change
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16543 and previous config saved to /var/cache/conftool/dbconfig/20210616-122322-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16541 and previous config saved to /var/cache/conftool/dbconfig/20210616-120818-root.json
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 12:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16540 and previous config saved to /var/cache/conftool/dbconfig/20210616-120015-marostegui.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16539 and previous config saved to /var/cache/conftool/dbconfig/20210616-112115-root.json
* 11:20 hnowlan: running `nodetool cleanup` on maps1005
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16538 and previous config saved to /var/cache/conftool/dbconfig/20210616-110612-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16537 and previous config saved to /var/cache/conftool/dbconfig/20210616-105108-root.json
* 10:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16536 and previous config saved to /var/cache/conftool/dbconfig/20210616-103604-root.json
* 10:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16535 and previous config saved to /var/cache/conftool/dbconfig/20210616-102349-marostegui.json
* 09:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
* 09:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 09:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 09:50 hnowlan: disabling puppet on maps1* to reparent maps1007 from new master maps1009
* 09:47 kormat: truncating all pc* tables on pc1010 [[phab:T282761|T282761]]
* 09:40 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1009 as pc3 primary [[phab:T282761|T282761]] (duration: 00m 59s)
* 09:04 kormat: Deploying wmfmariadbpy 0.7.1 [[phab:T284819|T284819]]
* 09:04 kormat: uploaded wmfmariadbpy 0.7.1 to apt.wm.o
* 08:24 Amir1: running "update flaggedrevs set fr_quality = 0 where fr_quality != 0;" on all wikis where flagged revs is enabled ([[phab:T279761|T279761]])
* 07:27 dcausse: cleanup old /var/log/airflow/scheduler logs to reclaim space on an-airflow1001
* 06:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:52 volans@cumin1001: START - Cookbook sre.dns.netbox
* 05:06 marostegui: Upgrade clouddb1014
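Cookbook runs in this log appear as paired `START` and `END (PASS)` entries. The Python sketch below pairs them to estimate how long each run took. It is an illustration only: the regular expression and the name `cookbook_durations` are assumptions based on the entries on this page, pairing is keyed on user and cookbook name alone (not on target host), times have minute resolution, and runs that cross midnight are not handled.

```python
import re

# Assumed pattern, inferred from the "START - Cookbook" / "END (PASS) - Cookbook"
# entries in this log.
COOKBOOK_RE = re.compile(
    r"^\* (?P<h>\d{2}):(?P<m>\d{2}) (?P<who>\S+): "
    r"(?P<phase>START|END) (?:\((?P<result>\w+)\) )?- Cookbook (?P<name>\S+)"
)


def cookbook_durations(lines):
    """Return (cookbook, result, minutes) for each START/END pair found."""
    open_runs = {}
    durations = []
    # The log lists the newest entry first, so walk it in reverse.
    for line in reversed(list(lines)):
        m = COOKBOOK_RE.match(line)
        if m is None:
            continue
        key = (m.group("who"), m.group("name"))
        minute = int(m.group("h")) * 60 + int(m.group("m"))
        if m.group("phase") == "START":
            open_runs[key] = minute
        elif key in open_runs:
            durations.append(
                (m.group("name"), m.group("result"), minute - open_runs.pop(key))
            )
    return durations
```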
== 2021-06-15 ==
* 17:54 dancy: testing upcoming Scap release on beta
* 17:21 mutante: new Wikimedia language "shi" added - Shilha /ˈʃɪlhə/ is a Berber language native to Shilha people. The endonym is Taclḥit /taʃlʜijt/, and in recent English publications the language is often rendered Tashelhiyt or Tashelhit.
* 17:17 mutante: new Wikimedia language "dag" added - Dagbani (or Dagbane), also known as Dagbanli and Dagbanle, is a Gur language spoken in Ghana.
* 17:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
* 17:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
* 16:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
* 16:11 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
* 14:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:25 XioNoX: re-enable cr1-codfw:xe-5/1/2
* 13:23 marostegui: Upgrade clouddb1018
* 13:15 effie: enable puppet on canaries
* 13:10 effie: disable puppet on canaries to deploy 699908
* 10:45 XioNoX: re-enable cr1-codfw:xe-5/1/2
* 09:42 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P16533 and previous config saved to /var/cache/conftool/dbconfig/20210615-092511-marostegui.json
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318, db2082', diff saved to https://phabricator.wikimedia.org/P16532 and previous config saved to /var/cache/conftool/dbconfig/20210615-092409-marostegui.json
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P16531 and previous config saved to /var/cache/conftool/dbconfig/20210615-090802-marostegui.json
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2083', diff saved to https://phabricator.wikimedia.org/P16530 and previous config saved to /var/cache/conftool/dbconfig/20210615-090650-marostegui.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2084', diff saved to https://phabricator.wikimedia.org/P16529 and previous config saved to /var/cache/conftool/dbconfig/20210615-090243-marostegui.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2081', diff saved to https://phabricator.wikimedia.org/P16528 and previous config saved to /var/cache/conftool/dbconfig/20210615-090206-marostegui.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P16527 and previous config saved to /var/cache/conftool/dbconfig/20210615-085953-marostegui.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P16526 and previous config saved to /var/cache/conftool/dbconfig/20210615-085938-marostegui.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 db2083 db2084 db2091', diff saved to https://phabricator.wikimedia.org/P16525 and previous config saved to /var/cache/conftool/dbconfig/20210615-083233-marostegui.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P16524 and previous config saved to /var/cache/conftool/dbconfig/20210615-082857-marostegui.json
* 06:10 XioNoX: roll OSPF link-protection to all routers - [[phab:T167306|T167306]]
* 02:30 eileen: civicrm revision changed from {{Gerrit|d9d61dad0b}} to {{Gerrit|acbcce94a2}}, config revision is {{Gerrit|2aed6ff89b}}
* 01:22 eileen: civicrm revision changed from {{Gerrit|28ace1b86f}} to {{Gerrit|d9d61dad0b}}, config revision is {{Gerrit|2aed6ff89b}}
* 00:37 eileen: civicrm revision changed from {{Gerrit|31d07115a0}} to {{Gerrit|28ace1b86f}}, config revision is {{Gerrit|2aed6ff89b}}
== 2021-06-14 ==
* 21:40 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@baeee47]: [[phab:T261407|T261407]] bulk_daemon: Deploy prioritized topics (duration: 00m 49s)
* 21:40 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@baeee47]: [[phab:T261407|T261407]] bulk_daemon: Deploy prioritized topics
* 19:27 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1003.eqiad.wmnet
* 19:21 twentyafterfour_: applying hotfix for [[phab:T284397|T284397]] and restarting php7.3-fpm on phab1001
* 18:30 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1003.eqiad.wmnet
* 17:05 jforrester@deploy1002: Finished deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]] (duration: 00m 07s)
* 17:05 jforrester@deploy1002: Started deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]]
* 16:46 jforrester@deploy1002: Finished deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]] (duration: 00m 07s)
* 16:46 jforrester@deploy1002: Started deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]]
* 15:56 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1002.eqiad.wmnet
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16521 and previous config saved to /var/cache/conftool/dbconfig/20210614-155258-root.json
* 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16520 and previous config saved to /var/cache/conftool/dbconfig/20210614-153754-root.json
* 15:24 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16519 and previous config saved to /var/cache/conftool/dbconfig/20210614-152250-root.json
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1005.eqiad.wmnet
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16518 and previous config saved to /var/cache/conftool/dbconfig/20210614-150747-root.json
* 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1005.eqiad.wmnet
* 15:04 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1002.eqiad.wmnet
* 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1004.eqiad.wmnet
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1004.eqiad.wmnet
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16517 and previous config saved to /var/cache/conftool/dbconfig/20210614-145243-root.json
* 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1003.eqiad.wmnet
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16516 and previous config saved to /var/cache/conftool/dbconfig/20210614-145039-root.json
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1003.eqiad.wmnet
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16515 and previous config saved to /var/cache/conftool/dbconfig/20210614-144130-marostegui.json
* 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1002.eqiad.wmnet
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16514 and previous config saved to /var/cache/conftool/dbconfig/20210614-143536-root.json
* 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1002.eqiad.wmnet
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16513 and previous config saved to /var/cache/conftool/dbconfig/20210614-143224-root.json
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16512 and previous config saved to /var/cache/conftool/dbconfig/20210614-143211-root.json
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1001.eqiad.wmnet
* 14:27 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice<nowiki>{</nowiki>BannerHistory,Impression<nowiki>}</nowiki> to EventGate on all wikis - [[phab:T271168|T271168]] (duration: 00m 57s)
* 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1001.eqiad.wmnet
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2007.codfw.wmnet
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16511 and previous config saved to /var/cache/conftool/dbconfig/20210614-142032-root.json
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16510 and previous config saved to /var/cache/conftool/dbconfig/20210614-142014-root.json
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16509 and previous config saved to /var/cache/conftool/dbconfig/20210614-141720-root.json
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16508 and previous config saved to /var/cache/conftool/dbconfig/20210614-141707-root.json
* 14:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice<nowiki>{</nowiki>BannerHistory,Impression<nowiki>}</nowiki> to EventGate on testwiki - [[phab:T271168|T271168]] (duration: 00m 57s)
* 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2007.codfw.wmnet
* 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2006.codfw.wmnet
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16507 and previous config saved to /var/cache/conftool/dbconfig/20210614-140529-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16506 and previous config saved to /var/cache/conftool/dbconfig/20210614-140511-root.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16505 and previous config saved to /var/cache/conftool/dbconfig/20210614-140217-root.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16504 and previous config saved to /var/cache/conftool/dbconfig/20210614-140203-root.json
* 14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2006.codfw.wmnet
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16503 and previous config saved to /var/cache/conftool/dbconfig/20210614-135456-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16502 and previous config saved to /var/cache/conftool/dbconfig/20210614-135025-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16501 and previous config saved to /var/cache/conftool/dbconfig/20210614-135007-root.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16500 and previous config saved to /var/cache/conftool/dbconfig/20210614-134713-root.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16499 and previous config saved to /var/cache/conftool/dbconfig/20210614-134700-root.json
* 13:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16498 and previous config saved to /var/cache/conftool/dbconfig/20210614-133953-root.json
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16497 and previous config saved to /var/cache/conftool/dbconfig/20210614-133801-marostegui.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16496 and previous config saved to /var/cache/conftool/dbconfig/20210614-133503-root.json
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16495 and previous config saved to /var/cache/conftool/dbconfig/20210614-133442-root.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16494 and previous config saved to /var/cache/conftool/dbconfig/20210614-133210-root.json
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16493 and previous config saved to /var/cache/conftool/dbconfig/20210614-133156-root.json
* 13:29 effie: restart memcached on codfw
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16492 and previous config saved to /var/cache/conftool/dbconfig/20210614-132449-root.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312 db1170:3317 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16491 and previous config saved to /var/cache/conftool/dbconfig/20210614-132235-marostegui.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16490 and previous config saved to /var/cache/conftool/dbconfig/20210614-132000-root.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16489 and previous config saved to /var/cache/conftool/dbconfig/20210614-131938-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16488 and previous config saved to /var/cache/conftool/dbconfig/20210614-130946-root.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16487 and previous config saved to /var/cache/conftool/dbconfig/20210614-130723-marostegui.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16486 and previous config saved to /var/cache/conftool/dbconfig/20210614-130547-root.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16485 and previous config saved to /var/cache/conftool/dbconfig/20210614-130435-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16484 and previous config saved to /var/cache/conftool/dbconfig/20210614-125442-root.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16483 and previous config saved to /var/cache/conftool/dbconfig/20210614-125043-root.json
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16482 and previous config saved to /var/cache/conftool/dbconfig/20210614-124931-root.json
* 12:37 XioNoX: configure OSPF link-protection on cr3/4-ulsfo - [[phab:T167306|T167306]]
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16481 and previous config saved to /var/cache/conftool/dbconfig/20210614-123539-root.json
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1033 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16480 and previous config saved to /var/cache/conftool/dbconfig/20210614-123512-marostegui.json
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16479 and previous config saved to /var/cache/conftool/dbconfig/20210614-123427-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore es1028 original weight', diff saved to https://phabricator.wikimedia.org/P16478 and previous config saved to /var/cache/conftool/dbconfig/20210614-122322-marostegui.json
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to es1028 while es1034 gets upgraded', diff saved to https://phabricator.wikimedia.org/P16477 and previous config saved to /var/cache/conftool/dbconfig/20210614-122242-marostegui.json
* 12:22 dcausse: re-pooling wdqs1012
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16476 and previous config saved to /var/cache/conftool/dbconfig/20210614-122212-marostegui.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16475 and previous config saved to /var/cache/conftool/dbconfig/20210614-122036-root.json
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2005.codfw.wmnet
* 12:17 XioNoX: configure OSPF link-protection on cr3-ulsfo:xe-0/1/1 - [[phab:T167306|T167306]]
* 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2005.codfw.wmnet
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P16474 and previous config saved to /var/cache/conftool/dbconfig/20210614-121101-marostegui.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16473 and previous config saved to /var/cache/conftool/dbconfig/20210614-121031-marostegui.json
* 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2004.codfw.wmnet
* 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2004.codfw.wmnet
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16472 and previous config saved to /var/cache/conftool/dbconfig/20210614-120112-marostegui.json
* 11:28 effie: restart memcached on mc2019
* 11:09 effie: restart memcached on codfw memcached gutter pool (mc-gp2* hosts)
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2003.codfw.wmnet
* 10:52 topranks: [[phab:T283163|T283163]]: Adding "metric-out minimum-igp" to all internal/Confed BGP groups on CR routers.
* 10:46 effie: enable puppet on mc*
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2003.codfw.wmnet
* 10:39 effie: disable puppet on mc* hosts
* 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2001.codfw.wmnet
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2001.codfw.wmnet
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16471 and previous config saved to /var/cache/conftool/dbconfig/20210614-101839-root.json
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16469 and previous config saved to /var/cache/conftool/dbconfig/20210614-100336-root.json
* 09:56 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 (duration: 02m 37s)
* 09:54 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16467 and previous config saved to /var/cache/conftool/dbconfig/20210614-094832-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16466 and previous config saved to /var/cache/conftool/dbconfig/20210614-093329-root.json
* 09:22 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P16465 and previous config saved to /var/cache/conftool/dbconfig/20210614-092234-marostegui.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16464 and previous config saved to /var/cache/conftool/dbconfig/20210614-092125-root.json
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16463 and previous config saved to /var/cache/conftool/dbconfig/20210614-090622-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16462 and previous config saved to /var/cache/conftool/dbconfig/20210614-085118-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16461 and previous config saved to /var/cache/conftool/dbconfig/20210614-083614-root.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P16460 and previous config saved to /var/cache/conftool/dbconfig/20210614-081239-marostegui.json
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16459 and previous config saved to /var/cache/conftool/dbconfig/20210614-081031-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2148', diff saved to https://phabricator.wikimedia.org/P16458 and previous config saved to /var/cache/conftool/dbconfig/20210614-080552-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16456 and previous config saved to /var/cache/conftool/dbconfig/20210614-075528-root.json
* 07:51 marostegui: Depool clouddb1013 to upgrade mysql
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16455 and previous config saved to /var/cache/conftool/dbconfig/20210614-074024-root.json
* 07:30 marostegui: Reboot db2148 [[phab:T284852|T284852]]
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 [[phab:T284852|T284852]]', diff saved to https://phabricator.wikimedia.org/P16454 and previous config saved to /var/cache/conftool/dbconfig/20210614-072930-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16453 and previous config saved to /var/cache/conftool/dbconfig/20210614-072520-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P16452 and previous config saved to /var/cache/conftool/dbconfig/20210614-071839-marostegui.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16451 and previous config saved to /var/cache/conftool/dbconfig/20210614-071742-root.json
* 07:15 dcausse: restart blazegraph and depool wdqs1012
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16450 and previous config saved to /var/cache/conftool/dbconfig/20210614-070238-root.json
* 07:01 moritzm: restarting mw canaries to pick up libwebp security updates
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16449 and previous config saved to /var/cache/conftool/dbconfig/20210614-064734-root.json
* 06:39 moritzm: installing libwebp security updates on buster
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16448 and previous config saved to /var/cache/conftool/dbconfig/20210614-063231-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P16447 and previous config saved to /var/cache/conftool/dbconfig/20210614-062554-marostegui.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16446 and previous config saved to /var/cache/conftool/dbconfig/20210614-061226-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16445 and previous config saved to /var/cache/conftool/dbconfig/20210614-060119-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16444 and previous config saved to /var/cache/conftool/dbconfig/20210614-055723-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16443 and previous config saved to /var/cache/conftool/dbconfig/20210614-054615-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16442 and previous config saved to /var/cache/conftool/dbconfig/20210614-054219-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16441 and previous config saved to /var/cache/conftool/dbconfig/20210614-053112-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16440 and previous config saved to /var/cache/conftool/dbconfig/20210614-052715-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P16439 and previous config saved to /var/cache/conftool/dbconfig/20210614-051930-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16438 and previous config saved to /var/cache/conftool/dbconfig/20210614-051608-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P16437 and previous config saved to /var/cache/conftool/dbconfig/20210614-051522-marostegui.json
== 2021-06-12 ==
* 13:49 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused
* 13:49 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused
== 2021-06-11 ==
* 23:37 mutante: removing firewall hole for mgmt networks to install* because it turned out it can't be used for firmware upgrades
* 22:08 brennen: gitlab.wikimedia.org currently up with recommended config applied; test data deleted; users can register but not create projects. brennen, dancy, and thcipriani currently marked as admins. may need to reset data again, but hopefully not.
* 21:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
* 21:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
* 21:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
* 20:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
* 20:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
* 20:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
* 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
* 19:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
* 16:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
* 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
* 15:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MediaSearch/extension.json: Make MediaSearch default search experience for all users (duration: 00m 57s)
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16432 and previous config saved to /var/cache/conftool/dbconfig/20210611-150018-root.json
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16431 and previous config saved to /var/cache/conftool/dbconfig/20210611-144514-root.json
* 14:44 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 05s)
* 14:44 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
* 14:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 05s)
* 14:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
* 14:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 14:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 14:35 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 14:35 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:34 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:34 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 14:33 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:33 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:32 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:31 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16430 and previous config saved to /var/cache/conftool/dbconfig/20210611-143010-root.json
* 14:22 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:22 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:20 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:20 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16429 and previous config saved to /var/cache/conftool/dbconfig/20210611-141506-root.json
* 13:53 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 13:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 13:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16428 and previous config saved to /var/cache/conftool/dbconfig/20210611-135248-marostegui.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1153', diff saved to https://phabricator.wikimedia.org/P16427 and previous config saved to /var/cache/conftool/dbconfig/20210611-135036-marostegui.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1153 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16426 and previous config saved to /var/cache/conftool/dbconfig/20210611-133527-marostegui.json
* 10:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 07:29 moritzm: restarting archiva to pick up OpenJDK security updates
* 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
* 07:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
* 06:56 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:56 elukey: rm -rf empty dir /etc/apache2/sites-enabled/.links2 on webperf1001 to avoid puppet changes at every run
* 05:47 elukey: run systemctl reset-failed ifup@en5.service on doh1001 - [[phab:T273026|T273026]]
* 01:10 eileen: process-control config revision is {{Gerrit|2aed6ff89b}}
== 2021-06-10 ==
* 23:29 derick@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Citoid/modules/ve/ve.ui.CitoidInspector.js: Backport: [[gerrit:699288{{!}}CitoidInspector: rename getParameterNames to getOrderedParameterNames (T284786)]] (duration: 00m 57s)
* 21:40 urbanecm: End of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # [[phab:T282699|T282699]]
* 21:36 urbanecm: Start of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # [[phab:T282699|T282699]]
* 21:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki discussiontools # [[phab:T282699|T282699]]
* 20:13 mutante: installed tftp client on install1003 for debugging
* 20:00 jhuneidi@deploy1002: Pruned MediaWiki: 1.37.0-wmf.5 (duration: 03m 33s)
* 19:31 ryankemper: [[phab:T265547|T265547]] Cleanup following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/698025: `sudo -E cumin -b 5 'P:analytics::cluster::elasticsearch' 'sudo rm -rfv /etc/mjolnir /srv/deployment/search/mjolnir'`
* 19:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.9 refs [[phab:T281150|T281150]]
* 18:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikimediaMaintenance/dumpInterwiki.php: {{Gerrit|b21904e326e917f5ac6d7129a4d224380c6e4c21}}: Remove sep11 interwiki link from dumpinterwiki.php (duration: 01m 08s)
* 18:45 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 23s)
* 18:39 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 03s)
* 18:38 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: {{Gerrit|8aeab139879613782548b20fc11af5e66589e30a}}: Fire language change hook ([[phab:T280770|T280770]]) (duration: 01m 07s)
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|d26968c1c3b3f3e115ff37a9a138d225cabba25a}}: wgWelcomeSurveyExperimentalGroups: Use new syntax in CS.php ([[phab:T284597|T284597]]; [[phab:T284735|T284735]]) (duration: 01m 08s)
* 17:11 moritzm: updating bullseye installer image to latest daily image (kernel ABI changed again) [[phab:T275873|T275873]]
* 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:06 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:53 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 16:51 moritzm: installing rails security updates
* 16:37 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: no-op for Beta {{Gerrit|I2a42c222003}} (duration: 01m 07s)
* 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:24 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 15:09 papaul: power down ms-be2038 for BBU replacement
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16417 and previous config saved to /var/cache/conftool/dbconfig/20210610-123201-root.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16416 and previous config saved to /var/cache/conftool/dbconfig/20210610-121657-root.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16415 and previous config saved to /var/cache/conftool/dbconfig/20210610-120153-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16414 and previous config saved to /var/cache/conftool/dbconfig/20210610-114650-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 40%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16413 and previous config saved to /var/cache/conftool/dbconfig/20210610-113146-root.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16412 and previous config saved to /var/cache/conftool/dbconfig/20210610-111643-root.json
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16411 and previous config saved to /var/cache/conftool/dbconfig/20210610-110139-root.json
* 11:00 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 53s)
* 10:59 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next
* 10:47 topranks: [[phab:T283163|T283163]]: Adding "metric-out minimum-igp" to BGP group Confed_eqord on eqiad, codfw and eqdfw CRs.
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16410 and previous config saved to /var/cache/conftool/dbconfig/20210610-104635-root.json
* 10:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikiEditor/modules/jquery.wikiEditor.js: {{Gerrit|8a17c43c5470b84ba58239bb2cf947dbebf1979f}}: Fix call to renamed var ([[phab:T284716|T284716]]) (duration: 01m 25s)
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16409 and previous config saved to /var/cache/conftool/dbconfig/20210610-103132-root.json
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16408 and previous config saved to /var/cache/conftool/dbconfig/20210610-103032-marostegui.json
* 10:29 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production'.
* 10:28 kormat: running optimize tables against pc1009 (pc3) [[phab:T282761|T282761]]
* 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production'.
* 10:21 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging'.
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16407 and previous config saved to /var/cache/conftool/dbconfig/20210610-101858-root.json
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16406 and previous config saved to /var/cache/conftool/dbconfig/20210610-100355-root.json
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16405 and previous config saved to /var/cache/conftool/dbconfig/20210610-094851-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16404 and previous config saved to /var/cache/conftool/dbconfig/20210610-093346-root.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16402 and previous config saved to /var/cache/conftool/dbconfig/20210610-093003-marostegui.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16401 and previous config saved to /var/cache/conftool/dbconfig/20210610-092246-marostegui.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 40%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16399 and previous config saved to /var/cache/conftool/dbconfig/20210610-091842-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16398 and previous config saved to /var/cache/conftool/dbconfig/20210610-090345-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 30%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16397 and previous config saved to /var/cache/conftool/dbconfig/20210610-090339-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16396 and previous config saved to /var/cache/conftool/dbconfig/20210610-084841-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 20%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16395 and previous config saved to /var/cache/conftool/dbconfig/20210610-084835-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16394 and previous config saved to /var/cache/conftool/dbconfig/20210610-083338-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16393 and previous config saved to /var/cache/conftool/dbconfig/20210610-083332-root.json
* 08:25 volans: uploaded spicerack_0.0.53 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16392 and previous config saved to /var/cache/conftool/dbconfig/20210610-081834-root.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 5%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16391 and previous config saved to /var/cache/conftool/dbconfig/20210610-081828-root.json
* 08:17 marostegui: Drop several grants from labswiki (wikitech) [[phab:T282074|T282074]]
* 07:57 jynus: reset-failed on cumin1001 after backup rerun
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P16389 and previous config saved to /var/cache/conftool/dbconfig/20210610-075702-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16388 and previous config saved to /var/cache/conftool/dbconfig/20210610-075247-marostegui.json
* 07:44 jynus: retrying s6 snapshots on eqiad, acking daemon failure
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16387 and previous config saved to /var/cache/conftool/dbconfig/20210610-073727-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16386 and previous config saved to /var/cache/conftool/dbconfig/20210610-072224-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16385 and previous config saved to /var/cache/conftool/dbconfig/20210610-070720-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16384 and previous config saved to /var/cache/conftool/dbconfig/20210610-065217-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16383 and previous config saved to /var/cache/conftool/dbconfig/20210610-064916-root.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16382 and previous config saved to /var/cache/conftool/dbconfig/20210610-063745-marostegui.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16381 and previous config saved to /var/cache/conftool/dbconfig/20210610-063412-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16380 and previous config saved to /var/cache/conftool/dbconfig/20210610-061909-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16379 and previous config saved to /var/cache/conftool/dbconfig/20210610-061806-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16378 and previous config saved to /var/cache/conftool/dbconfig/20210610-060405-root.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16377 and previous config saved to /var/cache/conftool/dbconfig/20210610-060302-root.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16376 and previous config saved to /var/cache/conftool/dbconfig/20210610-055327-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16375 and previous config saved to /var/cache/conftool/dbconfig/20210610-055037-root.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16374 and previous config saved to /var/cache/conftool/dbconfig/20210610-054802-root.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16373 and previous config saved to /var/cache/conftool/dbconfig/20210610-054759-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16372 and previous config saved to /var/cache/conftool/dbconfig/20210610-053534-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16371 and previous config saved to /var/cache/conftool/dbconfig/20210610-053259-root.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16370 and previous config saved to /var/cache/conftool/dbconfig/20210610-053255-root.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16369 and previous config saved to /var/cache/conftool/dbconfig/20210610-052421-marostegui.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16368 and previous config saved to /var/cache/conftool/dbconfig/20210610-052030-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16367 and previous config saved to /var/cache/conftool/dbconfig/20210610-052017-marostegui.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16366 and previous config saved to /var/cache/conftool/dbconfig/20210610-050526-root.json
== 2021-06-09 ==
* 22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1002.wikimedia.org
* 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
* 21:59 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host doh1002.wikimedia.org
* 21:53 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1001.wikimedia.org
* 21:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1001.wikimedia.org
* 21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/DiscussionTools/modules/dt-ve/CommentTargetWidget.less: Backport: [[gerrit:698681{{!}}Update surface styles for VE changes (T284567)]] (duration: 01m 14s)
* 21:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/includes/language/LanguageConverter.php: Backport: [[gerrit:699014{{!}}Revert "Add type hint to constructor of LanguageConverter" (T284685)]] (duration: 01m 24s)
* 21:08 mutante: rsyncing static-bugzilla HTML from miscweb1002 to deploy1002
* 21:00 mutante: deploy1002 - creating temp dir /srv/miscweb to rsync static-bugzilla data to, coming from miscweb1002 [[phab:T281538|T281538]]
* 20:36 mutante: deployed temp ferm change on deployment servers to let miscweb dump data, puppetized. scap pull from mwdebug1001 works, deployment good to go
* 19:08 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.9 refs [[phab:T281150|T281150]] (duration: 01m 07s)
* 19:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.9 refs [[phab:T281150|T281150]]
* 18:07 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php (foreachwiki)
* 17:52 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php --wiki rmywiki
* 17:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudmetrics1002.eqiad.wmnet
* 17:32 aborrero@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudmetrics1002.eqiad.wmnet
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 17:16 jayme: updated python3-docker-report to 0.0.12 on chartmuseum2001.codfw.wmnet,chartmuseum1001.eqiad.wmnet,deneb.codfw.wmnet,registry[2003-2008].codfw.wmnet,registry[1003-1004].eqiad.wmnet
* 16:35 jayme: import docker-report 0.0.12 into buster-wikimedia
* 15:37 hnowlan: rebuilding maps2009 as buster master
* 15:08 vgutierrez: restarting acme-chief on acmechief1001
* 15:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 15:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 15:01 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 55s)
* 15:00 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:57 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 04s)
* 14:57 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:51 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 15s)
* 14:50 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:45 moritzm: installing postgresql 9.6 security updates on stretch
* 14:37 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on all wikis - [[phab:T282562|T282562]] (duration: 01m 06s)
* 14:33 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on all wikis - [[phab:T282855|T282855]] (duration: 01m 06s)
* 14:23 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on testwiki - [[phab:T282855|T282855]] (duration: 01m 07s)
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16358 and previous config saved to /var/cache/conftool/dbconfig/20210609-141807-root.json
* 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=0; selector: name=maps2009.codfw.wmnet
* 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 13:59 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on testwiki - [[phab:T282562|T282562]] (duration: 01m 08s)
* 13:56 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki1001 - [[phab:T282469|T282469]]
* 13:54 XioNoX: Add Routinator 3000 0.9.0 to the APT repo - [[phab:T282469|T282469]]
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16356 and previous config saved to /var/cache/conftool/dbconfig/20210609-134800-root.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16355 and previous config saved to /var/cache/conftool/dbconfig/20210609-133257-root.json
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16354 and previous config saved to /var/cache/conftool/dbconfig/20210609-132958-marostegui.json
* 13:12 moritzm: installing nginx security updates
* 13:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 02m 26s)
* 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 00m 10s)
* 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 01m 14s)
* 13:05 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16351 and previous config saved to /var/cache/conftool/dbconfig/20210609-130114-root.json
* 12:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
* 12:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1 (duration: 00m 53s)
* 12:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16350 and previous config saved to /var/cache/conftool/dbconfig/20210609-124610-root.json
* 12:43 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 28s)
* 12:42 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:42 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 08s)
* 12:41 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:41 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 47s)
* 12:40 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:39 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 41s)
* 12:39 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16349 and previous config saved to /var/cache/conftool/dbconfig/20210609-123615-root.json
* 12:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
* 12:33 godog: lists1001:rm /var/lib/prometheus/node.d/mailman_queues.prom
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16348 and previous config saved to /var/cache/conftool/dbconfig/20210609-123106-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16347 and previous config saved to /var/cache/conftool/dbconfig/20210609-122111-root.json
* 12:18 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 03m 38s)
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16345 and previous config saved to /var/cache/conftool/dbconfig/20210609-121603-root.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16344 and previous config saved to /var/cache/conftool/dbconfig/20210609-121501-marostegui.json
* 12:14 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:13 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 53s)
* 12:12 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 44s)
* 12:09 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 12:09 hnowlan: running `nodetool decommission` on maps2009
* 12:06 hnowlan: stopped tilerator on maps2009
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16343 and previous config saved to /var/cache/conftool/dbconfig/20210609-120608-root.json
* 12:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
* 12:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
* 12:04 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
* 12:03 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 06s)
* 12:03 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 12:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ac43baa}}: {{Gerrit|d185728}}: WelcomeSurveyExperimentalGroups: Use new syntax ([[phab:T284599|T284599]]) (duration: 01m 19s)
* 11:59 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 54s)
* 11:58 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:54 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 41s)
* 11:54 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:53 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 03m 11s)
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16342 and previous config saved to /var/cache/conftool/dbconfig/20210609-115104-root.json
* 11:50 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:49 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 02m 16s)
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16341 and previous config saved to /var/cache/conftool/dbconfig/20210609-114944-marostegui.json
* 11:47 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 05s)
* 11:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:46 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 53s)
* 11:45 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: redeploy HEAD~1 (duration: 01m 55s)
* 11:38 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: redeploy HEAD~1
* 11:36 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1 (duration: 00m 54s)
* 11:35 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1
* 11:34 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 02m 23s)
* 11:32 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
* 11:32 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 00m 59s)
* 11:31 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
* 11:27 jbond: drop keep_env from sudo config - [[phab:T275852|T275852]]
* 11:22 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 43s)
* 11:22 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 11:21 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 15s)
* 11:20 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 11:11 awight: EU deployment window complete
* 11:10 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698855{{!}}Set wgAutoConfirmCount to 10 for enwikisource (T284627)]] (duration: 02m 04s)
* 10:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
* 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 53s)
* 10:14 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 05m 41s)
* 10:07 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:06 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 38s)
* 10:06 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 [[phab:T283235|T283235]]', diff saved to https://phabricator.wikimedia.org/P16337 and previous config saved to /var/cache/conftool/dbconfig/20210609-100423-marostegui.json
* 10:00 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 48s)
* 09:59 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 09:58 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on schema* after switch towards nginx-light [[phab:T164456|T164456]]
* 07:54 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'.
* 07:16 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'.
* 06:26 XioNoX: Add 185.71.138.0/24 to network::external and diffscan - [[phab:T252132|T252132]]
* 06:12 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'.
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16334 and previous config saved to /var/cache/conftool/dbconfig/20210609-053213-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16333 and previous config saved to /var/cache/conftool/dbconfig/20210609-051710-root.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16332 and previous config saved to /var/cache/conftool/dbconfig/20210609-050206-root.json
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16331 and previous config saved to /var/cache/conftool/dbconfig/20210609-044703-root.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 to remove rev_page_id index [[phab:T163532|T163532]]', diff saved to https://phabricator.wikimedia.org/P16330 and previous config saved to /var/cache/conftool/dbconfig/20210609-044428-marostegui.json
* 04:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 03:30 eileen: civicrm revision changed from {{Gerrit|eac772e9c9}} to {{Gerrit|31d07115a0}}, config revision is {{Gerrit|931a941a5e}}
* 03:01 Amir1: mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary ([[phab:T284444|T284444]])
* 02:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:56 Amir1: clean up of the rest of mbox files (except arbcom) ([[phab:T282303|T282303]])
* 02:55 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 02:49 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "xfer categories following reimage" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 02:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:39 ryankemper: [[phab:T280382|T280382]] Re-enabled puppet on `wdqs1010`
* 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 00:37 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698654{{!}}Enable Wikisource OCR on select Wikisources (T283898)]] (duration: 01m 31s)
* 00:00 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 00:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
== 2021-06-08 ==
* 22:36 krinkle@deploy1002: Finished deploy [integration/docroot@d4c9e08]: (no justification provided) (duration: 00m 08s)
* 22:36 krinkle@deploy1002: Started deploy [integration/docroot@d4c9e08]: (no justification provided)
* 22:21 ryankemper: [[phab:T284479|T284479]] Block put back in place. We're back to expected traffic levels. We'll need a more granular mitigation in place before we can lift this block going forward.
* 22:15 ryankemper: [[phab:T284479|T284479]] Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 19 'A:cp-text' 'run-puppet-agent -q'`
* 22:14 ryankemper: [[phab:T284479|T284479]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698850, running puppet on `cp3052.esams.wmnet`
* 22:10 ryankemper: [[phab:T284479|T284479]] Yup, more than enough evidence of a strong upward spike now. Proceeding to revert
* 22:10 ryankemper: [[phab:T284479|T284479]] Already starting to see a large upward spike in requests. Doing a quick sanity check to make sure this is out of the ordinary, but I'll likely be putting the block back in place shortly
* 22:09 ryankemper: [[phab:T284479|T284479]] Puppet run complete across all of `cp-text`. Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-1h&to=now over the next few minutes to check for a large spike in `full_text` and `entity_full_text` queries
* 22:03 ryankemper: [[phab:T284479|T284479]] Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 15 'A:cp-text' 'run-puppet-agent -q'`
* 22:01 ryankemper: [[phab:T284479|T284479]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698849, running puppet on `cp3052.esams.wmnet`
* 21:59 ryankemper: [[phab:T284479|T284479]] Prior context: We put a block on a range of Google App Engine IPs yesterday to protect Cirrussearch from a bad actor; now we're going to try lifting the block and seeing if we're still getting slammed with traffic
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
* 21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
* 21:29 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1009.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 21:27 ryankemper: [[phab:T280382|T280382]] Disabled puppet on `wdqs1010` out of abundance of caution; will re-enable after wdqs1009 is reimaged and xfer back is complete
* 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:38 bblack: authdns1001: update gdnsd to 3.7.0-2~wmf1
* 20:18 bblack: authdns2001: update gdnsd to 3.7.0-2~wmf1
* 19:55 bblack: dns[1235]002: update gdnsd to 3.7.0-2~wmf1
* 19:53 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 19:46 bblack: dns[1235]001: update gdnsd to 3.7.0-2~wmf1
* 19:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 19:36 ryankemper: [[phab:T280382|T280382]] Cancelling the data-transfer run to restart it; realized that the cookbook will start up the `wdqs-updater` again so will locally hack the cookbook on `cumin1001` to prevent that
* 19:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Echo/modules/nojs/mw.echo.alert.monobook.less: Backport: [[gerrit:698848{{!}}Fix MonoBook orange banner hover styles (T284496)]] (duration: 01m 08s)
* 19:26 bblack: dns400[12]: update gdnsd to 3.7.0-2~wmf1
* 19:25 bblack: apt: update gdnsd package to gdnsd-3.7.0-2~wmf1 (fix systemd reload issues)
* 19:20 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 19:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:18 ryankemper: [[phab:T280382|T280382]] `sudo systemctl stop wdqs-updater wdqs-blazegraph` on `wdqs1010` in preparation for transfer
* 19:08 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (all caught up on lag)
* 18:47 bblack: dns4001: update gdnsd to 3.7.0-1~wmf1
* 18:43 bblack: apt: update gdnsd package to gdnsd-3.7.0-1~wmf1
* 17:49 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:36 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:25 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:10 elukey: fix dbstore1007's ip address in analytics-in4 on cr<nowiki>{</nowiki>1,2<nowiki>}</nowiki>-eqiad
* 17:06 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]] (duration: 34m 12s)
* 16:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 16:27 papaul: powerdown moss-fe2002 for relocation
* 16:06 papaul: powerdown ms-backup2002 for relocation
* 16:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:40 papaul: powerdown ms-be2061 for relocation
* 15:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
* 15:33 papaul: powerdown thanos-fe2003 for relocation
* 15:23 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 4/4 (pc1009) ref P16060, [[phab:T280605|T280605]], [[phab:T282761|T282761]].
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 [[phab:T282761|T282761]]
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 [[phab:T282761|T282761]]
* 15:13 papaul: powerdown cp2034 for relocation
* 15:04 papaul: powerdown cp2033 for relocation
* 14:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
* 14:43 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on testreduce1001/scandium after switch towards nginx-light  [[phab:T164456|T164456]]
* 14:08 marostegui: Restart sanitarium hosts (db2094, db2095, db1154, db1155) to pick up new filters [[phab:T284106|T284106]]
* 14:05 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc3 master [[phab:T282761|T282761]] (duration: 00m 57s)
* 14:05 kormat: setting pc1010 as pc3 primary [[phab:T282761|T282761]]
* 13:51 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 42s)
* 13:51 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 13:48 otto@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 13:41 otto@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 13:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 47s)
* 13:39 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 13:36 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 01m 03s)
* 13:35 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 13:33 otto@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
* 13:22 otto@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
* 12:15 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1008 as pc2 master [[phab:T282761|T282761]] (duration: 00m 57s)
* 12:14 kormat: setting pc1008 back as pc2 primary [[phab:T282761|T282761]]
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef49422b162ab0161bc39da857b3230175ac4492}}: enwiki: Disable indexing on the Book namespace ([[phab:T283522|T283522]]) (duration: 00m 56s)
* 11:46 urbanecm: Start server-side upload for 1 file ([[phab:T283470|T283470]])
* 11:45 moritzm: installing nginx security updates on buster
* 11:43 urbanecm: Start server-side upload for 2 files ([[phab:T283645|T283645]], [[phab:T283583|T283583]])
* 11:39 urbanecm: EU B&C deployment done
* 11:38 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16329 and previous config saved to /var/cache/conftool/dbconfig/20210608-113857-kormat.json
* 11:38 moritzm: installing ruby-nokogiri security updates
* 11:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/WikimediaEvents/: {{Gerrit|b0b46530b731d2a5f17b0aa04a4cf99df175e23d}}: universalLanguageSelector: Add missing properties ([[phab:T280770|T280770]]) (duration: 00m 56s)
* 11:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: {{Gerrit|5df13eeae3b52b98eaf3fdb99ddfa5a0f7b2b1e4}}: Pass context to compact_language_links.open hook ([[phab:T280770|T280770]]) (duration: 00m 57s)
* 11:23 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16328 and previous config saved to /var/cache/conftool/dbconfig/20210608-112354-kormat.json
* 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 3/3) (duration: 00m 58s)
* 11:13 urbanecm@deploy1002: Synchronized wmf-config/config/lvwiki.yaml: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 2/3) (duration: 00m 56s)
* 11:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 1/3) (duration: 00m 57s)
* 11:10 urbanecm: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=lvwiki growthexperiments # [[phab:T278191|T278191]]
* 11:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16327 and previous config saved to /var/cache/conftool/dbconfig/20210608-110850-kormat.json
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|abd401074247d1f1dd2722c2d4d06747b066d547}}: enwiki: Deploy Growth features to 2% of new accounts ([[phab:T281896|T281896]]) (duration: 00m 57s)
* 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
* 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
* 10:53 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16326 and previous config saved to /var/cache/conftool/dbconfig/20210608-105346-kormat.json
* 10:50 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 53s)
* 10:49 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
* 10:16 liw: testing upcoming Scap release on beta
* 10:01 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki2001 - [[phab:T282469|T282469]]
* 09:58 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 54s)
* 09:57 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
* 09:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:04 jayme: removing docker-images from registry: releng/ci-jessie, releng/ci-src-setup, releng/composer-php56, releng/composer-test-php56, releng/npm, releng/npm-test, releng/npm-test-3d2png, releng/npm-test-graphoid, releng/npm-test-librdkafka, releng/npm-test-maps-service, releng/php56, releng/quibble-jessie, releng/quibble-jessie-hhvm, releng/quibble-jessie-php56 - [[phab:T251918|T251918]]
* 08:31 dcausse: depooling wdqs1006 (lag)
* 08:29 dcausse: restarting blazegraph on wdqs1006
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:13 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:49 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
* 07:41 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 07:40 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16324 and previous config saved to /var/cache/conftool/dbconfig/20210608-072937-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16323 and previous config saved to /var/cache/conftool/dbconfig/20210608-071433-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16322 and previous config saved to /var/cache/conftool/dbconfig/20210608-065930-root.json
* 06:52 tgr: [[phab:T283606|T283606]]: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=<nowiki>{</nowiki>ar,bn,cs,vi<nowiki>}</nowiki>wiki --verbose --search-index with gerrit:696307 applied
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16321 and previous config saved to /var/cache/conftool/dbconfig/20210608-064426-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for upgrade', diff saved to https://phabricator.wikimedia.org/P16320 and previous config saved to /var/cache/conftool/dbconfig/20210608-064055-marostegui.json
* 06:27 elukey: clean some airflow logs on an-airflow1001 as one off to free space (had a chat with the Search team first)
* 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 04:54 marostegui: Repool clouddb1019:3314
* 04:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:38 ryankemper: [[phab:T284445|T284445]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "repairing overinflated blazegraph journal" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs`
* 02:37 ryankemper: [[phab:T284445|T284445]] after manually stopping blazegraph/wdqs-updater, `sudo rm -fv /srv/wdqs/wikidata.jnl` on `wdqs1012` (clearing old overinflated journal file away before xferring new one)
* 02:34 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo depool` (catching up on ~7h of lag)
== 2021-06-07 ==
* 21:26 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 21:12 sbassett: Deployed security patch for [[phab:T284364|T284364]]
* 19:30 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] We'll keep monitoring. For now this incident is resolved. Glancing at our current volume relative to what we'd expect, the numbers we see match what we'd expect. If we're accidentally banning any innocent requests they must be an incredibly small percentage of the total otherwise we'd see significantly lower volume than expected
* 19:25 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] Seeing the expected drop in `entity_full_text` requests here: https://grafana-rw.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-12h&to=now As a result we're no longer rejecting any requests
* 19:21 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] We're working on rolling out https://gerrit.wikimedia.org/r/698607, which will ban search API requests that match the Google App Engine IP range `2600:1900::0/28` AND whose user agent includes `HeadlessChrome`
* 19:19 cdanis: [[phab:T284479|T284479]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin -b16 'A:cp-text' "run-puppet-agent"
* 19:07 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve [[phab:T284462|T284462]] (duration: 04m 53s)
* 19:02 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve [[phab:T284462|T284462]]
* 19:01 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve (duration: 02m 01s)
* 18:59 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve
* 18:57 herron: prometheus3001: moved /srv back to vda1 filesystem [[phab:T243057|T243057]]
* 18:26 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php-1.37.0-wmf.7]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=skwiki --phab=[[phab:T284149|T284149]]
* 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/includes/WelcomeSurvey.php: {{Gerrit|368b5d9}}: {{Gerrit|0e79aee}}: WelcomeSurvey backports ([[phab:T284127|T284127]], [[phab:T284257|T284257]]; 2/2) (duration: 00m 57s)
* 18:22 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/extension.json: {{Gerrit|368b5d9}}: {{Gerrit|0e79aee}}: WelcomeSurvey backports ([[phab:T284127|T284127]], [[phab:T284257|T284257]]; 1/2) (duration: 00m 56s)
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|7089728}}: {{Gerrit|b2482fb}}: initWikiConfig GE backports ([[phab:T284072|T284072]]) (duration: 00m 58s)
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 3/3) (duration: 00m 56s)
* 18:14 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 2/3) (duration: 00m 56s)
* 18:14 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 18:14 ottomata: rolling restart of kafka jumbo brokers  - [[phab:T283067|T283067]]
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/config/skwiki.yaml: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 1/3) (duration: 00m 59s)
* 18:12 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 18:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=skwiki growthexperiments # [[phab:T284149|T284149]]
* 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5de2f8b27b016a2cd8f424d8e40318edde5e5704}}: Set WelcomeSurveyEnableWithHomepage ([[phab:T281896|T281896]], [[phab:T284257|T284257]]) (duration: 00m 59s)
* 17:53 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 17:53 ottomata: rolling restart of kafka jumbo mirror makers  - [[phab:T283067|T283067]]
* 17:17 ryankemper: [Cirrussearch] We're seeing ~10% of current requests being rejected by poolcounter, due to ~2x expected `eqiad.full_text` query volume and ~30x expected `eqiad.entity_full_text` query volume
* 16:56 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph locked up)
* 16:51 razzi: run homer '*.eqiad.wmnet' diff
* 16:49 ottomata: restarting mysqld analytics-meta replica on db1108 to apply config change - [[phab:T272973|T272973]]
* 16:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6 (duration: 04m 29s)
* 16:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6
* 16:09 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6 (duration: 00m 35s)
* 16:09 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6
* 14:57 moritzm: installing remaining lz4 security updates on buster
* 14:35 moritzm: installing isc-dhcp security updates
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113 (s5,s6) after upgrade', diff saved to https://phabricator.wikimedia.org/P16315 and previous config saved to /var/cache/conftool/dbconfig/20210607-141722-marostegui.json
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) for upgrade', diff saved to https://phabricator.wikimedia.org/P16314 and previous config saved to /var/cache/conftool/dbconfig/20210607-141307-marostegui.json
* 13:35 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3) (duration: 00m 52s)
* 13:34 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3)
* 13:34 moritzm: installing libxml2 security updates on stretch
* 13:32 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 14s)
* 13:31 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 13:28 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 54s)
* 13:27 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 12:41 moritzm: removing now obsolete Java 8 packages from gerrit* [[phab:T268225|T268225]]
* 12:36 moritzm: removing now obsolete Java 8 packages from contint* [[phab:T268225|T268225]]
* 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 12:25 moritzm: installing nginx security updates on buster
* 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --add-prefix=BROKEN --fix # [[phab:T284442|T284442]]
* 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki # [[phab:T284442|T284442]]
* 11:09 Lucas_WMDE: EU backport+config window done
* 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697824{{!}}Add 2021 namespaces for wikimania wiki (T284235)]] (duration: 00m 56s)
* 10:48 volans: reset netbox-next DB with the latest prod dump
* 10:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:698472{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 10:41 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:698472{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 10:38 godog: downgrade grafana to 7.4.2 on grafana2001 - [[phab:T282863|T282863]]
* 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 10:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 10:28 kormat: reimaging db1157 [[phab:T283131|T283131]]
* 10:24 moritzm: remove now obsolete nginx mods and dependencies on htmldumper1001 [[phab:T164456|T164456]]
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 10:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 10:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 depooling: reimage to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16311 and previous config saved to /var/cache/conftool/dbconfig/20210607-100822-kormat.json
* 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 09:43 moritzm: upgrading bullseye hosts to latest packages in testing
* 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 09:03 moritzm: installing imagemagick security updates on stretch
* 06:05 marostegui: Upgrade mysql on dbstore1003 [[phab:T283235|T283235]]
* 05:57 marostegui: Stop dbstore1004 to clone dbstore1007 [[phab:T283125|T283125]]
* 05:37 marostegui: Depool clouddb1020 (s5, s8) for upgrade
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
* 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
* 04:48 marostegui: Depool clouddb1019:3314 (long running alter table)
== 2021-06-05 ==
* 16:16 Amir1: deleting all private archives of mm2. All are inaccessible now ([[phab:T282303|T282303]])
* 15:21 Amir1: delete mbox files of group D and E in mm2 ([[phab:T282303|T282303]])
* 14:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:21 mutante: backup1001 - systemctl bacula-dir works again (restoring backup for non-existing host)
* 00:18 mutante: backup1001 systemctl reload bacula-dir fails
== 2021-06-04 ==
* 22:08 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4001.wikimedia.org
* 21:51 cwhite@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4001.wikimedia.org
* 20:59 bblack: repool cp1087 - [[phab:T278729|T278729]]
* 20:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
* 20:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
* 19:06 bblack: depool cp1087 - [[phab:T278729|T278729]]
* 18:21 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:36 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
* 17:33 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
* 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
* 17:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 17:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 15:25 topranks: Adding 1:1 NAT configuration for fran2001 / analytics.codfw.wikimedia.org to pfw3-codfw (backup site)
* 14:47 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I434d9cfa29d84f}} (duration: 00m 56s)
* 14:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/extension.json: {{Gerrit|Iea41ab8599ffae}} (duration: 00m 56s)
* 14:44 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/includes/: {{Gerrit|Iea41ab8599ffae}} (duration: 00m 59s)
* 14:41 krinkle@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
* 13:39 Krinkle: mwmaint1002: Running purge_parsercache_now.php on pc1008, server 3/4, ref [[phab:T282761|T282761]]
* 13:33 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:46 marostegui: Upgrade mysql on clouddb1016 [[phab:T283235|T283235]]
* 12:27 marostegui: Upgrade mysql on clouddb1015 [[phab:T283235|T283235]]
* 11:20 jbond: upload debmonitor-client_0.3.0-1+deb10u3_all.deb to apt
* 10:59 topranks: Running homer for Gerrit 698162: Set up BGP peering to doh5001 in eqsin, triggering DoH /24 announcement there.
* 09:47 ema: pool cp1087 [[phab:T278729|T278729]]
* 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
* 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16304 and previous config saved to /var/cache/conftool/dbconfig/20210604-091742-root.json
* 09:06 ema: reboot cp1087 [[phab:T278729|T278729]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16303 and previous config saved to /var/cache/conftool/dbconfig/20210604-090239-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16302 and previous config saved to /var/cache/conftool/dbconfig/20210604-084735-root.json
* 08:33 marostegui: Upgrade db1110 [[phab:T283235|T283235]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16301 and previous config saved to /var/cache/conftool/dbconfig/20210604-083232-root.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16300 and previous config saved to /var/cache/conftool/dbconfig/20210604-082956-marostegui.json
* 08:20 godog: upgrade karma to 0.86-1
* 07:38 jynus: stop and upgrade db1150 [[phab:T283235|T283235]]
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16299 and previous config saved to /var/cache/conftool/dbconfig/20210604-073326-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16298 and previous config saved to /var/cache/conftool/dbconfig/20210604-073318-root.json
* 07:29 moritzm: cleanup now unused nginx mods and former deps on install* and puppetdb* servers after switch towards nginx-light (various X11 libs and libxslt) [[phab:T164456|T164456]]
* 07:24 moritzm: cleanup now unused nginx mods and former deps on install* servers after switch towards nginx-light (various X11 libs and libxslt)
* 07:19 urbanecm: Password reset for SUL User:Dominic_Mayers ([[phab:T282656|T282656]])
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16297 and previous config saved to /var/cache/conftool/dbconfig/20210604-071823-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16296 and previous config saved to /var/cache/conftool/dbconfig/20210604-071815-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16295 and previous config saved to /var/cache/conftool/dbconfig/20210604-070319-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16294 and previous config saved to /var/cache/conftool/dbconfig/20210604-070311-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16293 and previous config saved to /var/cache/conftool/dbconfig/20210604-064815-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16292 and previous config saved to /var/cache/conftool/dbconfig/20210604-064807-root.json
* 06:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:42 marostegui: Upgrade mysql on db1096:3315 db1096:3316
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 db1096:3315', diff saved to https://phabricator.wikimedia.org/P16291 and previous config saved to /var/cache/conftool/dbconfig/20210604-064242-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16290 and previous config saved to /var/cache/conftool/dbconfig/20210604-055521-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16289 and previous config saved to /var/cache/conftool/dbconfig/20210604-054017-root.json
* 05:26 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16288 and previous config saved to /var/cache/conftool/dbconfig/20210604-052514-root.json
* 05:24 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 05:23 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 05:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:17 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 05:16 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16287 and previous config saved to /var/cache/conftool/dbconfig/20210604-051010-root.json
* 04:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
* 04:41 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
* 04:25 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2002.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 04:22 ryankemper: [[phab:T280382|T280382]] `wdqs2001.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:33 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "repair overinflated wikidata jnl" --blazegraph_instance blazegraph`
* 02:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:30 ryankemper: [[phab:T280382|T280382]] `wdqs1005.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 02:25 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo pool` (caught up on lag)
* 02:09 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 02:06 ebernhardson: post-deploy restart airflow-(webserver{{!}}scheduler) on an-airflow1001
* 02:05 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift (duration: 04m 40s)
* 02:00 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift
* 01:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 00:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:08 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T280886|T280886]] (duration: 00m 57s)
* 00:07 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 00:06 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 00:05 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:05 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
== 2021-06-03 ==
* 23:41 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T280886|T280886]] (duration: 00m 56s)
* 23:40 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T280886|T280886]] (duration: 00m 57s)
* 23:33 mutante: installing OS on fresh VM doh5001
* 23:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
* 23:28 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
* 23:09 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:694686{{!}}Restrict changetags to sysops and bots on meta]] [[phab:T283625|T283625]] (duration: 00m 58s)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 22:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 22:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper: [[phab:T280382|T280382]] Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after
* 22:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:35 ryankemper: [[phab:T280382|T280382]] `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 22:28 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:54 shdubsh: restart kafka on kafka-logging to take new retention config
* 20:47 sbassett: Deployed security patch for [[phab:T282932|T282932]]
* 20:37 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader[12]001
* 20:35 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s)
* 20:34 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 20:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:34 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container
* 20:34 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 20:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 19:58 mutante: [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts
* 19:56 mutante: [mwmaint1002:~] $ sudo systemctl start  daily_account_consistency_check.service
* 19:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
* 19:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
* 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s)
* 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org
* 19:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs
* 19:33 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images - [[phab:T251918|T251918]] -  icinga-wm> RECOVERY - Check systemd state on deneb is OK
* 19:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:32 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
* 19:28 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 19:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 19:27 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 19:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org
* 19:14 mutante: install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers [[phab:T164456|T164456]]
* 19:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
* 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
* 19:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
* 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
* 18:52 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s)
* 18:51 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter
* 18:46 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 18:46 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 18:39 ryankemper: [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on)
* 18:37 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547)
* 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
* 18:28 mutante: temp. disabling puppet on install* servers. switching nginx to light variant ([[phab:T164456|T164456]])
* 18:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s)
* 18:16 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter
* 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
* 17:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
* 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
* 17:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
* 17:37 brennen: gitlab1001: re-running install-gitlab-server.sh
* 17:16 urandom: remove dropped Cassandra keyspace snapshots -- [[phab:T258414|T258414]]
* 16:55 ejegg: updated payments-wiki from {{Gerrit|6fac77f60e}} to {{Gerrit|7be0534b91}}
* 16:23 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:49 topranks: Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs.
* 15:27 papaul: pdu replacement complete
* 15:25 moritzm: upgrading gitlab to 13.11.5
* 15:08 papaul: disconnect ps2-d8-codfw for replacement
* 14:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:54 topranks: Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002.
* 14:23 moritzm: installing nginx security updates on buster
* 14:12 moritzm: installing postgresql-9.6 security updates
* 13:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16285 and previous config saved to /var/cache/conftool/dbconfig/20210603-130059-root.json
* 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16284 and previous config saved to /var/cache/conftool/dbconfig/20210603-124556-root.json
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16283 and previous config saved to /var/cache/conftool/dbconfig/20210603-123243-root.json
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16282 and previous config saved to /var/cache/conftool/dbconfig/20210603-123052-root.json
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16281 and previous config saved to /var/cache/conftool/dbconfig/20210603-121739-root.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16280 and previous config saved to /var/cache/conftool/dbconfig/20210603-121548-root.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16279 and previous config saved to /var/cache/conftool/dbconfig/20210603-121205-marostegui.json
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16278 and previous config saved to /var/cache/conftool/dbconfig/20210603-121133-root.json
* 12:06 moritzm: restarting FPM on mw canaries to pick up lz4 update
* 12:03 moritzm: installing lz4 security updates on buster
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16277 and previous config saved to /var/cache/conftool/dbconfig/20210603-120235-root.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16276 and previous config saved to /var/cache/conftool/dbconfig/20210603-115628-root.json
* 11:53 moritzm: installing curl security updates on stretch
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16275 and previous config saved to /var/cache/conftool/dbconfig/20210603-114731-root.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16274 and previous config saved to /var/cache/conftool/dbconfig/20210603-114503-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16273 and previous config saved to /var/cache/conftool/dbconfig/20210603-114325-marostegui.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16272 and previous config saved to /var/cache/conftool/dbconfig/20210603-114124-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16271 and previous config saved to /var/cache/conftool/dbconfig/20210603-113000-root.json
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16270 and previous config saved to /var/cache/conftool/dbconfig/20210603-112620-root.json
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16269 and previous config saved to /var/cache/conftool/dbconfig/20210603-112243-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16268 and previous config saved to /var/cache/conftool/dbconfig/20210603-111456-root.json
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e84096857c8a2f753e077aa6c3e37b910b9e1fcd}}: jawiki: extended confirmed should be 120 days since first edit, not registration ([[phab:T284212|T284212]]) (duration: 00m 58s)
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16267 and previous config saved to /var/cache/conftool/dbconfig/20210603-110906-root.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16266 and previous config saved to /var/cache/conftool/dbconfig/20210603-105953-root.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16265 and previous config saved to /var/cache/conftool/dbconfig/20210603-105536-marostegui.json
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16264 and previous config saved to /var/cache/conftool/dbconfig/20210603-105402-root.json
* 10:52 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:41 godog: test librenms/AM paging
* 10:40 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16263 and previous config saved to /var/cache/conftool/dbconfig/20210603-103858-root.json
* 10:28 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16262 and previous config saved to /var/cache/conftool/dbconfig/20210603-102354-root.json
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache [[phab:T282761|T282761]]
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache [[phab:T282761|T282761]]
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16261 and previous config saved to /var/cache/conftool/dbconfig/20210603-101950-marostegui.json
* 10:13 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc2 primary [[phab:T282761|T282761]] (duration: 00m 58s)
* 09:38 marostegui: Deploy schema change on s3 codfw master (with replication) - [[phab:T282373|T282373]] [[phab:T282372|T282372]] [[phab:T282371|T282371]]
* 09:37 moritzm: upgrading eqiad to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) [[phab:T235162|T235162]]
* 08:55 moritzm: uploading gitlab-ce 13.11.5-ce to apt.wikimedia.org thirdparty/gitlab
* 08:43 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:37 moritzm: upgrading codfw to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) [[phab:T235162|T235162]]
* 08:23 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:09 moritzm: upgrading esams/eqsin to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range)
* 07:52 ryankemper: [WDQS] Pooled `wdqs1008` and `wdqs2006` (all caught up on lag)
* 07:48 moritzm: uploaded debmonitor-client 0.3.0-1+deb10u2 to apt.wikimedia.org
* 06:24 ryankemper: [WDQS] De-pooled `wdqs1008` and `wdqs2006` (~1 hour of lag to catch up on)
* 06:23 ryankemper: [[phab:T280382|T280382]] `wdqs2006.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 06:23 ryankemper: [[phab:T280382|T280382]] `wdqs1008.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:20 marostegui: Deploy schema change on db1121, lag will appear on s4 (commonswiki) wiki replicas - [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P16259 and previous config saved to /var/cache/conftool/dbconfig/20210603-051853-marostegui.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16258 and previous config saved to /var/cache/conftool/dbconfig/20210603-051402-root.json
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16257 and previous config saved to /var/cache/conftool/dbconfig/20210603-045859-root.json
* 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16256 and previous config saved to /var/cache/conftool/dbconfig/20210603-044355-root.json
* 04:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:36 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 04:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 04:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:30 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 04:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 04:29 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 04:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16255 and previous config saved to /var/cache/conftool/dbconfig/20210603-042851-root.json
* 02:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
* 02:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
* 02:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
* 02:07 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1008.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 02:07 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
* 02:05 ryankemper: [[phab:T280382|T280382]] `wdqs1003.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 02:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:51 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2006.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 01:47 ryankemper: [[phab:T280382|T280382]] `wdqs2003.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 01:43 ryankemper: [WDQS] Pooled `wdqs1004` (caught up on lag)
* 01:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Gadgets: Backport: [[gerrit:697816{{!}}Reduce message parse in GadgetHooks::getPreferences (second time) (T58633 T278650)]], Try II (duration: 00m 57s)
* 00:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/user/UserOptionsManager.php: Backport: [[gerrit:697818{{!}}user: Accept options-messages for multiselect user options (T58633 T278650)]] (duration: 00m 57s)
* 00:35 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:18 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
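
The cookbook `START` / `END` bullets above follow a regular shape, so they can be picked out mechanically, for example to spot failed runs such as the `exit_code=97` entry. The following is a minimal sketch, not part of any Wikimedia tooling: the regular expression and the `parse_cookbook_entry` helper are hypothetical and are only inferred from the entries visible in this log.

```python
import re
from typing import Optional

# Hypothetical pattern, inferred from the cookbook bullets in this log.
# It matches lines such as:
#   * 00:18 user@host: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
#   * 00:35 user@host: START - Cookbook sre.wdqs.data-transfer
COOKBOOK_RE = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) (?P<actor>\S+?): "
    r"(?P<phase>START|END)(?: \((?P<result>\w+)\))? - "
    r"Cookbook (?P<name>[\w.\-]+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))?"
)


def parse_cookbook_entry(line: str) -> Optional[dict]:
    """Return the fields of a cookbook START/END bullet, or None if the
    line is some other kind of log entry."""
    match = COOKBOOK_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # The exit code is only present on END lines; convert it when it is.
    if entry["exit_code"] is not None:
        entry["exit_code"] = int(entry["exit_code"])
    return entry


if __name__ == "__main__":
    sample = (
        "* 00:18 ryankemper@cumin1001: END (ERROR) - "
        "Cookbook sre.wdqs.data-transfer (exit_code=97)"
    )
    print(parse_cookbook_entry(sample))
```

Free-form entries (for example the manual `[WDQS] Pooled` notes) do not match the pattern and yield `None`, so they can be skipped when filtering.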
== 2021-06-02 ==
* 23:57 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 23:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 23:56 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 23:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:47 ryankemper: [[phab:T280382|T280382]] `wdqs1004.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 23:41 ladsgroup@deploy1002: scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:28 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 23:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 23:26 ryankemper: [[phab:T280382|T280382]] `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid10`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
* 23:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:18 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes: Backport: [[gerrit:697817{{!}}Allow html form field option 'options-messages' to get parsed (T58633)]] (duration: 01m 01s)
* 22:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 22:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 22:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697855{{!}}Enable wgVectorConsolidateUserLinks on the beta cluster (T266536)]] (duration: 00m 57s)
* 22:39 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage_2`
* 22:34 ryankemper: [[phab:T280382|T280382]] Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 2 'P<nowiki>{</nowiki>apt*<nowiki>}</nowiki>' 'sudo rm -rfv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
* 22:30 ryankemper: [[phab:T280382|T280382]] Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'P<nowiki>{</nowiki>install*<nowiki>}</nowiki>' 'sudo rm -fv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
* 22:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
* 22:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
* 22:19 Amir1: setting charset of all tables in wikitech to binary ([[phab:T284108|T284108]] [[phab:T269348|T269348]])
* 22:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage_2`
* 22:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet
* 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2007.codfw.wmnet
* 22:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:59 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 21:56 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1004.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 21:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
* 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3002.wikimedia.org
* 21:37 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
* 21:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 21:30 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 21:28 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
* 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
* 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs2007.codfw.wmnet
* 21:17 ryankemper: `ryankemper@wdqs1013:~$ sudo depool`  (catching up on 17.9h lag)
* 21:12 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
* 21:10 ryankemper: [[phab:T280382|T280382]] [[phab:T281437|T281437]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs2007.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 21:10 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3001.wikimedia.org
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts doh3001.wikimedia.org
* 20:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3002.wikimedia.org
* 20:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
* 20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
* 19:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e9c981d5173b1d611458f6c70b34d73476b7bbde}}: Revert "enwiktionary: Raise AF emergency disable treshold+count" ([[phab:T283460|T283460]]) (duration: 00m 58s)
* 18:11 urbanecm: Deployed security patch for [[phab:T281972|T281972]]
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4bf76fc09bc06f76ce842d42b77fe6b036943b69}}: Make DiscussionTools replytool available for everyone on wikitech ([[phab:T283119|T283119]]) (duration: 00m 58s)
* 17:33 legoktm: disabled Kadirselcuk gerrit account, +1 spam (and blocked elsewhere)
* 16:55 legoktm: restarted apache2 on lists1001 for https://gerrit.wikimedia.org/r/697805
* 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:19 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:10 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cescout1001.eqiad.wmnet
* 16:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts cescout1001.eqiad.wmnet
* 13:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 12:05 jbond: enable puppet fleet wide.  post changing puppetdb to use nginx-light #[[phab:T164456|T164456]]
* 11:54 jbond: disable puppet fleet wide.  changing puppetdb to use nginx-light #[[phab:T164456|T164456]]
* 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/includes/actions/InfoAction.php: {{Gerrit|85feaa15d9bbda130541adb6302f31c4372e6519}}: InfoAction: Cast wgNamespaceProtection to array ([[phab:T283751|T283751]]) (duration: 01m 00s)
* 11:08 jbond: update mod_auth_cas [[phab:T264605|T264605]]
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f12e368481b6836eefa070ad5dcf52af3f39d479}}: Investigate MediaSearch usability on other wikis ([[phab:T278984|T278984]]) (duration: 00m 57s)
* 11:04 jbond: upload libapache2-mod-auth-cas_1.2-1 for buster and stretch - #[[phab:T264605|T264605]]
* 11:01 jbond: upload libapache2-mod-auth-cas_1.2-1+wmf11u1_amd64.deb - #[[phab:T264605|T264605]]
* 10:44 topranks: Commit pfw policy {{Gerrit|1622570851}} to pfw3-codfw and pfw3-eqiad to support new host fran2001 ([[phab:T282056|T282056]])
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:17 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 10:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbstore1006.eqiad.wmnet
* 09:51 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1006.eqiad.wmnet
* 09:14 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280396]] ([[:phab:T284118{{!}}request]])' 'OTRS' 'VRT' 'Quiddity (WMF)' # [[phab:T284118|T284118]]
* 08:12 moritzm: removed eight inactive addresses from ops@ list
* 07:44 moritzm: installing squid security updates
* 06:54 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
* 06:51 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
* 06:38 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:34 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16249 and previous config saved to /var/cache/conftool/dbconfig/20210602-050234-root.json [REPLAY FROM 2021-06-02 05:02:34]
* 05:36 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071', diff saved to https://phabricator.wikimedia.org/P16248 and previous config saved to /var/cache/conftool/dbconfig/20210602-045736-marostegui.json [REPLAY FROM 2021-06-02 04:57:36]
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2071', diff saved to https://phabricator.wikimedia.org/P16247 and previous config saved to /var/cache/conftool/dbconfig/20210602-045717-marostegui.json [REPLAY FROM 2021-06-02 04:57:17]
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16246 and previous config saved to /var/cache/conftool/dbconfig/20210602-044730-root.json [REPLAY FROM 2021-06-02 04:47:31]
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16245 and previous config saved to /var/cache/conftool/dbconfig/20210602-043227-root.json [REPLAY FROM 2021-06-02 04:32:27]
* 05:32 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 05:31 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697671{{!}}Fix pageterms API call for Special:Nearby in Wikidata (T281639)]] (duration: 00m 56s) [REPLAY FROM 2021-06-01 21:44:06]
* 05:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [REPLAY FROM 2021-06-01 19:42:38]
* 05:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox [REPLAY FROM 2021-06-01 19:29:26]
* 05:28 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1183.eqiad.wmnet
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16251 and previous config saved to /var/cache/conftool/dbconfig/20210602-051919-marostegui.json
* 05:18 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1183.eqiad.wmnet
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16250 and previous config saved to /var/cache/conftool/dbconfig/20210602-051738-root.json
* off: restart tcpircbot-logmsgbot on alert1001 - [[phab:T284123|T284123]]
* 04:56 marostegui: Test
== 2021-06-01 ==
* 21:09 andrewbogott: dropping a bunch of tables from the labswiki db as per [[phab:T284108|T284108]]
* 17:23 Amir1: starting deletion of mbox files on lists1001 for mailman2, first reading-web-team.mbox, then smallest lists ([[phab:T282303|T282303]])
* 16:31 moritzm: updating debmonitor clients to 0.3.0 (along with cleanup of sysuser UID allocation)
* 15:38 legoktm: stopped mailman2 service on lists1001 ([[phab:T52864|T52864]])
* 15:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 15:16 ryankemper: [[phab:T283223|T283223]] `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id [[phab:T283223|T283223]]` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
* 15:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 14:59 topranks: Restoring Lumen CCT {{Gerrit|442550293}} to normal metric / bring back into service ([[phab:T274234|T274234]])
* 13:56 marostegui: Stop mysql on db2079 (codfw master) -  [[phab:T283743|T283743]]
* 13:53 topranks: Draining Lumen CCT {{Gerrit|442550293}} to do some comparative bandwidth tests from eqiad to codfw ([[phab:T274234|T274234]])
* 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f757748a14ac8c205f6a5fac0611216c01ceb1c}}: cawiki: Fix help panel links ([[phab:T280673|T280673]]) (duration: 00m 58s)
* 13:48 otto@deploy1002: Finished deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - [[phab:T272973|T272973]] (duration: 02m 58s)
* 13:45 otto@deploy1002: Started deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - [[phab:T272973|T272973]]
* 13:43 topranks: Restoring Telia CT IC-307235 to normal metric / bring back into service ([[phab:T274234|T274234]])
* 13:08 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
* 13:06 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
* 12:12 dcausse: re-pooling wdqs1005 (caught-up lag)
* 12:06 moritzm: installing djvulibre security updates
* 11:16 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 11:14 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e4989d2b19e07d2a816cd7f6afae077f86aca54e}}: Enable "Diff" RSS feed on meta ([[phab:T283380|T283380]]) (duration: 00m 58s)
* 11:04 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
* 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
* 10:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:37 topranks: Draining Telia CT IC-307235 to do some comparative bandwidth tests from eqiad to codfw ([[phab:T274234|T274234]])
* 08:04 hashar: Restarted Gerrit on gerrit1001 for Java 11 upgrade # [[phab:T268225|T268225]]
* 08:02 hashar: Restarted Gerrit on gerrit2001 for Java 11 upgrade # [[phab:T268225|T268225]]
* 07:26 dcausse: depooling wdqs1005 (lag)
* 07:14 moritzm: installing nginx security updates
* 05:56 legoktm: restarting mailman3 on lists1001
* 05:37 legoktm: uploaded django-allauth_0.44.0+ds-1~bpo10+1 mailman3_3.3.3-1~bpo10+4 to apt.wm.o
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16242 and previous config saved to /var/cache/conftool/dbconfig/20210601-053137-marostegui.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16241 and previous config saved to /var/cache/conftool/dbconfig/20210601-052349-root.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16240 and previous config saved to /var/cache/conftool/dbconfig/20210601-050845-root.json
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16239 and previous config saved to /var/cache/conftool/dbconfig/20210601-045341-root.json
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16238 and previous config saved to /var/cache/conftool/dbconfig/20210601-043837-root.json
* 00:46 legoktm@deploy1002: Synchronized logos/config.yaml: Revert "Use eswiki 20th anniversary logos" ([[phab:T280908|T280908]]) (duration: 01m 07s)
* 00:43 legoktm@deploy1002: Synchronized wmf-config/logos.php: Revert "Use eswiki 20th anniversary logos" ([[phab:T280908|T280908]]) (duration: 01m 00s)
== 2021-05-31 ==
* 07:32 legoktm: deleted all outgoing list mail that is for a gmail address being unsubscribed [[phab:T284003|T284003]]
* 07:30 legoktm: deleted all outgoing list mail that is for a yahoo/aol address being unsubscribed [[phab:T284003|T284003]]
* 07:23 legoktm: deleting all outgoing list mail that has a subject that starts with "You have been unsubscribed from the" [[phab:T284003|T284003]]
* 06:33 legoktm: manually unsubscribed ahalfaker [at] wikimedia.org from scoring-internal list, triggering mailman bounce loop [[phab:T282348|T282348]]#7124014
* 06:22 legoktm: sudo systemctl restart mailman3 on lists1001, bounce runner crashed
== 2021-05-29 ==
* 14:44 elukey: execute apt-get clean on an-airflow1001 to free space
* 14:40 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp1087.eqiad.wmnet
== 2021-05-28 ==
* 08:06 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: name=wdqs1003.eqiad.wmnet,dc=eqiad
* 08:02 elukey: restart blazegraph on wdqs1011
* 01:43 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:696736{{!}}ExtensionDistributor: REL1_36 is now the stable release (T279455)]] (duration: 00m 57s)
== 2021-05-27 ==
* 23:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab1004.eqiad.wmnet with reason: REIMAGE
* 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab1004.eqiad.wmnet with reason: REIMAGE
* 23:45 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:696713{{!}}Revert "README: deployment training"]] (duration: 00m 55s)
* 23:38 derick@deploy1002: Synchronized README: Config: [[gerrit:696706{{!}}README: deployment training]] (duration: 00m 55s)
* 23:21 egardner@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:693951{{!}}Enable MediaSearch Assessment filter (T276257)]] (duration: 00m 57s)
* 22:06 urbanecm: Invalidate bot password for `PKM@PKMbot` ([[phab:T283839|T283839]])
* 20:37 jbond: add eugene-chernov, strofimovsky01, il to ldap nda #[[phab:T279545|T279545]]
* 20:37 jbond: add eugene-chernov, strofimovsky01, il to ldap nda
* 19:53 James_F: Manually create missing SecurePoll DB tables on mnwwiktionary, taywiki, and trvwiki for [[phab:T283844|T283844]]
* 19:48 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 19:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.7
* 19:15 tgr: US morning deploys done
* 19:12 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:695364{{!}}GrowthExperiments: Enable Add Links for 50% of new users and all old ones (T277356)]] (duration: 01m 04s)
* 19:03 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments: Backport: [[gerrit:695833{{!}}Help panel: SwitchEditorPanel fixes (T282800)]] [[gerrit:695841{{!}}Avoid session loading when loading task types in help panel RL data (T282800)]] [[gerrit:696530{{!}}Add Link: Fix homepage PV token and newcomer task token logging (T283765)]] (duration: 01m 05s)
* 18:57 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:56 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:693208{{!}}ptwiki: Add 'flow-delete' to 'eliminator' user group (T283266)]] (duration: 01m 04s)
* 18:49 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments: Backport: [[gerrit:695834{{!}}Help panel: SwitchEditorPanel fixes (T282800)]] [[gerrit:695842{{!}}Avoid session loading when loading task types in help panel RL data (T282800)]] [[gerrit:696527{{!}}Add Link: Fix homepage PV token and newcomer task token logging (T283765)]] (duration: 01m 06s)
* 18:22 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:09 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:696390{{!}}Enable Growth's community configuration on the pilot wikis (T283809)]] (duration: 01m 06s)
* 17:26 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:20 James_F: Running SecurePoll maintenance script cli/updateNotBlockedKey.php for all wikis [[phab:T277079|T277079]]
* 17:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:59 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:58 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following runaway inflation of wdqs1006's wikidata.jnl" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_disk`
* 15:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:56 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following runaway inflation of wdqs2004's wikidata.jnl" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_disk`
* 15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 15:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:50 ryankemper: [[phab:T280382|T280382]] (fixing a couple of wrong host names in the last log line) `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`, and `wdqs2001`, `wdqs2002`, and `wdqs2008` have a 975G `wikidata.jnl`
* 15:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:44 ryankemper: [[phab:T280382|T280382]] `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`, and `wdqs2004` and `wdqs2001` have a 975G `wikidata.jnl`. It's not clear why there's such a big divergence
* 15:41 ryankemper: [[phab:T280382|T280382]] `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`
* 15:12 XioNoX: test netconf over ssh on cr3-ulsfo
* 15:03 effie: disable puppet mc2019
* 14:14 moritzm: bounce keyholder-agent on cumin2001 to drop homer key (now on 2002 only)
* 12:57 tgr: [[phab:T283606|T283606]]: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=<nowiki>{</nowiki>ar,bn,cs,vi<nowiki>}</nowiki>wiki --verbose --search-index with gerrit:696307 applied
* 12:55 tgr: [[phab:T283606|T283606]]: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=<nowiki>{</nowiki>ar,bn,cs,vi<nowiki>}</nowiki>wiki --verbose --search-index
* 12:50 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 as pc1 master [[phab:T282761|T282761]] (duration: 01m 04s)
* 12:47 tgr: EU deploys done
* 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: Backport: [[gerrit:695437{{!}}Add Link: Prevent double-opening of the post-edit dialog (T283120)]] [[gerrit:695479{{!}}Always delete from search index in AddLinkSubmissionHandler (T283606)]] (duration: 01m 06s)
* 12:40 topranks: cr2-eqord: Gerrit 696383: Removing IPv4 Anycast ranges from bgp_out policy.
* 12:39 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/: Backport: [[gerrit:695436{{!}}Add Link: Prevent double-opening of the post-edit dialog (T283120)]] [[gerrit:695437{{!}}Add Link: Prevent double-opening of the post-edit dialog (T283120)]] (duration: 01m 06s)
* 12:25 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:695831{{!}}Don't update backButton visibility if not set (T283511)]] (duration: 01m 06s)
* 11:51 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:695832{{!}}Don't update backButton visibility if not set (T283511)]] (duration: 01m 06s)
* 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 [[phab:T283793|T283793]]
* 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 [[phab:T283793|T283793]]
* 10:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dborch1001.wikimedia.org with reason: Rebuilding db2094:s8 from db2082 12:19:41 <kormat> i thought also i might directly move pc1010 to pc2, so that it'll have a few days of pc2 cache available when we make it pc2 primary next week
* 10:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dborch1001.wikimedia.org with reason: Rebuilding db2094:s8 from db2082 12:19:41 <kormat> i thought also i might directly move pc1010 to pc2, so that it'll have a few days of pc2 cache available when we make it pc2 primary next week
* 09:46 kormat: restarting mariadb on pc1007 to upgrade it
* 08:35 topranks: removing stale peers (AS8674 / Netnod and AS57695 / Misaka) from cr2-esams
* 08:30 moritzm: installing libx11 security updates
* 07:45 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr's in AMS
* 07:44 legoktm: adding stephane at kiwix as owner of offline-l per email
* 07:43 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr's in eqsin
* 07:42 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr2-eqord
* 07:20 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to announce Wikidough Anycast range from cr's in ulsfo
* 07:14 topranks: cmooney@cumin1001 Gerrit 694305: Add Wikidough Anycast range to aggregate config to cr1-eqdfw
* 07:11 topranks: cmooney@cumin1001 Gerrit 694305: Add Wikidough Anycast range to aggregate config to cr2-codfw
* 06:47 ryankemper@puppetmaster2001: conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet
* 06:43 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 13s)
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16227 and previous config saved to /var/cache/conftool/dbconfig/20210527-060953-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P16226 and previous config saved to /var/cache/conftool/dbconfig/20210527-055507-marostegui.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16225 and previous config saved to /var/cache/conftool/dbconfig/20210527-055450-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16224 and previous config saved to /var/cache/conftool/dbconfig/20210527-053946-root.json
* 05:29 ryankemper: `ryankemper@cloudelastic1003:~$ sudo run-puppet-agent --force`
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16223 and previous config saved to /var/cache/conftool/dbconfig/20210527-052442-root.json
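
The `db1148` repool entries above step through 25%, 50%, 75% and 100% at 15-minute intervals (05:24, 05:39, 05:54, 06:09). The sketch below only reproduces that timing pattern. The `repool_schedule` helper is hypothetical, the interval is inferred from the timestamps in this log rather than from any documented setting, and the script does not call `dbctl` or any other real tool.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def repool_schedule(
    start: datetime,
    steps: Tuple[int, ...] = (25, 50, 75, 100),
    interval_minutes: int = 15,  # assumption: inferred from the log timestamps
) -> List[Tuple[datetime, int]]:
    """Return (time, pooled percentage) pairs for a staged repool that
    begins at `start` and raises the percentage once per interval."""
    return [
        (start + timedelta(minutes=index * interval_minutes), percent)
        for index, percent in enumerate(steps)
    ]


if __name__ == "__main__":
    # Start time taken from the first db1148 repool entry on 2021-05-27.
    for when, percent in repool_schedule(datetime(2021, 5, 27, 5, 24)):
        print(f"{when:%H:%M} db1148 (re)pooling @ {percent}%")
```

Run with the 05:24 start time, the generated times line up with the four `db1148` entries logged that morning.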
== 2021-05-26 ==
* 23:07 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: [[gerrit:695325{{!}}resourceloader: Avoid primary connection in SqlModuleDependencyStore (2)]] (duration: 01m 06s)
* 23:03 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: [[gerrit:695324{{!}}resourceloader: Avoid primary connection in SqlModuleDependencyStore (2)]] (duration: 01m 06s)
* 22:17 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: [[gerrit:695321{{!}}resourceloader: Avoid opening a connection to master when not needed]] (duration: 01m 06s)
* 22:10 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: [[gerrit:695320{{!}}resourceloader: Avoid opening a connection to master when not needed]] (duration: 01m 07s)
* 21:22 tgr: [[phab:T283606|T283606]]: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=<nowiki>{</nowiki>ar,bn,cs,vi<nowiki>}</nowiki>wiki --verbose --search-index
* 19:58 twentyafterfour: finished deploying wmf.7 and error levels appear unchanged. refs [[phab:T281148|T281148]]
* 19:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1018.eqiad.wmnet with reason: REIMAGE
* 19:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1018.eqiad.wmnet with reason: REIMAGE
* 19:51 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.7 (duration: 01m 07s)
* 19:50 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1] (hadoop-test): Regular analytics weekly train (duration: 05m 12s)
* 19:50 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.7
* 19:45 otto@deploy1002: Started deploy [analytics/refinery@c02cef1] (hadoop-test): Regular analytics weekly train
* 19:44 twentyafterfour: train is unblocked, proceeding to deploy wmf.7 to group1 wikis refs [[phab:T281148|T281148]]
* 19:44 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1] (thin): Regular analytics weekly train THIN (duration: 00m 07s)
* 19:44 otto@deploy1002: Started deploy [analytics/refinery@c02cef1] (thin): Regular analytics weekly train THIN
* 19:43 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 3 (duration: 01m 00s)
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 3
* 19:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.SuggestedEdits.Guidance.js: {{Gerrit|9f3410b1fc5535b34d49e287846c0b3c08882bc5}}: Add Link: Suppress the blue dot on the edit button ([[phab:T283094|T283094]]) (duration: 01m 07s)
* 19:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.SuggestedEdits.Guidance.js: {{Gerrit|512d72e8df4ce0325778035d0bc6107e6e5dedf0}}: Add Link: Suppress the blue dot on the edit button ([[phab:T283094|T283094]]) (duration: 01m 07s)
* 19:25 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|80abdf9}}: {{Gerrit|92d2952}}: Enable VisualEditor by default at ptwikinews and plwikinews ([[phab:T282846|T282846]], [[phab:T283033|T283033]]) (duration: 01m 09s)
* 19:21 otto@deploy1002: Started deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 2
* 19:17 legoktm: legoktm@deploy1002:~$ sudo -E kubectl delete pod kask-production-6d6869b697-m2qjs -n sessionstore
* 19:16 otto@deploy1002: Finished deploy [analytics/refinery@b787999]: Regular analytics weekly train (duration: 01m 23s)
* 19:15 otto@deploy1002: Started deploy [analytics/refinery@b787999]: Regular analytics weekly train
* 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f66b3b}}: Enable wgCiteResponsiveReferences on svwiki ([[phab:T281622|T281622]]) (duration: 01m 06s)
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|07b804b48d65057a66461808f2647fee9aca12b7}}: Enable DiscussionTools on wikitech ([[phab:T283119|T283119]]) (duration: 01m 05s)
* 17:51 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 17:39 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 17:16 legoktm@deploy1002: Synchronized private/PrivateSettings.php: Set $wgShellboxSecretKey - [[phab:T281423|T281423]] (duration: 01m 14s)
* 17:02 moritzm: restarting FPM on mw canaries to pick up libx11 update
* 16:51 moritzm: installing libx11 security updates
* 16:38 topranks: cmooney@cumin1001 Running homer to deploy Gerrit 694305 changes to cr2-codfw - Wikidough Anycast
* 16:12 marostegui: Reboot db2107 (codfw master) [[phab:T282072|T282072]]
* 16:10 marostegui: Reboot db2103 (codfw master) [[phab:T282072|T282072]]
* 16:09 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on malmok.wikimedia.org with reason: [WIP] applying anycast update: [[phab:T283503|T283503]]
* 16:09 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on malmok.wikimedia.org with reason: [WIP] applying anycast update: [[phab:T283503|T283503]]
* 16:01 papaul: powerdown ms-be2038 for BBU replacement
* 15:41 effie: enable puppet on mc2019
* 15:31 marostegui: Cold reset db2107 idrac [[phab:T283727|T283727]]
* 15:23 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on malmok.wikimedia.org with reason: applying anycast update: [[phab:T283503|T283503]]
* 15:23 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on malmok.wikimedia.org with reason: applying anycast update: [[phab:T283503|T283503]]
* 15:22 topranks: cmooney@cumin1001 Running homer to deploy Gerrit 694305 changes to cr1-codfw - Wikidough Anycast
* 15:18 urbanecm: otrs_wikiwiki was moved to vrt-wiki.wikimedia.org ([[phab:T280400|T280400]])
* 15:12 topranks: Merging https://gerrit.wikimedia.org/r/c/operations/homer/public/+/694305/ - Add Wikidough Anycast range to network config
* 15:11 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|490435edb4ea4cc10ba435125ba547231fc7f1e7}}: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org ([[phab:T280400|T280400]]) (duration: 01m 07s)
* 15:08 urbanecm@deploy1002: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|945ee9c5e88166984bf12e4039d692fe06498e40}}: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org ([[phab:T280400|T280400]]; 1/2) (duration: 01m 06s)
* 15:02 legoktm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 18s)
* 14:59 otto@deploy1002: Finished deploy [analytics/refinery@b787999] (hadoop-test): Regular analytics weekly train TEST (duration: 05m 24s)
* 14:53 otto@deploy1002: Started deploy [analytics/refinery@b787999] (hadoop-test): Regular analytics weekly train TEST
* 14:50 otto@deploy1002: Finished deploy [analytics/refinery@b787999] (thin): Regular analytics weekly train THIN (duration: 00m 07s)
* 14:49 otto@deploy1002: Started deploy [analytics/refinery@b787999] (thin): Regular analytics weekly train THIN
* 14:49 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 14:49 otto@deploy1002: Finished deploy [analytics/refinery@b787999]: Regular analytics weekly train [analytics/refinery@e536abd] (duration: 30m 22s)
* 14:47 volans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 14:31 moritzm: updated bullseye d-i image to 2021-05-26 daily image [[phab:T275873|T275873]]
* 14:19 otto@deploy1002: Started deploy [analytics/refinery@b787999]: Regular analytics weekly train [analytics/refinery@e536abd]
* 14:18 otto@deploy1002: deploy aborted: Regular analytics weekly train [analytics/refinery@e536abd] (duration: 00m 06s)
* 14:18 otto@deploy1002: Started deploy [analytics/refinery@e536abd]: Regular analytics weekly train [analytics/refinery@e536abd]
* 14:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 14s)
* 14:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
* 14:03 hashar@deploy1002: Finished deploy [integration/docroot@ebee5d3]: composer/npm updates (duration: 00m 09s)
* 14:03 hashar@deploy1002: Started deploy [integration/docroot@ebee5d3]: composer/npm updates
* 11:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: {{Gerrit|b3c2941}}: Allow running fixLinkRecommendationData --search-index in production ([[phab:T283606|T283606]]) (duration: 01m 07s)
* 11:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: {{Gerrit|86bba48}}: Allow running fixLinkRecommendationData --search-index in production ([[phab:T283606|T283606]]) (duration: 01m 06s)
* 11:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
* 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
* 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: GrowthExperiments backports ([[phab:T283544|T283544]]; [[phab:T282899|T282899]]; [[phab:T282546|T282546]]) (duration: 01m 06s)
* 11:26 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/: GrowthExperiments backports ([[phab:T283544|T283544]]; [[phab:T282899|T282899]]; [[phab:T282546|T282546]]) (duration: 01m 19s)
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:694339{{!}}Test Wikidata: Enable empty list to object serialization (T241422)]] (duration: 01m 19s)
* 10:26 moritzm: installing lz4 security updates on buster
* 10:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 10:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 09:55 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/Wikibase: Backport: [[gerrit:695037{{!}}Wrap list of acceptable site ids with an APCu cache in API]] (duration: 01m 18s)
* 09:45 godog: rm /root/prometheus from prometheus5001 - old transition files
* 09:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Wikibase: Backport: [[gerrit:695035{{!}}Wrap list of acceptable site ids with an APCu cache in API]] (duration: 02m 12s)
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16222 and previous config saved to /var/cache/conftool/dbconfig/20210526-093647-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16221 and previous config saved to /var/cache/conftool/dbconfig/20210526-092144-root.json
* 09:13 elukey: deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/695192 on <nowiki>{</nowiki>cr1{{!}}cr2<nowiki>}</nowiki>-eqiad - [[phab:T225005|T225005]]
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16220 and previous config saved to /var/cache/conftool/dbconfig/20210526-090640-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16219 and previous config saved to /var/cache/conftool/dbconfig/20210526-085137-root.json
* 08:12 _joe_: purging images on deneb
* 08:11 kormat: running 'optimize table' over parsercache db on pc1007 with replication enabled [[phab:T282761|T282761]]
* 07:14 ryankemper: Pooled `wdqs1013` (caught up on lag), de-pooled `wdqs2003` (should not have been pooled due to reimage failure)
* 07:13 ryankemper@puppetmaster2001: conftool action : set/pooled=no; selector: name=wdqs2003.codfw.wmnet
* 05:46 marostegui: Stop MySQL on clouddb1021 to upgrade mysql
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16215 and previous config saved to /var/cache/conftool/dbconfig/20210526-051935-root.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P16214 and previous config saved to /var/cache/conftool/dbconfig/20210526-050919-marostegui.json
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16213 and previous config saved to /var/cache/conftool/dbconfig/20210526-050431-root.json
* 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16212 and previous config saved to /var/cache/conftool/dbconfig/20210526-044928-root.json
* 04:35 marostegui: Deploy schema change on db1106, this will generate lag on s1 (enwiki) on wiki replicas [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P16211 and previous config saved to /var/cache/conftool/dbconfig/20210526-043439-marostegui.json
* 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16210 and previous config saved to /var/cache/conftool/dbconfig/20210526-043424-root.json
* 03:29 eileen: process-control config revision is {{Gerrit|7b646533da}}
* 00:47 eileen: civicrm revision changed from {{Gerrit|584b96452a}} to {{Gerrit|eac772e9c9}}, config revision is {{Gerrit|2ca92c3c3c}}
* 00:27 mutante: phab2001 - restarted apache2
== 2021-05-25 ==
* 23:09 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 22:39 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 22:21 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 22:21 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 22:04 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 22:04 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 21:58 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 21:58 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 21:13 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 21:13 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 21:13 razzi@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)
* 21:13 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 20:40 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 20:28 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 20:00 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.7
* 19:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:12 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.7 (duration: 33m 29s)
* 19:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:38 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.7
* 18:08 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I2ebe9674fb109f}} (duration: 00m 56s)
* 17:34 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 2/4 (pc1007, depooled spare). Ref P16060, [[phab:T280605|T280605]], [[phab:T282761|T282761]].
* 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16207 and previous config saved to /var/cache/conftool/dbconfig/20210525-173031-root.json
* 17:22 effie: disable puppet on mc2019 (for tests)
* 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16206 and previous config saved to /var/cache/conftool/dbconfig/20210525-171527-root.json
* 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16205 and previous config saved to /var/cache/conftool/dbconfig/20210525-170024-root.json
* 16:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16203 and previous config saved to /var/cache/conftool/dbconfig/20210525-164520-root.json
* 12:55 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|63ad5fda}}: Revert "Add svwiki 20th anniversary logos" ([[phab:T282389|T282389]]) (duration: 00m 56s)
* 12:52 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|94ede526}}: Revert "Use svwiki 20th anniversary logos" ([[phab:T282389|T282389]]) (duration: 00m 56s)
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P16200 and previous config saved to /var/cache/conftool/dbconfig/20210525-122127-marostegui.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'remove db1124 from dbctl', diff saved to https://phabricator.wikimedia.org/P16199 and previous config saved to /var/cache/conftool/dbconfig/20210525-120718-marostegui.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1124 will be moved to the test cluster', diff saved to https://phabricator.wikimedia.org/P16198 and previous config saved to /var/cache/conftool/dbconfig/20210525-113521-marostegui.json
* 11:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
* 11:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
* 11:21 Lucas_WMDE: EU backport&config window done
* 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:679327{{!}}Change HTTP to HTTPS for concept URIs on Commons (T258590)]] (duration: 00m 56s)
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16196 and previous config saved to /var/cache/conftool/dbconfig/20210525-111719-root.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16195 and previous config saved to /var/cache/conftool/dbconfig/20210525-110215-root.json
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16194 and previous config saved to /var/cache/conftool/dbconfig/20210525-104711-root.json
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16193 and previous config saved to /var/cache/conftool/dbconfig/20210525-103208-root.json
* 09:58 ema: cp3054: upgrade varnish to latest LTS (6.0.7-1wm1) [[phab:T264398|T264398]]
* 09:28 jynus: updating puppet facts on cloud from puppetmaster1001
* 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache [[phab:T282761|T282761]]
* 09:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache [[phab:T282761|T282761]]
* 09:01 kormat: stopping replication on pc1010 [[phab:T282761|T282761]]
* 09:00 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc1 primary [[phab:T282761|T282761]] (duration: 00m 58s)
* 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:52 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:20 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE
* 08:18 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE
* 08:17 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE
* 08:16 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE
* 08:16 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE
* 08:14 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16192 and previous config saved to /var/cache/conftool/dbconfig/20210525-080234-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P16191 and previous config saved to /var/cache/conftool/dbconfig/20210525-074950-marostegui.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16190 and previous config saved to /var/cache/conftool/dbconfig/20210525-074730-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16189 and previous config saved to /var/cache/conftool/dbconfig/20210525-073227-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16188 and previous config saved to /var/cache/conftool/dbconfig/20210525-071723-root.json
* 06:16 kart_: Updated cxserver to 2021-05-15-034540-production ([[phab:T276214|T276214]])
* 06:05 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:58 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:53 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 05:14 marostegui: Reload daily_account_consistency_check.service on mwmaint1002
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16187 and previous config saved to /var/cache/conftool/dbconfig/20210525-050921-root.json
* 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16186 and previous config saved to /var/cache/conftool/dbconfig/20210525-045417-root.json
* 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16185 and previous config saved to /var/cache/conftool/dbconfig/20210525-043914-root.json
* 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184', diff saved to https://phabricator.wikimedia.org/P16184 and previous config saved to /var/cache/conftool/dbconfig/20210525-043234-marostegui.json
* 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P16183 and previous config saved to /var/cache/conftool/dbconfig/20210525-043129-marostegui.json
* 04:25 marostegui: Stop MySQL on dbstore1004 to clone dbstore1006 [[phab:T283125|T283125]]
* 04:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16181 and previous config saved to /var/cache/conftool/dbconfig/20210525-042410-root.json
* 02:06 James_F: 1.37.0-wmf.7 was branched at {{Gerrit|7ee6a2e8c12d5ec7c1c2ea063d64766c730d1e8b}} for [[phab:T281148|T281148]] by the TrainBranchBot
* 00:48 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:44 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 00:37 bstorm: labstore1007 downtimed for maintenance [[phab:T281045|T281045]]
== 2021-05-24 ==
* 21:43 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 19:32 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:23 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:20 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:15 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:33 urbanecm: Morning B&C deployment done
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e9cd344}}: Disable Education Program namespaces in hewiki ([[phab:T217137|T217137]]) (duration: 00m 56s)
* 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/skins/Vector/: {{Gerrit|1742532687b}}: Introduce the vector-body class ([[phab:T283206|T283206]]) (duration: 00m 57s)
* 17:13 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:39 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:35 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:17 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2004.codfw.wmnet with reason: REIMAGE
* 16:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2004.codfw.wmnet with reason: REIMAGE
* 16:14 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1022.eqiad.wmnet
* 15:55 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1022.eqiad.wmnet
* 15:52 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:47 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:45 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:41 twentyafterfour: deploying phabricator hotfix (and restarting php7.3-fpm on phab1001)
* 15:29 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1021.eqiad.wmnet
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16176 and previous config saved to /var/cache/conftool/dbconfig/20210524-150926-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16175 and previous config saved to /var/cache/conftool/dbconfig/20210524-145422-root.json
* 14:50 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1021.eqiad.wmnet
* 14:47 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1020.eqiad.wmnet
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16174 and previous config saved to /var/cache/conftool/dbconfig/20210524-143919-root.json
* 14:36 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1020.eqiad.wmnet
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16173 and previous config saved to /var/cache/conftool/dbconfig/20210524-142415-root.json
* 13:44 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:44 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:43 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:43 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:41 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:41 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:40 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:39 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:39 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:37 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:35 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
* 13:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
* 13:34 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:33 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 12:18 urbanecm: Uninstalling Flow from ruwiki: Delete all pages in NS2600 (Flow's Topic) in ruwiki via deleteBatch.php ([[phab:T282132|T282132]]; P16170)
* 12:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|47e040bc6bd678e4916e0a43ad1cba5b2096274a}}: ruwiki: Uninstall Flow ([[phab:T282132|T282132]]) (duration: 00m 56s)
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16169 and previous config saved to /var/cache/conftool/dbconfig/20210524-113711-marostegui.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16168 and previous config saved to /var/cache/conftool/dbconfig/20210524-112011-root.json
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1183.eqiad.wmnet with reason: Schema change
* 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1183.eqiad.wmnet with reason: Schema change
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1129e01745107638fee785830a1599c39379695b}}: Remove wgGEMentorshipMigrationStage ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16167 and previous config saved to /var/cache/conftool/dbconfig/20210524-110508-root.json
* 11:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|829c61d3dd8719546cb6f5690f75c3fad4b44aad}}: Deploy Growth features to newcomers on bgwiki, urwiki ([[phab:T280824|T280824]], [[phab:T280067|T280067]]) (duration: 00m 56s)
* 10:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
* 10:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
* 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16166 and previous config saved to /var/cache/conftool/dbconfig/20210524-105004-root.json
* 10:35 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 16s)
* 10:35 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16165 and previous config saved to /var/cache/conftool/dbconfig/20210524-103501-root.json
* 10:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@a9a577a]: (no justification provided) (duration: 00m 15s)
* 10:34 mbsantos@deploy1002: Started deploy [kartotherian/deploy@a9a577a]: (no justification provided)
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16164 and previous config saved to /var/cache/conftool/dbconfig/20210524-075958-root.json
* 07:49 XioNoX: bump Equinix Chicago RS max prefix
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16163 and previous config saved to /var/cache/conftool/dbconfig/20210524-074659-marostegui.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16162 and previous config saved to /var/cache/conftool/dbconfig/20210524-074454-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16161 and previous config saved to /var/cache/conftool/dbconfig/20210524-072950-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16160 and previous config saved to /var/cache/conftool/dbconfig/20210524-071447-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 - schema change', diff saved to https://phabricator.wikimedia.org/P16159 and previous config saved to /var/cache/conftool/dbconfig/20210524-052747-marostegui.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16158 and previous config saved to /var/cache/conftool/dbconfig/20210524-051345-root.json
* 05:09 legoktm: restarting mailman3 on lists1001, bounce runner crashed
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16157 and previous config saved to /var/cache/conftool/dbconfig/20210524-045841-root.json
* 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16156 and previous config saved to /var/cache/conftool/dbconfig/20210524-044337-root.json
* 04:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1135.eqiad.wmnet with reason: Schema change
* 04:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1135.eqiad.wmnet with reason: Schema change
* 04:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135', diff saved to https://phabricator.wikimedia.org/P16155 and previous config saved to /var/cache/conftool/dbconfig/20210524-043654-marostegui.json
* 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16154 and previous config saved to /var/cache/conftool/dbconfig/20210524-042834-root.json
== 2021-05-23 ==
* 14:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: EMERGENCY: {{Gerrit|f752f8b80c57a5b7e41b91873b3eac388535ac46}}: enwiktionary: Raise AF emergency disable threshold+count ([[phab:T283460|T283460]]) (duration: 00m 57s)
== 2021-05-22 ==
* 22:13 legoktm: reset 2FA for User:Yuvipanda on wikitech
* 21:07 ryankemper: [WDQS] Pooled `wdqs1006` (caught up on lag), de-pooled `wdqs1013` (8 hours)
* 16:35 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php cswiki --delete
== 2021-05-21 ==
* 22:32 bstorm: upload nfsd-ldap: 1.2+deb10u1 to buster-wikimedia [[phab:T283385|T283385]]
* 18:24 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:22 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:14 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 17:39 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 17:36 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 17:29 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 17:28 legoktm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 19s)
* 17:21 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 17:17 clarakosi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 17:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 17:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 17:07 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 17:07 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:40 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:40 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:16 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:16 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:14 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:14 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:06 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:06 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:03 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:03 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:01 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:19 clarakosi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:14 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:07 clarakosi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:57 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:57 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:56 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:56 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:20 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:13 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 12:59 reedy@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
* 12:56 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:34 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetdb-api
* 12:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:24 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=docker-registry
* 12:23 jayme@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=docker-registry
* 12:23 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16150 and previous config saved to /var/cache/conftool/dbconfig/20210521-122253-root.json
* 12:15 topranks: "Removing BGP peering sessions to LinkedIn AS14413 at AMS-IX / cr2-esams as they are no longer on the exchange."
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16149 and previous config saved to /var/cache/conftool/dbconfig/20210521-120749-root.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16148 and previous config saved to /var/cache/conftool/dbconfig/20210521-115246-root.json
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16147 and previous config saved to /var/cache/conftool/dbconfig/20210521-113742-root.json
* 10:01 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2008.codfw.wmnet
* 09:51 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2007.codfw.wmnet
* 09:41 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2006.codfw.wmnet
* 09:32 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2005.codfw.wmnet
* 09:32 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2008.codfw.wmnet
* 09:28 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2007.codfw.wmnet
* 09:26 gehel: depooling wdqs1006 to catch up on lag
* 09:24 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2006.codfw.wmnet
* 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2008.codfw.wmnet
* 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2007.codfw.wmnet
* 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2006.codfw.wmnet
* 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2008.codfw.wmnet
* 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2007.codfw.wmnet
* 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2006.codfw.wmnet
* 09:14 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2005.codfw.wmnet
* 08:56 kormat: deploying cumin2002 grants to production [[phab:T276589|T276589]]
* 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1002.wikimedia.org
* 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1001.wikimedia.org
* 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2004.wikimedia.org
* 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2003.wikimedia.org
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16146 and previous config saved to /var/cache/conftool/dbconfig/20210521-082009-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P16145 and previous config saved to /var/cache/conftool/dbconfig/20210521-080540-marostegui.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16144 and previous config saved to /var/cache/conftool/dbconfig/20210521-080506-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16143 and previous config saved to /var/cache/conftool/dbconfig/20210521-075002-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16142 and previous config saved to /var/cache/conftool/dbconfig/20210521-073459-root.json
* 06:32 moritzm: installing libspring-java security updates on stretch
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16141 and previous config saved to /var/cache/conftool/dbconfig/20210521-053027-root.json
* 05:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1006.eqiad.wmnet with reason: REIMAGE
* 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1006.eqiad.wmnet with reason: REIMAGE
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16140 and previous config saved to /var/cache/conftool/dbconfig/20210521-051523-root.json
* 05:14 moritzm: installing graphviz security updates on stretch
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16139 and previous config saved to /var/cache/conftool/dbconfig/20210521-050020-root.json
* 04:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1087.eqiad.wmnet
* 04:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1087.eqiad.wmnet
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P16138 and previous config saved to /var/cache/conftool/dbconfig/20210521-044717-marostegui.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16137 and previous config saved to /var/cache/conftool/dbconfig/20210521-044516-root.json
* 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P16136 and previous config saved to /var/cache/conftool/dbconfig/20210521-044339-marostegui.json
* 01:27 eileen: civicrm revision changed from {{Gerrit|35f5afb1b4}} to {{Gerrit|584b96452a}}, config revision is {{Gerrit|1f8d0a6bfa}}
* 01:18 eileen: civicrm revision changed from {{Gerrit|35f5afb1b4}} to {{Gerrit|584b96452a}}, config revision is {{Gerrit|1f8d0a6bfa}}
== 2021-05-20 ==
* 21:45 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:41 herron@cumin1001: START - Cookbook sre.dns.netbox
* 20:30 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:30 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 20:06 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 20:06 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:54 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mwlog1001.eqiad.wmnet
* 19:43 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:41 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
* 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16134 and previous config saved to /var/cache/conftool/dbconfig/20210520-193039-root.json
* 19:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16133 and previous config saved to /var/cache/conftool/dbconfig/20210520-191536-root.json
* 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.6
* 19:01 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16132 and previous config saved to /var/cache/conftool/dbconfig/20210520-190031-root.json
* 18:56 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16131 and previous config saved to /var/cache/conftool/dbconfig/20210520-184527-root.json
* 18:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkOnboarding.js: {{Gerrit|9edb3f4}}: Check if task is link-recommendation type before showing onboarding ([[phab:T282826|T282826]]) (duration: 01m 04s)
* 18:32 urbanecm@deploy1002: sync-file aborted: {{Gerrit|9edb3f4}}: Check if task is link-recommendation type before showing onboarding ([[phab:T282826|T282826]]) (duration: 00m 00s)
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkOnboarding.js: {{Gerrit|7fb129f}}: Check if task is link-recommendation type before showing onboarding ([[phab:T282826|T282826]]) (duration: 01m 05s)
* 18:24 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:24 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:27 godog: upgrade grafana to 8 beta 2 on grafana2001
* 15:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:46 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:44 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:43 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:33 moritzm: installing graphviz security updates on buster
* 15:31 ryankemper: [cloudelastic] `ryankemper@cloudelastic1003:~$ sudo systemctl restart *search*` to clear `Check systemd state` alert on `cloudelastic1003`
* 15:30 _joe_: test
* 15:23 moritzm: installing graphviz security updates on buster
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16128 and previous config saved to /var/cache/conftool/dbconfig/20210520-143825-marostegui.json
* 13:58 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
* 13:57 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
* 13:52 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/upload/UploadFromStash.php: UploadFromStash: convert default user from false to null - [[phab:T283196|T283196]] (duration: 01m 05s)
* 13:50 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/user/ActorStore.php: ActorStore: avoid throwing in case of invalid usernames [[phab:T283167|T283167]] (duration: 01m 05s)
* 13:41 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.0 (duration: 01m 20s)
* 13:39 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.0
* 12:30 kormat: Deploying wmfmariadbpy 0.7 [[phab:T283228|T283228]]
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16126 and previous config saved to /var/cache/conftool/dbconfig/20210520-113529-root.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16125 and previous config saved to /var/cache/conftool/dbconfig/20210520-112026-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16124 and previous config saved to /var/cache/conftool/dbconfig/20210520-110522-root.json
* 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16123 and previous config saved to /var/cache/conftool/dbconfig/20210520-105018-root.json
* 10:15 marostegui: Deploy schema change on s1 codfw, lag will appear in codfw [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 10:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:10 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16122 and previous config saved to /var/cache/conftool/dbconfig/20210520-093510-marostegui.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16121 and previous config saved to /var/cache/conftool/dbconfig/20210520-093257-root.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16120 and previous config saved to /var/cache/conftool/dbconfig/20210520-091754-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16119 and previous config saved to /var/cache/conftool/dbconfig/20210520-090250-root.json
* 08:56 godog: move icinga-wm to libera.chat
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16118 and previous config saved to /var/cache/conftool/dbconfig/20210520-084746-root.json
* 07:44 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 07:41 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16117 and previous config saved to /var/cache/conftool/dbconfig/20210520-071723-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16116 and previous config saved to /var/cache/conftool/dbconfig/20210520-071432-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16115 and previous config saved to /var/cache/conftool/dbconfig/20210520-065928-root.json
* 06:50 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 06:50 ryankemper: [[phab:T283223|T283223]] Write queue not draining fast enough for the next node to reboot, will finish reboot tomorrow
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16114 and previous config saved to /var/cache/conftool/dbconfig/20210520-064425-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16113 and previous config saved to /var/cache/conftool/dbconfig/20210520-062921-root.json
* 06:25 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/PageProps.php: Backport: [[gerrit:693028{{!}}PageProps: be prepared that PageIdentity is not proper title (T283170)]] (duration: 01m 06s)
* 06:08 elukey: powercycle ms-be2035 - no ssh available, no metrics for several hours, I/O errors logged on the main tty of the serial console
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16112 and previous config saved to /var/cache/conftool/dbconfig/20210520-054402-root.json
* 05:33 ryankemper: [[phab:T283223|T283223]] `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id [[phab:T283223|T283223]]` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
* 05:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16111 and previous config saved to /var/cache/conftool/dbconfig/20210520-052859-root.json
* 05:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 05:24 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1011.eqiad.wmnet
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16110 and previous config saved to /var/cache/conftool/dbconfig/20210520-051355-root.json
* 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1011.eqiad.wmnet
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16109 and previous config saved to /var/cache/conftool/dbconfig/20210520-050025-marostegui.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16108 and previous config saved to /var/cache/conftool/dbconfig/20210520-045919-marostegui.json
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16107 and previous config saved to /var/cache/conftool/dbconfig/20210520-045852-root.json
* 01:01 mutante: signing puppet certs for doh2001 and doh2002.wikimedia.org ([[phab:T283192|T283192]])
* 00:14 ejegg: updated fundraising CiviCRM from {{Gerrit|b3fb3c9cb0}} to {{Gerrit|35f5afb1b4}}
* 00:13 ejegg: updated payments-wiki from {{Gerrit|9f51ace546}} to {{Gerrit|6fac77f60e}}
== 2021-05-19 ==
* 22:44 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ sleep 3600 && mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=7200 --user=Lusccasdeutsch . # [[phab:T278856|T278856]] # 3 video files
* 22:29 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh2002.wikimedia.org
* 22:27 Urbanecm: Start server-side upload for 1 video file ([[phab:T283186|T283186]])
* 22:25 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:22 Urbanecm: Start server-side upload for 3 video files ([[phab:T283102|T283102]], [[phab:T283054|T283054]])
* 22:22 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 22:18 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 22:12 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
* 22:11 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh2001.wikimedia.org
* 22:09 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 11s)
* 22:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
* 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
* 22:00 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
* 21:58 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
* 21:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
* 21:56 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
* 21:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
* 21:51 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 21:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2001.wikimedia.org
* 21:44 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 20:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1125.eqiad.wmnet
* 19:40 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1125.eqiad.wmnet
* 18:30 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:23 herron@cumin1001: START - Cookbook sre.dns.netbox
* 18:23 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.6 [[phab:T281147|T281147]]
* 18:17 herron@cumin1001: START - Cookbook sre.dns.netbox
* 16:13 volans: uploaded debmonitor-client_0.3.0 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16103 and previous config saved to /var/cache/conftool/dbconfig/20210519-154808-root.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16102 and previous config saved to /var/cache/conftool/dbconfig/20210519-153304-root.json
* 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16101 and previous config saved to /var/cache/conftool/dbconfig/20210519-151800-root.json
* 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16100 and previous config saved to /var/cache/conftool/dbconfig/20210519-150257-root.json
* 13:33 kormat: uploaded wmfmariadb 0.7 packages to apt
* 13:29 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
* 13:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16099 and previous config saved to /var/cache/conftool/dbconfig/20210519-131920-marostegui.json
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16098 and previous config saved to /var/cache/conftool/dbconfig/20210519-131012-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16097 and previous config saved to /var/cache/conftool/dbconfig/20210519-125508-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16096 and previous config saved to /var/cache/conftool/dbconfig/20210519-124004-root.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16095 and previous config saved to /var/cache/conftool/dbconfig/20210519-122501-root.json
* 11:45 matthiasmullie: "EU backports done"
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16093 and previous config saved to /var/cache/conftool/dbconfig/20210519-114203-marostegui.json
* 11:41 mlitn@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules: Backport: [[gerrit:692653{{!}}Add a link: Set contenteditable=false on mobile (T281771)]] (duration: 01m 06s)
* 11:14 mlitn@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:690691{{!}}Properly enable media change tags on Wikipedias (T266067 T282822)]] - part 2 (duration: 01m 04s)
* 11:13 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:690691{{!}}Properly enable media change tags on Wikipedias (T266067 T282822)]] - part 1 (duration: 01m 34s)
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16091 and previous config saved to /var/cache/conftool/dbconfig/20210519-092630-root.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16090 and previous config saved to /var/cache/conftool/dbconfig/20210519-091126-root.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16089 and previous config saved to /var/cache/conftool/dbconfig/20210519-085622-root.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16088 and previous config saved to /var/cache/conftool/dbconfig/20210519-084119-root.json
* 08:28 marostegui: Stop MySQL on db1175 to upgrade kernel and mysql
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16087 and previous config saved to /var/cache/conftool/dbconfig/20210519-082713-marostegui.json
* 08:13 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@f514dd9]: [[phab:T273847|T273847]] deploying export_queries_to_relforge - starttime bump (duration: 02m 24s)
* 08:10 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@f514dd9]: [[phab:T273847|T273847]] deploying export_queries_to_relforge - starttime bump
* 07:48 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@5740956]: [[phab:T273847|T273847]] deploying export_queries_to_relforge - index setting changes (duration: 02m 23s)
* 07:45 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@5740956]: [[phab:T273847|T273847]] deploying export_queries_to_relforge - index setting changes
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16086 and previous config saved to /var/cache/conftool/dbconfig/20210519-074530-root.json
* 07:42 XioNoX: roll SNMP: filter out default logical interfaces (.0) to all network devices - [[phab:T283060|T283060]]
* 07:38 godog: add 100G to prometheus/ops eqiad
* 07:31 marostegui: Deploy schema change on s3 codfw, lag will appear in codfw [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16085 and previous config saved to /var/cache/conftool/dbconfig/20210519-073027-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16084 and previous config saved to /var/cache/conftool/dbconfig/20210519-071523-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16083 and previous config saved to /var/cache/conftool/dbconfig/20210519-070019-root.json
* 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1010.eqiad.wmnet
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P16082 and previous config saved to /var/cache/conftool/dbconfig/20210519-064343-marostegui.json
* 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1010.eqiad.wmnet
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P16081 and previous config saved to /var/cache/conftool/dbconfig/20210519-063345-marostegui.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16080 and previous config saved to /var/cache/conftool/dbconfig/20210519-062824-root.json
* 06:18 Amir1: upgrading daily-article-l to mailman3 ([[phab:T282271|T282271]] [[phab:T280322|T280322]])
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16079 and previous config saved to /var/cache/conftool/dbconfig/20210519-061321-root.json
* 06:04 legoktm: restarted mailman3 on lists1001
* 06:01 legoktm: stopped mailman3 service on lists1001 for schema change
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16078 and previous config saved to /var/cache/conftool/dbconfig/20210519-055817-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16077 and previous config saved to /var/cache/conftool/dbconfig/20210519-055134-marostegui.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16076 and previous config saved to /var/cache/conftool/dbconfig/20210519-054313-root.json
* 05:17 marostegui: Compress a few tables on s3 [[phab:T283125|T283125]]
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109', diff saved to https://phabricator.wikimedia.org/P16075 and previous config saved to /var/cache/conftool/dbconfig/20210519-045857-marostegui.json
* 03:03 reedy@deploy1002: Synchronized php-1.37.0-wmf.5/includes/changetags/ChangeTagsRevisionList.php: [[phab:T283098|T283098]] [[phab:T283099|T283099]] (duration: 01m 05s)
* 03:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.6/includes/changetags/ChangeTagsRevisionList.php: [[phab:T283098|T283098]] [[phab:T283099|T283099]] (duration: 02m 35s)
== 2021-05-18 ==
* 18:40 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 05m 16s)
* 18:35 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
* 18:35 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d] (thin): Regular analytics weekly train THIN [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 00m 07s)
* 18:34 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d] (thin): Regular analytics weekly train THIN [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
* 18:33 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d]: Regular analytics weekly train [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 15m 39s)
* 18:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3da5a8bc93c734e93c3012dace49ee1b881927a8}}: Update IP addresses for Wiki Education Dashboard exemptions ([[phab:T283096|T283096]]) (duration: 01m 06s)
* 18:26 urbanecm@deploy1002: Synchronized w/robots.php: {{Gerrit|8224e53f6da61bf037bb3e3ad1cf367bf9b5a588}}: robots.php: avoid using ContentHandler::getContentText() ([[phab:T268041|T268041]]) (duration: 01m 04s)
* 18:17 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d]: Regular analytics weekly train [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
* 16:00 kormat@cumin1001: dbctl commit (dc=all): 'db1085 being decommissioned [[phab:T282096|T282096]]', diff saved to https://phabricator.wikimedia.org/P16073 and previous config saved to /var/cache/conftool/dbconfig/20210518-160053-kormat.json
* 15:30 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 05s)
* 15:23 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 07s)
* 14:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1085.eqiad.wmnet
* 14:38 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on all wikis - [[phab:T238138|T238138]] (duration: 01m 06s)
* 14:32 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.6
* 14:32 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1085.eqiad.wmnet
* 14:21 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.6 (duration: 79m 07s)
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16067 and previous config saved to /var/cache/conftool/dbconfig/20210518-142042-root.json
* 14:17 moritzm: installing remaining postgresql-11 updates (client tools and libs, servers already done)
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16066 and previous config saved to /var/cache/conftool/dbconfig/20210518-140538-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16065 and previous config saved to /var/cache/conftool/dbconfig/20210518-135034-root.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16064 and previous config saved to /var/cache/conftool/dbconfig/20210518-133531-root.json
* 13:02 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.6
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172', diff saved to https://phabricator.wikimedia.org/P16063 and previous config saved to /var/cache/conftool/dbconfig/20210518-125945-marostegui.json
* 12:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: new AQS node
* 12:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: new AQS node
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16062 and previous config saved to /var/cache/conftool/dbconfig/20210518-124247-root.json
* 12:40 Krinkle: krinkle@mw1002 purge-parsercache-now.php on pc1010 (spare, depooled), ref P16060, [[phab:T280605|T280605]], [[phab:T282761|T282761]]
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16061 and previous config saved to /var/cache/conftool/dbconfig/20210518-122744-root.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16059 and previous config saved to /var/cache/conftool/dbconfig/20210518-121240-root.json
* 12:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.4 (duration: 01m 28s)
* 12:07 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.3 (duration: 01m 50s)
* 12:04 hashar@deploy1002: clean aborted: Pruned MediaWiki: 1.37.0-wmf.1 (duration: 01m 16s)
* 12:04 hashar: scap clean 1.37.0-wmf.1  1.37.0-wmf.3 and 1.37.0-wmf.4  # [[phab:T281147|T281147]]
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16058 and previous config saved to /var/cache/conftool/dbconfig/20210518-115736-root.json
* 11:41 moritzm: upgrading idp2001 to Java 11.0.11
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P16057 and previous config saved to /var/cache/conftool/dbconfig/20210518-112942-marostegui.json
* 10:53 moritzm: upgrade idp-test to OpenJDK 11.0.11 [[phab:T281345|T281345]]
* 10:27 moritzm: installing OpenJDK updates on Hadoop/Druid/AQS/kafka-Jumbo
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16056 and previous config saved to /var/cache/conftool/dbconfig/20210518-102607-root.json
* 10:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
* 10:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16055 and previous config saved to /var/cache/conftool/dbconfig/20210518-101104-root.json
* 10:03 kormat: stopping mariadb on db1085 [[phab:T282096|T282096]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16054 and previous config saved to /var/cache/conftool/dbconfig/20210518-095600-root.json
* 09:47 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P16053 and previous config saved to /var/cache/conftool/dbconfig/20210518-094732-kormat.json
* 09:44 XioNoX: 👍
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16052 and previous config saved to /var/cache/conftool/dbconfig/20210518-094056-root.json
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1087 from dbctl [[phab:T282093|T282093]]', diff saved to https://phabricator.wikimedia.org/P16051 and previous config saved to /var/cache/conftool/dbconfig/20210518-093552-marostegui.json
* 09:32 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P16050 and previous config saved to /var/cache/conftool/dbconfig/20210518-093228-kormat.json
* 09:30 topranks: add peering sessions to AS8708 RCS & RDS on cr2-esams
* 09:27 XioNoX: push test SNMP filter config on asw-a-codfw - [[phab:T283060|T283060]]
* 09:17 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P16049 and previous config saved to /var/cache/conftool/dbconfig/20210518-091725-kormat.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P16048 and previous config saved to /var/cache/conftool/dbconfig/20210518-091717-marostegui.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16047 and previous config saved to /var/cache/conftool/dbconfig/20210518-091702-root.json
* 09:04 kormat@cumin1001: dbctl commit (dc=all): 'Set db1131 to weight 400 in s6/eqiad [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P16046 and previous config saved to /var/cache/conftool/dbconfig/20210518-090449-kormat.json
* 09:02 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P16045 and previous config saved to /var/cache/conftool/dbconfig/20210518-090215-kormat.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16044 and previous config saved to /var/cache/conftool/dbconfig/20210518-090159-root.json
* 09:01 kormat@cumin1001: dbctl commit (dc=all): 'Remove s6 eqiad primary from 'api' group [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P16043 and previous config saved to /var/cache/conftool/dbconfig/20210518-090156-kormat.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16042 and previous config saved to /var/cache/conftool/dbconfig/20210518-084643-root.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16041 and previous config saved to /var/cache/conftool/dbconfig/20210518-083139-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P16040 and previous config saved to /var/cache/conftool/dbconfig/20210518-075532-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16039 and previous config saved to /var/cache/conftool/dbconfig/20210518-075458-root.json
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16038 and previous config saved to /var/cache/conftool/dbconfig/20210518-073955-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16037 and previous config saved to /var/cache/conftool/dbconfig/20210518-072451-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16036 and previous config saved to /var/cache/conftool/dbconfig/20210518-070947-root.json
* 07:06 marostegui: Deploy schema change on s4 codfw, lag will appear in codfw [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:54 XioNoX: Homerify cloudsw ospf
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P16035 and previous config saved to /var/cache/conftool/dbconfig/20210518-064426-marostegui.json
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1083.eqiad.wmnet
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16034 and previous config saved to /var/cache/conftool/dbconfig/20210518-064033-root.json
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1083.eqiad.wmnet
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1083 from dbctl [[phab:T281445|T281445]]', diff saved to https://phabricator.wikimedia.org/P16033 and previous config saved to /var/cache/conftool/dbconfig/20210518-062947-marostegui.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16032 and previous config saved to /var/cache/conftool/dbconfig/20210518-062529-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16031 and previous config saved to /var/cache/conftool/dbconfig/20210518-061026-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16030 and previous config saved to /var/cache/conftool/dbconfig/20210518-055522-root.json
* 05:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1009.eqiad.wmnet
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1009.eqiad.wmnet
* 05:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1106.eqiad.wmnet with reason: REIMAGE
* 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1106.eqiad.wmnet with reason: REIMAGE
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P16029 and previous config saved to /var/cache/conftool/dbconfig/20210518-052324-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P16028 and previous config saved to /var/cache/conftool/dbconfig/20210518-050949-marostegui.json
* 05:06 marostegui: Restart db1115 mysql
* 00:56 eileen: civicrm revision changed from {{Gerrit|38ac15233f}} to {{Gerrit|b3fb3c9cb0}}, config revision is {{Gerrit|1f8d0a6bfa}}
== 2021-05-17 ==
* 23:33 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 01s)
* 23:27 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 55s)
* 21:46 sbassett: Deployed security patch (and ran scap sync-l10n) for [[phab:T260865|T260865]]
* 19:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize WikidataCompletionSearchClicks Event Platform migration - [[phab:T282140|T282140]] (duration: 00m 58s)
* 19:13 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to Event Platform on group 0 and group 1 - [[phab:T238138|T238138]] (duration: 00m 59s)
* 18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/skins/Vector/includes/FeatureManagement/Requirements/LanguageInHeaderTreatmentRequirement.php: {{Gerrit|e180b99}}: Allow `languageinheader` query param to fully control treatment of languages ([[phab:T282543|T282543]]) (duration: 00m 58s)
* 18:19 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|c30f92b5}}: Remove expired throttle rule (duration: 00m 59s)
* 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16022 and previous config saved to /var/cache/conftool/dbconfig/20210517-165322-root.json
* 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16021 and previous config saved to /var/cache/conftool/dbconfig/20210517-163819-root.json
* 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16020 and previous config saved to /var/cache/conftool/dbconfig/20210517-162315-root.json
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16019 and previous config saved to /var/cache/conftool/dbconfig/20210517-160811-root.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16018 and previous config saved to /var/cache/conftool/dbconfig/20210517-153311-root.json
* 15:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.5
* 15:26 elukey@deploy1002: Finished deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - [[phab:T257359|T257359]] (duration: 19m 48s)
* 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16017 and previous config saved to /var/cache/conftool/dbconfig/20210517-151807-root.json
* 15:06 elukey@deploy1002: Started deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - [[phab:T257359|T257359]]
* 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16016 and previous config saved to /var/cache/conftool/dbconfig/20210517-150303-root.json
* 14:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:50 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:50 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16015 and previous config saved to /var/cache/conftool/dbconfig/20210517-144800-root.json
* 14:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16014 and previous config saved to /var/cache/conftool/dbconfig/20210517-141737-marostegui.json
* 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16013 and previous config saved to /var/cache/conftool/dbconfig/20210517-141627-root.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16012 and previous config saved to /var/cache/conftool/dbconfig/20210517-140438-root.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16011 and previous config saved to /var/cache/conftool/dbconfig/20210517-140435-root.json
* 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16010 and previous config saved to /var/cache/conftool/dbconfig/20210517-140123-root.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16009 and previous config saved to /var/cache/conftool/dbconfig/20210517-134934-root.json
* 13:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16008 and previous config saved to /var/cache/conftool/dbconfig/20210517-134931-root.json
* 13:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16007 and previous config saved to /var/cache/conftool/dbconfig/20210517-134619-root.json
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16006 and previous config saved to /var/cache/conftool/dbconfig/20210517-133431-root.json
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16005 and previous config saved to /var/cache/conftool/dbconfig/20210517-133427-root.json
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16004 and previous config saved to /var/cache/conftool/dbconfig/20210517-133116-root.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16003 and previous config saved to /var/cache/conftool/dbconfig/20210517-131927-root.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16002 and previous config saved to /var/cache/conftool/dbconfig/20210517-131924-root.json
* 13:10 marostegui: Upgrade kernel and mysql (10.4.19) on db1144:3314, db1144:3315
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314, db1144:3315 for kernel and mysql upgrade', diff saved to https://phabricator.wikimedia.org/P16001 and previous config saved to /var/cache/conftool/dbconfig/20210517-130935-marostegui.json
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16000 and previous config saved to /var/cache/conftool/dbconfig/20210517-125742-marostegui.json
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15999 and previous config saved to /var/cache/conftool/dbconfig/20210517-123548-root.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15998 and previous config saved to /var/cache/conftool/dbconfig/20210517-122045-root.json
* 12:08 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 12:07 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15997 and previous config saved to /var/cache/conftool/dbconfig/20210517-120541-root.json
* 12:04 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 11:55 marostegui: Deploy schema change on s8 codfw, lag will appear in codfw [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15996 and previous config saved to /var/cache/conftool/dbconfig/20210517-115037-root.json
* 11:50 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswikibooks --fix
* 11:50 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswiki --fix
* 11:49 Urbanecm: 11:49:22 Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a73fe2d}}: Make the Malaysian talk namespaces names consistent (duration: 01m 08s)
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
* 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
* 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1e06f83293be63bd32703731ef1386e63d4ae94a}}: Enable SandboxLink at azwiki ([[phab:T282954|T282954]]) (duration: 01m 08s)
* 11:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|32e43439c88147439109403ea2805da648fef97f}}: urwiki: Grant `editprotected` to eliminators ([[phab:T281274|T281274]]) (duration: 01m 08s)
* 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|36d29a667bacebc880632c6a6a3614f4b1f5aa2e}}: Enable NewUserMessage on ptwikinews ([[phab:T282845|T282845]]) (duration: 01m 09s)
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P15995 and previous config saved to /var/cache/conftool/dbconfig/20210517-111343-marostegui.json
* 11:07 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/<nowiki>{</nowiki>bnwiki,bnwiki-1.5x,bnwiki-2x<nowiki>}</nowiki>.png ([[phab:T282886|T282886]])
* 11:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
* 11:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
* 11:06 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|b1da7aa0517074cfa74c52c3889e4b185828d5c8}}: Update bnwiki project logo ([[phab:T282886|T282886]]) (duration: 01m 42s)
* 11:03 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Lusccasdeutsch . # [[phab:T278856|T278856]]
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15994 and previous config saved to /var/cache/conftool/dbconfig/20210517-103823-root.json
* 10:37 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:692281{{!}} Bumping portals to master (T128546)]] (duration: 01m 07s)
* 10:36 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:692281{{!}} Bumping portals to master (T128546)]] (duration: 01m 08s)
* 10:30 moritzm: installing postgresql-11 security updates
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15993 and previous config saved to /var/cache/conftool/dbconfig/20210517-102319-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15992 and previous config saved to /var/cache/conftool/dbconfig/20210517-100815-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15991 and previous config saved to /var/cache/conftool/dbconfig/20210517-095312-root.json
* 09:43 hashar: Restarted CI Jenkins to update the instant-messaging and ircbot plugins # [[phab:T271122|T271122]]
* 09:33 moritzm: installing libimage-exiftool-perl security updates
* 09:29 topranks: push CR691140 to eqiad and codfw core routers - [[phab:T282809|T282809]]
* 09:18 hashar: Restarting CI Jenkins to upgrade the Gearman plugin # [[phab:T281737|T281737]]
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P15990 and previous config saved to /var/cache/conftool/dbconfig/20210517-091636-marostegui.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15989 and previous config saved to /var/cache/conftool/dbconfig/20210517-091604-root.json
* 09:06 ema: cp_eqsin: run confd-reload-vcl manually to fix /var/run/reload-vcl-state [[phab:T282880|T282880]]
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15988 and previous config saved to /var/cache/conftool/dbconfig/20210517-090101-root.json
* 08:52 vgutierrez: pool cp5016
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15987 and previous config saved to /var/cache/conftool/dbconfig/20210517-084557-root.json
* 08:45 vgutierrez: depool cp5016
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15986 and previous config saved to /var/cache/conftool/dbconfig/20210517-083053-root.json
* 08:28 Urbanecm: wikiadmin@10.64.48.109(centralauth)> delete from global_group_restrictions where ggr_group="Indic_Bots"; # [[phab:T282968|T282968]]
* 08:26 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|93e61f7}}: Use svwiki 20th anniversary logos ([[phab:T282389|T282389]]) (duration: 01m 08s)
* 08:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|0f356a3}}: Add svwiki 20th anniversary logos ([[phab:T282389|T282389]]) (duration: 01m 12s)
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15985 and previous config saved to /var/cache/conftool/dbconfig/20210517-061232-marostegui.json
* 06:01 kormat: restarting mariadb on db1131 to pick up report_host [[phab:T266483|T266483]]
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15984 and previous config saved to /var/cache/conftool/dbconfig/20210517-055556-root.json
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1079.eqiad.wmnet
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15983 and previous config saved to /var/cache/conftool/dbconfig/20210517-054053-root.json
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1079.eqiad.wmnet
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15982 and previous config saved to /var/cache/conftool/dbconfig/20210517-052549-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1079 from dbctl [[phab:T282079|T282079]]', diff saved to https://phabricator.wikimedia.org/P15981 and previous config saved to /var/cache/conftool/dbconfig/20210517-051728-marostegui.json
* 05:13 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1131 until it's reimaged to buster [[phab:T282124|T282124]]', diff saved to https://phabricator.wikimedia.org/P15980 and previous config saved to /var/cache/conftool/dbconfig/20210517-051312-kormat.json
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15979 and previous config saved to /var/cache/conftool/dbconfig/20210517-051045-root.json
* 05:07 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1173 to s6 master and set section read-write [[phab:T282124|T282124]]', diff saved to https://phabricator.wikimedia.org/P15978 and previous config saved to /var/cache/conftool/dbconfig/20210517-050740-kormat.json
* 05:05 kormat@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T282124|T282124]]', diff saved to https://phabricator.wikimedia.org/P15977 and previous config saved to /var/cache/conftool/dbconfig/20210517-050526-kormat.json
* 05:05 kormat: Starting s6 eqiad failover from db1131 to db1173 - [[phab:T282124|T282124]]
* 04:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
* 04:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
* 04:46 kormat@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 [[phab:T282124|T282124]]', diff saved to https://phabricator.wikimedia.org/P15976 and previous config saved to /var/cache/conftool/dbconfig/20210517-044657-kormat.json
* 04:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Master switchover s6 [[phab:T282124|T282124]]
* 04:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Master switchover s6 [[phab:T282124|T282124]]
* 04:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15975 and previous config saved to /var/cache/conftool/dbconfig/20210517-043551-marostegui.json
* 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1124', diff saved to https://phabricator.wikimedia.org/P15974 and previous config saved to /var/cache/conftool/dbconfig/20210517-043148-marostegui.json
* 02:10 legoktm: uninstalled python3-dbg on lists1001
* 01:31 legoktm: restarted mailman3-web
* 00:13 legoktm: installing python3-dbg on lists1001
== 2021-05-16 ==
* 22:45 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=tawiki wikilove # [[phab:T280326|T280326]]
* 20:46 legoktm: restarted mailman3-web
* 19:38 legoktm: restarted mailman3-web
* 17:29 Amir1: restart mailman3-web
* 02:39 legoktm: restarting mailman3-web on lists1001 again
* 00:53 legoktm: restarted mailman3-web on lists1001, uwsgi looked like it got stuck, consuming all CPU/memory
== 2021-05-15 ==
* 12:33 Amir1: set fr_quality to 0 for all revisions on several wikis ([[phab:T279761|T279761]])
* 06:54 Amir1: migrating most of last mailing lists of [[phab:T280322|T280322]]
== 2021-05-14 ==
* 20:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1002.eqiad.wmnet
* 20:32 mutante: people1002 - decom'ing - please use people1003 and see list mail
* 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1002.eqiad.wmnet
* 18:58 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:58 cdanis@cumin1001: START - Cookbook sre.network.cf
* 18:39 cdanis: ✔️ cdanis@install1003.wikimedia.org ~ 🕝☕ sudo systemctl restart squid.service
* 18:14 mutante: people1003/people2002: awk -F: '$6 ~ "^\/home" <nowiki>{</nowiki>print $1,$6<nowiki>}</nowiki>' /etc/passwd  {{!}} while read line ; do user=$<nowiki>{</nowiki>line% *<nowiki>}</nowiki>; dir=$<nowiki>{</nowiki>line#* <nowiki>}</nowiki>; sudo mkdir -p $<nowiki>{</nowiki>dir<nowiki>}</nowiki>/public_html; sudo chown $user $<nowiki>{</nowiki>dir<nowiki>}</nowiki>/public_html; done (courtesy of Jbond)
* 17:49 bblack: install1003 - restored normal resolv.conf + re-enabled+ran puppet
* 17:41 bblack: install1003 - restart squid
* 17:35 bblack: install1003 - puppet disabled and /etc/resolv.conf manually patched over to deal with a current issue
* 17:25 cdanis: rolled back cr1-eqiad/cr2-eqiad interface disables [[phab:T282881|T282881]]
* 17:10 cdanis: cdanis@re0.cr1-eqiad# set interfaces gr-3/3/0.1 disable  # [[phab:T282881|T282881]]
* 17:03 cdanis: cdanis@re0.cr2-eqiad# set interfaces gr-4/3/0.2 disable  # [[phab:T282881|T282881]]
* 15:22 cdanis@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:22 cdanis@cumin2002: START - Cookbook sre.network.cf
* 15:05 Urbanecm: Start server-side upload for 1 video file ([[phab:T282874|T282874]])
* 14:09 andrew@deploy1002: Finished deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard (duration: 04m 15s)
* 14:04 andrew@deploy1002: Started deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard
* 12:54 bblack: re-running puppet agent on cp5*
* 12:19 jbond42: run puppet on CP servers
* 04:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisionlist/RevisionItem.php: fix deprecation warning [[phab:T282825|T282825]] (duration: 01m 07s)
* 04:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisiondelete/RevDelRevisionItem.php: fix deprecation warning [[phab:T282825|T282825]] (duration: 01m 07s)
* 04:18 ariel@deploy1002: Finished deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path (duration: 00m 03s)
* 04:18 ariel@deploy1002: Started deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path
* 03:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/MapSources/includes/specials/MapSourcesPage.php: fix PHP notice [[phab:T282833|T282833]] (duration: 01m 07s)
* 03:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/WikiPage.php: [[phab:T282844|T282844]] (duration: 01m 06s)
* 03:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/PageArchive.php: [[phab:T282844|T282844]] (duration: 01m 07s)
* 03:16 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/Revision/RevisionArchiveRecord.php: fix DeletedContributions breakage [[phab:T282844|T282844]] (duration: 01m 07s)
* 03:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/logging/LogEventsList.php: fix PHP notice [[phab:T282834|T282834]] (duration: 01m 08s)
* 00:39 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
== 2021-05-13 ==
* 23:53 mutante: [sodium:~] $ sudo systemctl start update-ubuntu-mirror.service
* 23:50 mutante: [sodium:~] $ sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
* 23:22 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikimediaEvents: Backport: [[gerrit:690081{{!}}Fix "final_state: vector" bug in VectorPrefDiffInstrumentation (T261842)]] (duration: 01m 07s)
* 23:11 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:686700{{!}}Enable WikiLove extension on tawiki (T280326)]] (duration: 01m 07s)
* 23:10 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
* 23:09 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
* 23:09 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: {{Gerrit|9dc74e45579c9b868571529171421c4bf7de41fa}}: Revert "Enable media change tags on wikipedias" ([[phab:T266067|T266067]], [[phab:T282822|T282822]]) (duration: 01m 07s)
* 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 19:43 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
* 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
* 19:39 dancy@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GeoData/includes/Hooks.php: Backport: [[gerrit:690078{{!}}Make sure mId exists (T282735)]] (duration: 01m 08s)
* 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|80e5b9d}}: {{Gerrit|cd113a7}}: Enable structured_task/article/link_suggestion_interaction schema ([[phab:T278177|T278177]]) (duration: 01m 06s)
* 18:59 Urbanecm: Morning B&C is going to take a few more minutes
* 18:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people2001.codfw.wmnet
* 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: {{Gerrit|0856ae1}}: {{Gerrit|ca52e78}}: GrowthExperiments backports ([[phab:T282711|T282711]], [[phab:T282175|T282175]]) (duration: 01m 08s)
* 18:26 mutante: people2001 is going down - people1003 (eqiad) and people2002 (codfw) are your replacements on bullseye
* 18:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people2001.codfw.wmnet
* 18:22 Urbanecm: Start server-side upload for 2 video files ([[phab:T282643|T282643]], [[phab:T282644|T282644]])
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4cd6a782a946e121f5f8301e2649be8d338baaf8}}: Growth features: Push elwiki and cawiki out of dark mode ([[phab:T280673|T280673]]; [[phab:T280172|T280172]]) (duration: 01m 07s)
* 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|04eb9d30b069e60004a42fcb128a958a24aee229}}: Enable media change tags on wikipedias ([[phab:T266067|T266067]]) (duration: 01m 07s)
* 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b3300c3}}: {{Gerrit|59c8448}}: Enable Extension:MediaSearch on (test)commons ([[phab:T265939|T265939]]) (duration: 01m 08s)
* 17:20 andrew@deploy1002: Finished deploy [horizon/deploy@3d160f6]: Adding Database dashboards (duration: 04m 08s)
* 17:16 andrew@deploy1002: Started deploy [horizon/deploy@3d160f6]: Adding Database dashboards
* 16:36 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:690558{{!}}ProductionServices: add poolcounter1005 back to config (T273278)]] (duration: 01m 07s)
* 16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
* 16:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
* 16:24 effie: rebooting poolcounter1005
* 16:09 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:690557{{!}}ProductionServices: poolcounter1005 will be rebooted for updates (T273278)]] (duration: 01m 07s)
* 15:58 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:690530{{!}}ProductionServices: add poolcounter1004 back to config (T273278)]] (duration: 01m 07s)
* 15:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
* 15:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
* 15:46 effie: restarting poolcounter1004
* 15:27 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:688239{{!}}ProductionServices: poolcounter1004 will be rebooted for updates (T273278)]] (duration: 01m 08s)
* 14:49 Urbanecm: Start server-side upload for 1 video file ([[phab:T282785|T282785]])
* 14:07 Urbanecm: Start server-side upload for 3 video files ([[phab:T282558|T282558]], [[phab:T282556|T282556]])
* 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments: Backport: instrumentation patches ([[gerrit:690070{{!}}]] [[gerrit:690071{{!}}]] [[gerrit:690072{{!}}]] [[gerrit:690073{{!}}]]) ([[phab:T278116|T278116]] [[phab:T278117|T278117]] [[phab:T278114|T278114]] [[phab:T278177|T278177]] [[phab:T278487|T278487]] [[phab:T278112|T278112]] [[phab:T278111|T278111]] [[phab:T278118|T278118]]) (duration: 01m 09s)
* 11:00 hnowlan: deleting packages still referenced by jessie components: `sudo -i reprepro clearvanished --delete`
* 10:46 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:40 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:31 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:21 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 07:43 kevinbazira@deploy1002: Finished deploy [ores/deploy@8fd23ed]: Regular ORES Deployment [[phab:T278723|T278723]] (duration: 32m 50s)
* 07:10 kevinbazira@deploy1002: Started deploy [ores/deploy@8fd23ed]: Regular ORES Deployment [[phab:T278723|T278723]]
* 05:54 _joe_: running docker image prune on contint1001, which has 722 unlinked images stored in its docker daemon
* 01:20 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
== 2021-05-12 ==
* 23:48 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/WikiEditor/includes/WikiEditorHooks.php: 2f6af514c49d47bbec5ce51f9f7263015e039003? PHP VisualEditorFeatureUse logging: properly record session id ([[phab:T281409|T281409]]) (duration: 01m 07s)
* 23:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikiEditor/includes/WikiEditorHooks.php: {{Gerrit|ef4139628a36eb8b747c610c8d769a802faf2fc3}}: PHP VisualEditorFeatureUse logging: properly record session id ([[phab:T281409|T281409]]) (duration: 01m 08s)
* 23:27 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
* 23:27 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 22:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:56 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 21:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 21:54 ryankemper: [[phab:T280382|T280382]] `wdqs1012.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
* 20:57 ottomata: starting new drop_event data purge job to drop all event data older than 90 days in the Hive event database - [[phab:T273789|T273789]]
* 20:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 19:25 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 19:15 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 19:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:11 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
* 19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
* 19:10 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 19:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:07 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - [[phab:T280563|T280563]]
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
* 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
* 19:05 ryankemper: [[phab:T280382|T280382]] [[phab:T281437|T281437]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2007.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
* 19:00 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin2001` tmux session `elastic_restarts`
* 19:00 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - [[phab:T280563|T280563]]
* 18:59 ryankemper: [Elastic] Restarted `*search*` services on `elastic2058`
* 18:48 mutante: rsyncing home dirs of people1003 over to people2002 as well ([[phab:T280989|T280989]])
* 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: {{Gerrit|3999be113362b4cdf0aecb3597bbe42ea06cec7a}}: Add Link: refine exclusion rules for finding link text matches (duration: 01m 08s)
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eb65aff2eccec58f14721958f2b9218266eedeb4}}: Update wordmark and tagline for kawiki ([[phab:T278251|T278251]]; 2/2) (duration: 01m 09s)
* 18:26 urbanecm@deploy1002: Synchronized static/images/mobile/: {{Gerrit|eb65aff2eccec58f14721958f2b9218266eedeb4}}: Update wordmark and tagline for kawiki ([[phab:T278251|T278251]]; 1/2) (duration: 01m 06s)
* 18:25 urbanecm@deploy1002: sync-file aborted: {{Gerrit|eb65aff2eccec58f14721958f2b9218266eedeb4}}: Update wordmark and tagline for kawiki ([[phab:T278251|T278251]]) (duration: 00m 00s)
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0cd3297b79e92bd39c0cebd1591a14591f57ecb0}}: Disable Education Program namespaces in cswiki ([[phab:T282691|T282691]]) (duration: 01m 15s)
* 18:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/includes/skins/SkinTemplate.php: {{Gerrit|7f1491337d1eef2629fea8031f066c490ea86987}}: Modern keys must be unset ([[phab:T282646|T282646]]) (duration: 01m 08s)
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11defd4181103598222df34d9f1aa6dc428f66cd}}: enwiki: Growth features: Change help panel links ([[phab:T281896|T281896]]) (duration: 01m 23s)
* 16:15 hnowlan: including envoyproxy_1.15.5-1_amd64.changes with reprepro
* 15:51 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudnet2003-dev.codfw.wmnet
* 14:45 aborrero@cumin2001: START - Cookbook sre.hosts.decommission for hosts cloudnet2003-dev.codfw.wmnet
* 14:02 marostegui: Upgraded mysql on clouddb1015
* 14:01 marostegui: Upgraded mysql on clouddb1014
* 13:57 kormat: uploaded wmfmariadbpy 0.6.1 for bullseye
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15950 and previous config saved to /var/cache/conftool/dbconfig/20210512-133248-root.json
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15949 and previous config saved to /var/cache/conftool/dbconfig/20210512-131745-root.json
* 13:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 13:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 13:06 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
* 13:05 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15948 and previous config saved to /var/cache/conftool/dbconfig/20210512-130239-root.json
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15947 and previous config saved to /var/cache/conftool/dbconfig/20210512-124736-root.json
* 12:44 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 12:42 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P15946 and previous config saved to /var/cache/conftool/dbconfig/20210512-121004-marostegui.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15945 and previous config saved to /var/cache/conftool/dbconfig/20210512-120746-root.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15944 and previous config saved to /var/cache/conftool/dbconfig/20210512-115242-root.json
* 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: {{Gerrit|6cc2530}}: {{Gerrit|c268d08}}: {{Gerrit|b89592e}}: {{Gerrit|7620953}}: {{Gerrit|8fd7610}}: GrowthExperiments backports (duration: 01m 17s)
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15943 and previous config saved to /var/cache/conftool/dbconfig/20210512-113737-root.json
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15942 and previous config saved to /var/cache/conftool/dbconfig/20210512-112234-root.json
* 11:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9939edb27f8a43def7fefe1eae734b078dea003a}}: zhwikinews: Allow sysops to grant/revoke transwiki group ([[phab:T273405|T273405]]) (duration: 02m 17s)
* 10:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: [[phab:T276922|T276922]]
* 10:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: [[phab:T276922|T276922]]
* 10:32 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
* 10:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
* 10:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
* 10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
* 10:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
* 10:01 effie: reboot poolcounter2003 and poolcounter2004
* 09:55 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15940 and previous config saved to /var/cache/conftool/dbconfig/20210512-093333-marostegui.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15939 and previous config saved to /var/cache/conftool/dbconfig/20210512-093308-root.json
* 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1074.eqiad.wmnet
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15938 and previous config saved to /var/cache/conftool/dbconfig/20210512-091804-root.json
* 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1074.eqiad.wmnet
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15937 and previous config saved to /var/cache/conftool/dbconfig/20210512-090301-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15936 and previous config saved to /var/cache/conftool/dbconfig/20210512-084757-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1074 from dbctl [[phab:T281959|T281959]]', diff saved to https://phabricator.wikimedia.org/P15935 and previous config saved to /var/cache/conftool/dbconfig/20210512-084755-marostegui.json
* 08:23 jbond42: rolling restart of ats
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15934 and previous config saved to /var/cache/conftool/dbconfig/20210512-071017-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15933 and previous config saved to /var/cache/conftool/dbconfig/20210512-070202-marostegui.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15932 and previous config saved to /var/cache/conftool/dbconfig/20210512-065513-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15931 and previous config saved to /var/cache/conftool/dbconfig/20210512-064009-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15930 and previous config saved to /var/cache/conftool/dbconfig/20210512-062506-root.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P15929 and previous config saved to /var/cache/conftool/dbconfig/20210512-062118-marostegui.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2121 and db2108 in s7 [[phab:T282535|T282535]]', diff saved to https://phabricator.wikimedia.org/P15928 and previous config saved to /var/cache/conftool/dbconfig/20210512-062046-marostegui.json
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15927 and previous config saved to /var/cache/conftool/dbconfig/20210512-061702-root.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Move db2148 to also serve vslow in s2 [[phab:T282535|T282535]]', diff saved to https://phabricator.wikimedia.org/P15926 and previous config saved to /var/cache/conftool/dbconfig/20210512-060817-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15925 and previous config saved to /var/cache/conftool/dbconfig/20210512-060158-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15924 and previous config saved to /var/cache/conftool/dbconfig/20210512-054655-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15923 and previous config saved to /var/cache/conftool/dbconfig/20210512-053151-root.json
* 05:00 marostegui: Stop MySQL on labsdb1009 labsdb1010 labsdb1011 [[phab:T282524|T282524]] [[phab:T282523|T282523]] [[phab:T282522|T282522]]
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P15922 and previous config saved to /var/cache/conftool/dbconfig/20210512-044728-marostegui.json
* 04:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 [[phab:T282535|T282535]]', diff saved to https://phabricator.wikimedia.org/P15920 and previous config saved to /var/cache/conftool/dbconfig/20210512-044222-marostegui.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 [[phab:T282535|T282535]]', diff saved to https://phabricator.wikimedia.org/P15919 and previous config saved to /var/cache/conftool/dbconfig/20210512-044109-marostegui.json
* 04:38 marostegui: Drop testing mailman3 databases [[phab:T281548|T281548]]
* 04:36 Amir1: importing archives of wikitech-l ([[phab:T280322|T280322]])
* 01:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
* 01:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
* 01:35 mutante: people2002 - created new VM resembling people2001, signed puppet cert request, initial puppet run [[phab:T280989|T280989]]
* 01:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/specialpage/ChangesListSpecialPage.php: [[phab:T282183|T282183]] fix hidemyself in RC and watchlist (duration: 01m 08s)
* 01:17 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specialpage/ChangesListSpecialPage.php: [[phab:T282183|T282183]] fix hidemyself in RC and watchlist (duration: 01m 16s)
* 00:54 mutante: made public_html dirs on people1002 readonly to make it obvious it is not the active backend anymore
* 00:51 mutante: [people1002:/home] $ sudo find . -type d -name public_html -exec chmod 555 <nowiki>{</nowiki><nowiki>}</nowiki> \;
== 2021-05-11 ==
* 23:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ec37795eba5faa0c1a1dddb29504941205e155b4}}: Change namespace names and aliases on tiwiki and tiwiktionary ([[phab:T263840|T263840]]) (duration: 01m 07s)
* 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5bc40acfa6514f7940af0a6cef9974140680f4b9}}: ptwiki: Use celebration logos in new vector ([[phab:T281925|T281925]]) (duration: 01m 06s)
* 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eac843a69574feebbf962959e5eb9811a2a83bc4}}: Make DT source mode toolbar available as beta on all wikis ([[phab:T279124|T279124]]) (duration: 01m 12s)
* 23:06 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-pt-20.png: {{Gerrit|60e6e4e960ee6cb31df9ce08fdeaedb647ce3afb}}: ptwiki: Add wikipedia-pt-20.png ([[phab:T281925|T281925]]) (duration: 01m 08s)
* 23:02 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|e35199baf01f423015905b8fec9e419ed3529787}}: Adding square logo and wordmark for ptwiki 20 years celebration ([[phab:T281925|T281925]]) (duration: 01m 50s)
* 22:14 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lists1002.wikimedia.org
* 22:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts lists1002.wikimedia.org
* 21:37 Urbanecm: Start server-side upload for 3 video files ([[phab:T282566|T282566]], [[phab:T282565|T282565]], [[phab:T282559|T282559]])
* 21:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
* 21:34 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
* 20:52 legoktm: upgraded mailman3 on lists1001
* 20:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people2002.codfw.wmnet
* 20:24 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 06m 57s)
* 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
* 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 00m 05s)
* 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
* 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 17m 01s)
* 20:00 mforns@deploy1002: Started deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
* 19:55 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2002.codfw.wmnet
* 19:46 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 09m 45s)
* 19:37 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
* 19:33 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.5
* 19:29 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 00m 07s)
* 19:29 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
* 19:28 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 45m 45s)
* 18:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
* 18:53 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on testwiki - [[phab:T238138|T238138]] (duration: 01m 09s)
* 18:52 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
* 18:43 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
* 18:20 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.5 (duration: 09m 43s)
* 18:10 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
* 17:36 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 01m 25s)
* 17:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
* 17:35 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
* 17:34 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 02m 27s)
* 17:33 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
* 17:32 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
* 17:31 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev (duration: 01m 59s)
* 17:29 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev
* 17:20 mutante: the backend for people.wikimedia.org switched from people1002 to people1003, the people.wikimedia.org CNAME has been updated. MOTD is about to be updated to inform users.
* 17:18 legoktm: disabled pipermail redirects on lists.wikimedia.org
* 17:07 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:12 jynus: restarting bacula-dir on backup1001, stuck process
* 15:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
* 15:58 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog1001.eqiad.wmnet
* 15:55 bstorm: restart haproxy on dbproxy1018/9 to remove old config
* 15:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
* 15:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog2001.codfw.wmnet
* 15:37 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 15:36 dancy@deploy1002: sync-world aborted: testwikis wikis to 1.37.0-wmf.4 (duration: 02m 04s)
* 15:34 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
* 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:31 dancy@deploy1002: scap failed: RuntimeError scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details) (duration: 17m 36s)
* 15:31 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 15:27 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog2001.codfw.wmnet
* 15:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:13 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
* 15:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
* 14:57 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
* 14:49 moritzm: installing busybox security updates
* 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:27 moritzm: installing cgal security updates
* 14:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:14 hashar: Restarted CI Jenkins with a snapshot of the Gearman Jenkins plugin # [[phab:T281737|T281737]]
* 14:10 hashar: Restarted CI Jenkins for plugin upgrade # [[phab:T282433|T282433]]
* 14:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:01 hashar: Restarted releases Jenkins for plugin upgrade # [[phab:T282433|T282433]]
* 13:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1d4d00798bb24daa4e5b81b6c2ecda6143a6c6f0}}: enwiki: Growth features: Change help panel links ([[phab:T281896|T281896]]) (duration: 01m 02s)
* 13:39 jbond42: rolling restart of ats-backend
* 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1027.eqiad.wmnet
* 12:11 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1027.eqiad.wmnet
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15913 and previous config saved to /var/cache/conftool/dbconfig/20210511-114540-root.json
* 11:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15912 and previous config saved to /var/cache/conftool/dbconfig/20210511-113036-root.json
* 11:16 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:688178{{!}}Add P2671 and P4839 to deprecated properties list (T280779)]] (duration: 00m 58s)
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15911 and previous config saved to /var/cache/conftool/dbconfig/20210511-111532-root.json
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15910 and previous config saved to /var/cache/conftool/dbconfig/20210511-110029-root.json
* 10:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:46 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P15909 and previous config saved to /var/cache/conftool/dbconfig/20210511-102303-marostegui.json
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15908 and previous config saved to /var/cache/conftool/dbconfig/20210511-102212-root.json
* 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
* 10:13 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15907 and previous config saved to /var/cache/conftool/dbconfig/20210511-100708-root.json
* 09:54 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw2002-dev.codfw.wmnet
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15904 and previous config saved to /var/cache/conftool/dbconfig/20210511-095204-root.json
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15903 and previous config saved to /var/cache/conftool/dbconfig/20210511-093701-root.json
* 09:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
* 08:37 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 08:36 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 08:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 08:34 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 08:32 moritzm: installing hivex security updates
* 08:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 08:30 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15901 and previous config saved to /var/cache/conftool/dbconfig/20210511-082038-marostegui.json
* 08:19 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 07:55 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 07:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15899 and previous config saved to /var/cache/conftool/dbconfig/20210511-070742-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15898 and previous config saved to /var/cache/conftool/dbconfig/20210511-065238-root.json
* 06:50 marostegui: Stop replication on db2094:3318 [[phab:T282514|T282514]]
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15897 and previous config saved to /var/cache/conftool/dbconfig/20210511-063734-root.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15896 and previous config saved to /var/cache/conftool/dbconfig/20210511-062231-root.json
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1082.eqiad.wmnet
* 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1082.eqiad.wmnet
* 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
* 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
* 05:11 marostegui: Reimage db1121 to buster, this will generate lag on s4 (commonswiki) on wikireplicas [[phab:T280492|T280492]]
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 - going to be reimaged to buster [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15895 and previous config saved to /var/cache/conftool/dbconfig/20210511-051102-marostegui.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15894 and previous config saved to /var/cache/conftool/dbconfig/20210511-050816-marostegui.json
== 2021-05-10 ==
* 23:38 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|779fb53bfd7a4d9b11f865df14f8a72adb97f33b}}: Update messages used for tech CoC ([[phab:T280886|T280886]]) (duration: 00m 56s)
* 23:32 urbanecm@deploy1002: Synchronized wmf-config/extension-list: {{Gerrit|ba8b786c7f3a290f0747a6859fd07502eb83108f}}: NO-OP: Enable ChessBrowser on beta ([[phab:T244075|T244075]]) (duration: 00m 57s)
* 23:12 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|dd6fa6504350a90c9f14c218bc972558791f0a6d}}: Use ptwiki 20th anniversary logos ([[phab:T281925|T281925]]) (duration: 00m 59s)
* 23:08 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|f2a76b1a6eb55749395e67d74c74a7fc5df52f1b}}: Add ptwiki 20th anniversary logos ([[phab:T281925|T281925]]) (duration: 00m 58s)
* 22:28 eileen: civicrm revision changed from {{Gerrit|2052d79248}} to {{Gerrit|38ac15233f}}, config revision is {{Gerrit|47f21e4568}}
* 21:59 dancy@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/MediaSearch/MediaSearch.i18n.php: Backport: [[gerrit:688295{{!}}Manually include I18nUtils class (T282206)]] (duration: 00m 56s)
* 21:45 dancy@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/MediaSearch/MediaSearch.i18n.php: Backport: [[gerrit:688294{{!}}Manually include I18nUtils class (T282206)]] (duration: 01m 01s)
* 21:39 legoktm: nvm, downgraded flufl.bounce on lists1001
* 21:26 legoktm: upgraded flufl.bounce on lists1001 and restarted mailman3 [[phab:T282348|T282348]]
* 20:44 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: more deployment fixes (duration: 03m 44s)
* 20:41 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: more deployment fixes
* 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]] (duration: 02m 07s)
* 20:38 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:35 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]] (duration: 01m 55s)
* 20:33 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:31 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]] (duration: 01m 21s)
* 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:29 andrew@deploy1002: deploy aborted: update horizon to fix [[phab:T282489|T282489]] (duration: 00m 36s)
* 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:29 andrew@deploy1002: deploy aborted: update horizon to fix [[phab:T282489|T282489]] (duration: 00m 15s)
* 20:28 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:25 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]] (duration: 04m 10s)
* 20:21 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 18:34 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:632598{{!}}loginwiki: Allow users to mark Notifications as read (T264834)]] (duration: 00m 57s)
* 18:25 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:677325{{!}}Disable LocalisationUpdate, part I (T158360)]] (duration: 00m 58s)
* 18:24 XioNoX: add cmooney to all network devices
* 18:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:679940{{!}}[wikitech] Enable VE desktop section edit links (T280291)]] (duration: 00m 57s)
* 18:13 jforrester@deploy1002: Synchronized wmf-config: Config: [[gerrit:657697{{!}}wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default (T269712)]] (duration: 00m 57s)
* 18:10 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:673306{{!}}FlaggedRevs: Stop setting wgFlaggedRevsWhitelist, now ignored]] (duration: 00m 57s)
* 18:08 legoktm: imported new mailman3, flufl.bounce packages to apt.wm.o
* 16:27 jbond42: rm -r /var/lib/routinator/repository and rebuilding repo
* 16:23 herron@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:688281{{!}}arclamp/xenon: point all hosts to eqiad (mwlog1002) (T224565)]] (duration: 00m 59s)
* 15:20 elukey: restart rsyslog on rpki1001
* 14:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15892 and previous config saved to /var/cache/conftool/dbconfig/20210510-131434-root.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15891 and previous config saved to /var/cache/conftool/dbconfig/20210510-125930-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15890 and previous config saved to /var/cache/conftool/dbconfig/20210510-124427-root.json
* 12:29 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15889 and previous config saved to /var/cache/conftool/dbconfig/20210510-122923-root.json
* 12:27 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
* 11:46 Urbanecm: EU B&C window done
* 11:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3418237fdbe3eaff409bb23bf97fbba51e60337a}}: Disabling Education Program namespaces in Russian Wikipedia ([[phab:T282112|T282112]]) (duration: 00m 57s)
* 11:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8bef11c3048683663e6edc38e21cd6d6d1192eb7}}: Add *.geograph.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T282007|T282007]]) (duration: 00m 57s)
* 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix # [[phab:T262155|T262155]]
* 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage # [[phab:T262155|T262155]]
* 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|068cd7e41e339acf72fb81d4fcc3b86292209fe3}}: Change namespace name and aliases on jawikivoyage ([[phab:T262155|T262155]]) (duration: 00m 57s)
* 11:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9209d96560777cf6747d57855c7b525e702664d7}}: Remove Vector language button from Commons, Wikidata, Mediawiki, Wikispecies ([[phab:T281968|T281968]]) (duration: 00m 57s)
* 11:20 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|7f6f8497cdfba6d766e3e6974ee15a492f0518ac}}: Add tmpSerializeEmptyListsAsObjects to Wikibase.php ([[phab:T241422|T241422]]) (duration: 01m 01s)
* 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6138c64e7c13fbc52ad084c0901bdd2ab30ad953}}: Add tmpSerializeEmptyListsAsObjects Wikibase repo config ([[phab:T241422|T241422]]) (duration: 00m 57s)
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|23271ddb555b44c2c9659c32907fdeff2a768916}}: Enable ReferencePreviews as full default on Marathi wiki ([[phab:T282147|T282147]]) (duration: 00m 57s)
* 11:09 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: {{Gerrit|bd28391f807d6205875cad0d049760c0e606de24}}: DatabaseBlockStore: fetch correct ActorNormalization (3/3; [[phab:T281972|T281972]]) (duration: 00m 56s)
* 11:08 urbanecm@deploy1002: sync-file aborted: {{Gerrit|bd28391f807d6205875cad0d049760c0e606de24}}: DatabaseBlockStore: fetch correct ActorNormalization ([[phab:T281972|T281972]]) (duration: 00m 04s)
* 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/ServiceWiring.php: {{Gerrit|85dc711dee753ad8302a431369d7814efb2785d1}}: DatabaseBlockStore: fetch correct ActorNormalization (2/3; [[phab:T281972|T281972]]) (duration: 00m 56s)
* 11:05 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: {{Gerrit|85dc711dee753ad8302a431369d7814efb2785d1}}: DatabaseBlockStore: fetch correct ActorNormalization (1/3; [[phab:T281972|T281972]]) (duration: 00m 57s)
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15888 and previous config saved to /var/cache/conftool/dbconfig/20210510-110125-marostegui.json
* 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15887 and previous config saved to /var/cache/conftool/dbconfig/20210510-104119-root.json
* 10:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:688214{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:688214{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 10:31 moritzm: installing openjdk-11 security updates
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15886 and previous config saved to /var/cache/conftool/dbconfig/20210510-102615-root.json
* 10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 10:18 vgutierrez: rolling restart of ATS backend instances to clear spurious warnings
* 10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
* 10:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
* 10:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15885 and previous config saved to /var/cache/conftool/dbconfig/20210510-101112-root.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15884 and previous config saved to /var/cache/conftool/dbconfig/20210510-095608-root.json
* 09:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqiad - [[phab:T281673|T281673]]
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 [[phab:T281959|T281959]]', diff saved to https://phabricator.wikimedia.org/P15883 and previous config saved to /var/cache/conftool/dbconfig/20210510-094554-marostegui.json
* 09:28 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 09:27 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 09:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
* 08:52 moritzm: installing bind9 security updates on stretch (client-side tools/libs only)
* 08:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@esams - [[phab:T281673|T281673]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156 for schema change', diff saved to https://phabricator.wikimedia.org/P15881 and previous config saved to /var/cache/conftool/dbconfig/20210510-084102-marostegui.json
* 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1001.eqiad.wmnet
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15880 and previous config saved to /var/cache/conftool/dbconfig/20210510-084040-root.json
* 08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1001.eqiad.wmnet
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15879 and previous config saved to /var/cache/conftool/dbconfig/20210510-082536-root.json
* 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2001.codfw.wmnet
* 08:24 XioNoX: push pfw policies - [[phab:T282286|T282286]]
* 08:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2001.codfw.wmnet
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15878 and previous config saved to /var/cache/conftool/dbconfig/20210510-081033-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15877 and previous config saved to /var/cache/conftool/dbconfig/20210510-075529-root.json
* 07:38 hashar: Restarted CI Jenkins # [[phab:T281737|T281737]]
* 06:37 elukey: apt-get clean on rpki1001 to free some space
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P15876 and previous config saved to /var/cache/conftool/dbconfig/20210510-063254-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15875 and previous config saved to /var/cache/conftool/dbconfig/20210510-063121-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15874 and previous config saved to /var/cache/conftool/dbconfig/20210510-061617-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15873 and previous config saved to /var/cache/conftool/dbconfig/20210510-060113-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15872 and previous config saved to /var/cache/conftool/dbconfig/20210510-054610-root.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1082 from dbctl [[phab:T281794|T281794]]', diff saved to https://phabricator.wikimedia.org/P15871 and previous config saved to /var/cache/conftool/dbconfig/20210510-051334-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P15870 and previous config saved to /var/cache/conftool/dbconfig/20210510-050727-marostegui.json
== 2021-05-09 ==
* 21:44 legoktm: restarted mailman3 again ([[phab:T282348|T282348]]) pymysql.err.InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
* 18:28 legoktm: systemctl restart mailman3, bounce runner died again ([[phab:T282348|T282348]])
* 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: [[phab:T275605|T275605]]
* 10:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: [[phab:T275605|T275605]]
* 09:16 legoktm: mailman3 live hacked patch at https://phabricator.wikimedia.org/T282348#7072358 to fix bounce queue
* 06:21 legoktm: restarting mailman3 service, bounce runner died
* 04:27 Amir1: starting upgrade of batch H of mailing lists ([[phab:T280322|T280322]])
== 2021-05-08 ==
* 17:18 Amir1: starting upgrade of batch G of mailing lists ([[phab:T280322|T280322]])
== 2021-05-07 ==
* 21:40 legoktm: deleted education@ from MM3, didn't import properly
* 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
* 21:33 legoktm: fixed owner for wdqs-gui-build list
* 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
* 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
* 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
* 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
* 18:23 brennen: 1.37.0-wmf.4 train status ([[phab:T281145|T281145]]): blockers appear resolved, going ahead in the interest of not having a split deploy over weekend
* 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: [[gerrit:685901{{!}}LinkBatch: skip bad input (T282180 T282070)]] (duration: 01m 06s)
* 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
* 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
* 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
* 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
* 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
* 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
* 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
* 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
* 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
* 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
* 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 13:04 Urbanecm: Start server-side upload for 1 video file ([[phab:T281927|T281927]])
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
* 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
* 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
* 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
* 09:55 dcausse: depooling wdqs1012 [[phab:T280382|T280382]], [[phab:T282222|T282222]]
* 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - [[phab:T281673|T281673]]
* 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
* 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - [[phab:T281673|T281673]]
* 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
* 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN [[phab:T282122|T282122]] (duration: 01m 10s)
* 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN [[phab:T282122|T282122]] (duration: 01m 06s)
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T282093|T282093]]', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
* 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json
== 2021-05-06 ==
* 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 ([[phab:T282193|T282193]])
* 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 ([[phab:T282092|T282092]])
* 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: [[gerrit:685890{{!}}Reorder tables in SpecialWatchlist (T282181)]] (duration: 00m 57s)
* 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 ([[phab:T282092|T282092]])
* 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o ([[phab:T282092|T282092]])
* 21:11 hashar: restarted CI Jenkins due to [[phab:T281737|T281737]]
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
* 19:04 ejegg: updated fundraising CiviCRM from {{Gerrit|8034e47008}} to {{Gerrit|2052d79248}}
* 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:685906{{!}}Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140)]] (duration: 01m 04s)
* 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|338d1df5903cdc963b9eef22ec2c1750b7b3a02b}}: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases ([[phab:T282160|T282160]]) (duration: 01m 05s)
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|7e21cf0d96541d0ab5cb18cd7741756ab1dfe7b8}}: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases ([[phab:T282160|T282160]]) (duration: 01m 04s)
* 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - [[phab:T282140|T282140]] (duration: 01m 06s)
* 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
* 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
* 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
* 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
* 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
* 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:15 volans: upgrade spicerack on cumin* to 0.0.52
* 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
* 17:13 papaul: powerdown ms-be2057 for relocation
* 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 17:00 papaul: powerdown elastic2058 for relocation
* 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - [[phab:T281673|T281673]]
* 16:12 papaul: powerdown mc-gp2002 for relocation
* 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
* 15:58 Amir1: starting upgrade of public mailing lists in group d and e ([[phab:T280322|T280322]])
* 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
* 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
* 15:42 papaul: powerdown logstash2027 for relocation
* 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
* 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 15:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 15:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
* 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
* 15:26 ryankemper: [[phab:T280382|T280382]] [WDQS] Pooled `wdqs1007` and `wdqs2004`
* 15:26 ryankemper: [[phab:T280382|T280382]] `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 15:26 ryankemper: [[phab:T280382|T280382]] `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:14 papaul: powerdown ms-be2053 for relocation
* 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
* 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: [[phab:T270704|T270704]]
* 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
* 14:55 papaul: powerdown kafka-main2002 for relocation
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
* 13:21 XioNoX: push pfw policies - [[phab:T281942|T281942]]
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
* 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
* 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: [[gerrit:685752{{!}}Enable Extension:MediaSearch on betacommons (T265939)]] (duration: 01m 06s)
* 11:34 mlitn@deploy1002: sync-file aborted: Config: [[gerrit:685752{{!}}Enable Extension:MediaSearch on betacommons (T265939)]] (duration: 00m 56s)
* 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
* 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
* 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
* 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:685554{{!}}Enable ReferencePreviews as full default on pilot wikis (T271206)]] (duration: 01m 06s)
* 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:685554{{!}}Enable ReferencePreviews as full default on pilot wikis (T271206)]] (duration: 01m 06s)
* 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
* 11:12 kormat: reimaging db1173 to buster [[phab:T280751|T280751]]
* 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
* 10:19 jynus: stop dbprov2002 in advance of maintenance [[phab:T281135|T281135]]
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
* 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
* 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 host to free some space (tiny root partition, these are old kernels)
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
* 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
* 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye [[phab:T275873|T275873]]
* 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
* 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
* 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
* 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
* 07:47 jynus: shutting down and removing db2098:s3 instance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
* 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
* 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - [[phab:T281673|T281673]]
* 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 07:24 moritzm: installing exim security updates on bullseye hosts
* 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
* 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
* 06:01 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 06:00 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 [[phab:T281445|T281445]]', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
* 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
* 05:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 05:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing [[phab:T282070|T282070]]  RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
* 05:27 effie: upgrade scap to 3.17.1-1 - [[phab:T279695|T279695]]
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
* 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
* 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
* 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
* 03:38 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:38 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
* 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '<nowiki>{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}</nowiki>'`
* 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
* 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
* 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
* 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 00:35 Amir1: sudo service mailman3-web restart


* 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
* 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt
== 2021-04-30 ==
* 21:54 mutante: people1003 - rsyncing /home from people1002
* 15:30 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 15:29 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 15:25 bstorm: hard rebooting cloudmetrics1002 [[phab:T275605|T275605]]
* 11:40 ladsgroup@deploy1002: Synchronized static/favicon/wikitech.ico: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 56s)
* 11:36 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 56s)
* 11:34 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 11:33 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 11:31 ladsgroup@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 09:04 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
* 09:03 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
* 08:11 moritzm: remove mc1027 from debmonitor, server is broken and won't return ([[phab:T276415|T276415]])
* 07:38 moritzm: installing iputils updates from Buster point release
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15667 and previous config saved to /var/cache/conftool/dbconfig/20210430-061549-root.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15666 and previous config saved to /var/cache/conftool/dbconfig/20210430-060046-root.json
* 05:51 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15665 and previous config saved to /var/cache/conftool/dbconfig/20210430-054542-root.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15664 and previous config saved to /var/cache/conftool/dbconfig/20210430-053038-root.json
* 05:16 marostegui: Upgrade kernel on db1114
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15663 and previous config saved to /var/cache/conftool/dbconfig/20210430-051558-marostegui.json
* 05:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1080.eqiad.wmnet
* 04:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1080.eqiad.wmnet
* 04:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo systemctl restart wdqs-blazegraph`
* 04:43 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 04:43 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 04:42 ryankemper: [[phab:T261239|T261239]] `elastic2033`, which is known to be in a state of hardware failure (we have a ticket open), is holding up the reboot of codfw. I don't think we have a good way to exclude a node currently. Going to just proceed to `eqiad` for now
* 04:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 04:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 04:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 04:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 04:03 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 03:50 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1010.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:47 ryankemper: [[phab:T280563|T280563]] about half of codfw nodes have been rebooted before the failure caused by write queue not emptying fast enough, kicking it off again:`sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 03:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 01:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
== 2021-04-29 ==
* 23:36 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:683749{{!}}Revert "DEMO: Add newline to README"]] (duration: 00m 56s)
* 23:18 ryankemper: [[phab:T280563|T280563]] successful reboot of `relforge100[3,4]`; `relforge` cluster is back to green status.
* 23:16 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:683747{{!}}DEMO: Add newline to README]] (duration: 00m 56s)
* 23:08 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts` (amended command)
* 23:06 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 23:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:46 ryankemper: [[phab:T280563|T280563]] Current master is `relforge1003-relforge-eqiad`, will reboot `1004` first then `1003` after
* 22:44 ryankemper: [[phab:T280563|T280563]] Bleh, we never moved the new config into spicerack, so it's trying to talk to the old relforge hosts which no longer exist. Will reboot relforge manually and use the cookbook for codfw/eqiad, and circle back later for the spicerack change
* 22:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:32 ryankemper: [[phab:T280563|T280563]] Spotted the issue; forgot to set `--without-lvs` for relforge reboot
* 22:27 ryankemper: [[phab:T280563|T280563]] `urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbe4bb8a518>: Failed to establish a new connection: [Errno -2] Name or service not known`
* 22:26 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:26 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 21:36 mutante: icinga - enabling disabled notifications for random an-worker nodes where the mgmt interface had alerts enabled but the actual host didn't
* 21:32 mutante: icinga - enabled notifications for checks on ms-backup1001 - they had all been manually disabled, but none of the checks had changed status in 50 days, which indicates that turning them back on was forgotten, a common issue with disabling notifications
* 21:16 mutante: backup1001 - sudo check_bacula.py --icinga
* 20:54 marostegui: Stop mysql on tendril for the UTC night, dbtree and tendril will remain down for a few hours [[phab:T281486|T281486]]
* 20:16 marostegui: Restart tendril database - [[phab:T281486|T281486]]
* 20:00 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 19:46 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 08s)
* 19:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 19:32 dpifke@deploy1002: Finished deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484 (duration: 00m 05s)
* 19:32 dpifke@deploy1002: Started deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484
* 19:01 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki/wanobjectcache/revision_row_1/ (bad data from Sep 2019)
* 18:59 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/rl-minify-* (bad data from Aug 2018)
* 18:58 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki_ExternalGuidance_init_Google_tr_fr (bad data from Nov 2019)
* 18:38 krinkle@deploy1002: Synchronized php-1.37.0-wmf.1/includes/libs/objectcache/MemcachedBagOStuff.php: {{Gerrit|I926797a9d494a31}}, [[phab:T281480|T281480]] (duration: 01m 08s)
* 18:33 mutante: LDAP - added mmandere to wmf group ([[phab:T281344|T281344]])
* 18:10 krinkle@deploy1002: Synchronized php-1.37.0-wmf.3/includes/libs/objectcache/MemcachedBagOStuff.php: {{Gerrit|I926797a9d494a31}}, [[phab:T281480|T281480]] (duration: 01m 09s)
* 17:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:29 ryankemper: [[phab:T281498|T281498]] `sudo -E cumin 'C:role::lvs::balancer' 'sudo run-puppet-agent'`
* 16:28 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 16:27 liw@deploy1002: sync-wikiversions aborted: Revert "group[0{{!}}1] wikis to [VERSION]" (duration: 00m 01s)
* 16:22 ryankemper: [[phab:T281498|T281498]] `ryankemper@wdqs2004:~$ sudo depool`
* 16:20 ryankemper: [[phab:T281498|T281498]] `ryankemper@wdqs2004:~$ sudo run-puppet-agent`
* 16:18 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]] (duration: 02m 39s)
* 16:15 otto@deploy1002: Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]]
* 16:12 papaul: powerdown thanos-fe2001 for memory swap
* 15:44 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here)
* 15:43 ryankemper: [WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up
* 15:37 ryankemper: [WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001`
* 15:35 ryankemper: [WDQS] ^ scratch that, depooled `wdqs2001`
* 15:34 ryankemper: [WDQS] pooled `wdqs2001`
* 14:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
* 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
* 13:44 moritzm: installing Java security updates on stat* hosts
* 13:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
* 13:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
* 13:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
* 13:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
* 13:40 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]] (duration: 02m 59s)
* 13:37 otto@deploy1002: Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]]
* 13:11 moritzm: installing postgresql-11 security updates
* 13:08 jbond42: merge netbase change to manage /etc/services
* 13:07 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 13:06 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 12:36 Amir1: upgrading Quiddity to admin in mailman3
* 12:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
* 12:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
* 12:26 moritzm: installing grub2 updates from buster point release
* 12:06 jbond42: update debmonitor.discover.wmnet ssl cert
* 11:59 ladsgroup@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:683454{{!}}Undeploy JADE from production, Part III (T281418)]] (duration: 01m 07s)
* 11:54 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683453{{!}}Undeploy JADE from production, Part II (T281418)]], Part I (duration: 01m 06s)
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683452{{!}}Undeploy JADE from production, Part I (T281418)]] (duration: 01m 07s)
* 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 11:38 mbsantos@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683548{{!}}Enable suggested values in TemplateData and VisualEditor CommonSettings (T273857)]] (duration: 01m 07s)
* 11:34 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683534{{!}}Another fix for token cookie handling (T281346)]] (duration: 01m 07s)
* 11:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683533{{!}}Another fix for token cookie handling (T281346)]] (duration: 01m 08s)
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15658 and previous config saved to /var/cache/conftool/dbconfig/20210429-113211-root.json
* 11:24 mbsantos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683547{{!}}Enable suggested values in TemplateData and VisualEditor InitialiseSettings (T273857)]] (duration: 01m 07s)
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15657 and previous config saved to /var/cache/conftool/dbconfig/20210429-111708-root.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15656 and previous config saved to /var/cache/conftool/dbconfig/20210429-110204-root.json
* 10:59 moritzm: updating apt on buster (SUA 198), which eases bullseye upgrades [[phab:T275873|T275873]]
* 10:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683135{{!}}Fix CX token cookie (T281346)]] (duration: 01m 08s)
* 10:54 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683134{{!}}Fix CX token cookie (T281346)]] (duration: 01m 09s)
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15655 and previous config saved to /var/cache/conftool/dbconfig/20210429-104700-root.json
* 10:27 marostegui: Upgrade kernel on db1110
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15654 and previous config saved to /var/cache/conftool/dbconfig/20210429-102447-marostegui.json
* 09:42 volans: uploaded pynetbox 5.3.0-2 to bullseye-wikimedia on apt.w.o
* 09:39 volans@deploy1002: Finished deploy [homer/deploy@e394769]: Release v0.2.8 (duration: 03m 30s)
* 09:35 volans@deploy1002: Started deploy [homer/deploy@e394769]: Release v0.2.8
* 09:01 jynus: stopping replication and checking data on db2100:s7
* 08:57 marostegui: Upgrade kernel on db2133
* 08:51 marostegui: Upgrade kernel on db2125
* 08:50 marostegui: Upgrade kernel on db2124
* 08:46 marostegui: Upgrade kernel on db2122
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 100%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15652 and previous config saved to /var/cache/conftool/dbconfig/20210429-084011-root.json
* 08:39 marostegui: Upgrade kernel on db2121
* 08:33 marostegui: Upgrade kernel on db2120
* 08:28 volans@deploy1002: Finished deploy [homer/deploy@89cd07c]: Release v0.2.7 (duration: 03m 08s)
* 08:27 marostegui: Upgrade kernel on db2115
* 08:25 volans@deploy1002: Started deploy [homer/deploy@89cd07c]: Release v0.2.7
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 80%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15651 and previous config saved to /var/cache/conftool/dbconfig/20210429-082507-root.json
* 08:19 marostegui: Upgrade kernel on db2114
* 08:12 marostegui: Upgrade kernel on db2109
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 70%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15649 and previous config saved to /var/cache/conftool/dbconfig/20210429-081004-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 60%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15648 and previous config saved to /var/cache/conftool/dbconfig/20210429-075500-root.json
* 07:54 marostegui: Upgrade kernel on db2089
* 07:48 jynus: rolling restart of bacula hosts [[phab:T273182|T273182]]
* 07:48 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 01m 07s)
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15647 and previous config saved to /var/cache/conftool/dbconfig/20210429-074625-root.json
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 50%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15646 and previous config saved to /var/cache/conftool/dbconfig/20210429-073956-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 90%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15645 and previous config saved to /var/cache/conftool/dbconfig/20210429-073122-root.json
* 07:28 marostegui: Stop mysql and upgrade kernel on pc1007
* 07:28 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 01m 08s)
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 40%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15644 and previous config saved to /var/cache/conftool/dbconfig/20210429-072453-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 80%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15643 and previous config saved to /var/cache/conftool/dbconfig/20210429-071618-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 25%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15642 and previous config saved to /var/cache/conftool/dbconfig/20210429-070949-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15641 and previous config saved to /var/cache/conftool/dbconfig/20210429-070114-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 10%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15640 and previous config saved to /var/cache/conftool/dbconfig/20210429-065445-root.json
* 06:53 godog: add 100G to prometheus/ops in eqiad
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 60%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15639 and previous config saved to /var/cache/conftool/dbconfig/20210429-064611-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15637 and previous config saved to /var/cache/conftool/dbconfig/20210429-063107-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 40%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15636 and previous config saved to /var/cache/conftool/dbconfig/20210429-061603-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 30%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15635 and previous config saved to /var/cache/conftool/dbconfig/20210429-060100-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15634 and previous config saved to /var/cache/conftool/dbconfig/20210429-054556-root.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 20%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15633 and previous config saved to /var/cache/conftool/dbconfig/20210429-053052-root.json
* 05:22 marostegui: Check tables on db1121 (this will cause lag on s4 commonswiki, on wikireplicas)
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for tables checking', diff saved to https://phabricator.wikimedia.org/P15632 and previous config saved to /var/cache/conftool/dbconfig/20210429-052146-marostegui.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 15%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15631 and previous config saved to /var/cache/conftool/dbconfig/20210429-051549-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15630 and previous config saved to /var/cache/conftool/dbconfig/20210429-050045-root.json
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15629 and previous config saved to /var/cache/conftool/dbconfig/20210429-045557-marostegui.json
* 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15627 and previous config saved to /var/cache/conftool/dbconfig/20210429-045015-marostegui.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15626 and previous config saved to /var/cache/conftool/dbconfig/20210429-044458-marostegui.json
* 04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
* 04:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15625 and previous config saved to /var/cache/conftool/dbconfig/20210429-043857-marostegui.json
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1156 to dbctl [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15624 and previous config saved to /var/cache/conftool/dbconfig/20210429-043812-marostegui.json
* 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for reimage', diff saved to https://phabricator.wikimedia.org/P15623 and previous config saved to /var/cache/conftool/dbconfig/20210429-042757-marostegui.json
* 02:59 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job (duration: 00m 06s)
* 02:59 milimetric@deploy1002: Started deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job
* 02:58 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b]: Hotfix for referrer job (duration: 14m 40s)
* 02:44 milimetric@deploy1002: Started deploy [analytics/refinery@740226b]: Hotfix for referrer job
* 01:44 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|I5869b3c3ba4a}} (duration: 01m 08s)
* 01:23 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 01:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:19 ryankemper: [[phab:T280382|T280382]] Aborted data transfer; `wdqs2007` is hosed (see https://phabricator.wikimedia.org/T281437)
* 01:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:40 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: [[phab:T281405|T281405]] (duration: 01m 08s)
* 00:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 00:06 ryankemper: [[phab:T280382|T280382]] `wdqs1013.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
== 2021-04-28 ==
* 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 23:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 23:06 dpifke@deploy1002: Finished deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 (duration: 00m 05s)
* 23:06 dpifke@deploy1002: Started deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886
* 22:44 dwisehaupt: civiproxy revision changed to {{Gerrit|99cecb924a}} - initial rollout of code for testing
* 22:26 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 22:26 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:18 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 22:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:44 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 21:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 21:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:39 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 21:37 ryankemper: [[phab:T280382|T280382]] `wdqs2007` is reachable again; glancing at `/srv/wdqs` its `wikidata.jnl` is `839G` when it should be `975G` so I'll re-do the wikidata journal transfer
* 21:32 ryankemper: [[phab:T280382|T280382]] [WDQS] `wdqs2007` ssh is unreachable; power cycling via `racadm>>racadm serveraction powercycle`
* 21:24 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (previous reimage timed out, instance appears to have rebooted)
* 21:07 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 21:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 21:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 20:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:57 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:13 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 07s)
* 19:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 18:21 legoktm: added mvolz as listadmin for services@ and reset admin pw ([[phab:T278516|T278516]])
* 17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Wikibase/client/includes/DataAccess/Scribunto/WikibaseLanguageIndependentLuaBindings.php: {{Gerrit|b392dba0d77904d7de819043e51d8c3fbf003873}}: Fix incorrect ItemId typehint in Lua bindings ([[phab:T281361|T281361]]) (duration: 01m 09s)
* 16:52 papaul: powerdown logstash2034 for relocation
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 16:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 16:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 16:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 16:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 16:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 16:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:20 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts conf[2001-2003].codfw.wmnet
* 15:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:03 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:00 moritzm: imported python-poolcounter 0.0.2-1+deb11u1 to apt.wikimedia.org [[phab:T275873|T275873]]
* 14:53 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[2001-2003].codfw.wmnet
* 14:44 moritzm: imported gitlab-ce 13.9.7-ce.0 to apt.wikimedia.org
* 14:40 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d] (duration: 04m 59s)
* 14:35 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d]
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d] (duration: 00m 06s)
* 14:34 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d]
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 03m 07s)
* 14:32 moritzm: installing iproute2 updates from buster point release
* 14:31 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 14:30 milimetric@deploy1002: deploy aborted: - (duration: 00m 00s)
* 14:30 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: -
* 14:30 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 12m 31s)
* 14:26 moritzm: installing net-snmp updates from buster point release
* 14:17 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 13:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 13:15 jayme: restarting pybal on lvs5001,lvs4005,lvs2007 - [[phab:T271573|T271573]]
* 13:14 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 13:10 jayme: restarting pybal on lvs5002,lvs4006,lvs2008 - [[phab:T271573|T271573]]
* 13:04 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 13:03 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 13:02 moritzm: upgrading deployment servers to PHP 7.2.34
* 12:55 moritzm: upgrading snapshot hosts to PHP 7.2.34
* 12:48 jayme: restarting pybal on lvs2009 - [[phab:T271573|T271573]]
* 12:45 moritzm: upgrading labweb to PHP 7.2.34
* 12:43 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:42 jayme: restarting pybal on lvs5003,lvs4007 - [[phab:T271573|T271573]]
* 12:39 jayme: restarting pybal on lvs2010 - [[phab:T271573|T271573]]
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 12:28 apergos: manually edited /srv/deployment/dumps/dumps-cache/config on snapshots1011,12,13 to change deploy1001 to deploy1002 (where did it get the old value from? these are new installs!)
* 12:16 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:15 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 11:53 jayme: switching SRV record _etcd._tcp to new etcd cluster (for codfw, eqsin, ulsfo)
* 11:22 Urbanecm: EU B&C window done
* 11:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: {{Gerrit|8d0ae5e8fedefa911fc216bfc810d7a6169ea7e5}}: Separate reference preview settings in beta & non-beta ([[phab:T281235|T281235]]) (duration: 01m 08s)
* 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddbc378e41783356e28cd90bbefa08624ea2844c}}: Enable partial action blocks on testwiki ([[phab:T280528|T280528]]) (duration: 01m 07s)
* 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 10:44 jbond42: updated the check-raid nrpe script to python3
* 09:40 moritzm: restarting Tomcat on idp-test1001 to pick up Java security updates
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15618 and previous config saved to /var/cache/conftool/dbconfig/20210428-092103-root.json
* 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1001.wikimedia.org
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint1001.wikimedia.org
* 09:09 moritzm: restarting jenkins* on releases to pick up Java security updates
* 09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15617 and previous config saved to /var/cache/conftool/dbconfig/20210428-090559-root.json
* 08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15616 and previous config saved to /var/cache/conftool/dbconfig/20210428-085056-root.json
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 01m 08s)
* 08:41 urbanecm@deploy1002: sync-file aborted: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 00m 02s)
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15615 and previous config saved to /var/cache/conftool/dbconfig/20210428-083625-marostegui.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15614 and previous config saved to /var/cache/conftool/dbconfig/20210428-083552-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15613 and previous config saved to /var/cache/conftool/dbconfig/20210428-083458-root.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15612 and previous config saved to /var/cache/conftool/dbconfig/20210428-082625-root.json
* 08:25 effie: update php7.2 on jobrunners and parsoid servers && rolling  php7.2-fpm restarts
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15611 and previous config saved to /var/cache/conftool/dbconfig/20210428-081121-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15610 and previous config saved to /var/cache/conftool/dbconfig/20210428-075618-root.json
* 07:52 effie: update php7.2 on api servers && rolling  php7.2-fpm restarts
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15609 and previous config saved to /var/cache/conftool/dbconfig/20210428-074114-root.json
* 07:40 marostegui: Deploy schema change on db1098:3316 and db1098:3317 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 07:27 effie: update php7.2 on appservers && rolling  php7.2-fpm restarts
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for schema change and kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15608 and previous config saved to /var/cache/conftool/dbconfig/20210428-072609-marostegui.json
* 07:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:14 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:12 elukey: add AAAA record for kafka-main200[3,4,5].codfw.wmnet
* 07:10 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:05 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:04 elukey: add AAAA record for kafka-main2002.codfw.wmnet
* 07:03 marostegui: Deploy schema change on db2089:3316 and db1098:3316 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:26 legoktm: created mailman3 superusers for Administrator (noc@), Ladsgroup and Legoktm
* 06:23 legoktm: legoktm@lists1001:~$ sudo mailman-web set_default_site --name lists.wikimedia.org --domain lists.wikimedia.org
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15607 and previous config saved to /var/cache/conftool/dbconfig/20210428-061426-root.json
* 06:00 marostegui: Stop MySQL on db2096 (x1 codfw) [[phab:T281135|T281135]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15606 and previous config saved to /var/cache/conftool/dbconfig/20210428-055922-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1167 in s8 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15605 and previous config saved to /var/cache/conftool/dbconfig/20210428-055144-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15604 and previous config saved to /var/cache/conftool/dbconfig/20210428-054419-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15603 and previous config saved to /var/cache/conftool/dbconfig/20210428-052915-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15602 and previous config saved to /var/cache/conftool/dbconfig/20210428-051526-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 (old s1 master) for schema change', diff saved to https://phabricator.wikimedia.org/P15601 and previous config saved to /var/cache/conftool/dbconfig/20210428-050754-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json
* 05:00 marostegui: Starting s1 eqiad failover from db1083 to db1163 - [[phab:T278214|T278214]]
* 04:14 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 04:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:08 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 04:08 marostegui: Start replication changes, connect everything to db1163 [[phab:T278214|T278214]]
* 04:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json
* 03:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:49 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
* 03:48 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet
* 03:33 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 02:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:28 robh@cumin1001: START - Cookbook sre.dns.netbox
* 01:06 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 00:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE
== 2021-04-27 ==
* 23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 23:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE
* 23:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 23:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 23:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 21:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet
* 20:55 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet
* 20:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet
* 20:42 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet
* 20:32 bblack: re-pooling codfw public traffic - [[phab:T279457|T279457]]
* 20:11 jhuneidi@deploy1002: Synchronized php-1.37.0-wmf.3/includes/rcfeed/IRCColourfulRCFeedFormatter.php: Backport rcfeed: Remove reference assignment ([[phab:T281226|T281226]]) to 1.37.0-wmf.3 (duration: 01m 12s)
* 20:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 20:06 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 19:44 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 19:35 papaul: powerdown ms-backup2001 for maintenance
* 19:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 19:07 papaul: powerdown logstash2035 for maintenance
* 19:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 19:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
* 18:50 mutante: people1003 - destroying VM and recreating again from scratch to test if issue of no console and no access is repeatable
* 18:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
* 18:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 18:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 18:33 mutante: people1003 - rebooting, trying to get new VM to work
* 18:33 Urbanecm: Morning B&C window done
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|91a85f2}}: {{Gerrit|ac770bf}}: Enable language in header for office and testwiki users ([[phab:T280526|T280526]]) (duration: 01m 19s)
* 18:32 bblack: lvs2009 - restart pybal + re-run puppet agent - [[phab:T279457|T279457]]
* 18:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:20 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[56].codfw.wmnet
* 18:20 bblack: cp203[56] - repooling in etcd - [[phab:T279457|T279457]]
* 18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:17 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:17 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:16 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:12 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:11 bblack: dns2001 - restarting bird to repool, then re-enabling puppet - [[phab:T279457|T279457]]
* 18:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:02 ejegg: update payments-wiki from {{Gerrit|9a4eef1375}} to {{Gerrit|44570561f2}}
* 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 17:58 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 17:34 papaul: powerdown moss-fe2001 for maintenance
* 17:32 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:25 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:23 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:21 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:19 ryankemper: [[phab:T281215|T281215]] Banned `elastic2043` from codfw cirrussearch cluster
* 17:16 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:14 papaul: powerdown kafka-logging2003 for maintenance
* 17:14 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:09 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:52 papaul: powerdown elastic2045 for maintenance
* 16:49 papaul: powerdown ms-be2042 for maintenance
* 16:39 dcaro: reprepro updating packages on thirdparty/ceph-nautilus-buster
* 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 39 hosts with reason: upgrading openstack
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 39 hosts with reason: upgrading openstack
* 16:22 effie: upgrading scap 3.17.1-1 on mediawiki canaries - [[phab:T279695|T279695]]
* 16:18 effie: uploading scap_3.17.1-1
* 15:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1026.eqiad.wmnet
* 14:48 moritzm: installing file/libmagic updates from buster point release
* 14:47 bblack: lvs2009 - disable puppet + stop pybal (internal services will move to lvs2010, please avoid LVS service definition changes for now!) - [[phab:T279457|T279457]]
* 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
* 14:36 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[56].codfw.wmnet
* 14:36 bblack: cp203[56] - depool all etcd services via confctl - [[phab:T279457|T279457]]
* 14:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
* 14:33 bblack: dns2001 - depooling for [[phab:T279457|T279457]] (disable puppet + stop bird)
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
* 14:31 moritzm: installing imagemagick security updates
* 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
* 14:20 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 14:19 moritzm: installing xen security updates
* 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 14:09 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 14:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 105 hosts with reason: upgrading openstack
* 14:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 105 hosts with reason: upgrading openstack
* 14:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: upgrading openstack
* 14:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: upgrading openstack
* 13:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 13:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 13:55 moritzm: imported jenkins 2.277.3 to thirdparty/ci
* 13:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 13:48 moritzm: uploaded openjdk-8 8u292-b10-0~deb10u1 (buster forward port of latest Java 8 security release)
* 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:45 akosiaris: switchover api-gateway, changeprop, cpjobqueue to use the new redis cluster servers (rdb2007-rdb2010)
* 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:30 hashar: Upgrading CI Jenkins from 2.263.3 to 2.277.2
* 13:23 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1020-1026].eqiad.wmnet
* 13:19 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:13 liw@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.3
* 13:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/includes/Config/WikiPageConfigValidation.php: {{Gerrit|fe2a0420fd884df7046c0c283bcb2e961e74e8e9}}: WikiPageConfigValidation: Mentor lists and help desk can be null ([[phab:T281229|T281229]]) (duration: 01m 06s)
* 13:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 13:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 13:06 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1020-1026].eqiad.wmnet
* 13:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be1019.eqiad.wmnet
* 12:55 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be1019.eqiad.wmnet
* 12:46 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:682815{{!}}Revert "URGENT: Disable GlobalUsage" (T281242)]] (duration: 01m 08s)
* 12:44 hashar: Restarted CI Jenkins for plugins upgrade
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15592 and previous config saved to /var/cache/conftool/dbconfig/20210427-122619-root.json
* 12:20 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GlobalUsage: Backport: [[gerrit:682814{{!}}Avoid reading primary unless absolutely necessary (T281238)]] (duration: 01m 09s)
* 12:12 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GlobalUsage: Backport: [[gerrit:682813{{!}}Avoid reading primary unless absolutely necessary (T281238)]] (duration: 01m 09s)
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15591 and previous config saved to /var/cache/conftool/dbconfig/20210427-121115-root.json
* 12:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 12:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15590 and previous config saved to /var/cache/conftool/dbconfig/20210427-115612-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15589 and previous config saved to /var/cache/conftool/dbconfig/20210427-114108-root.json
* 11:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 11:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove RW from commonswiki', diff saved to https://phabricator.wikimedia.org/P15588 and previous config saved to /var/cache/conftool/dbconfig/20210427-111016-marostegui.json
* 11:09 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Disable GlobalUsage (duration: 01m 08s)
* 10:40 volans@cumin1001: dbctl commit (dc=all): 'S4 RO, outage', diff saved to https://phabricator.wikimedia.org/P15585 and previous config saved to /var/cache/conftool/dbconfig/20210427-104057-volans.json
* 10:18 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 10:06 XioNoX: standardize management routers ACLs with Capirca - mr1-eqiad (last one)
* 10:01 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 02m 16s)
* 09:59 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 09:56 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 22s)
* 09:56 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15584 and previous config saved to /var/cache/conftool/dbconfig/20210427-093536-marostegui.json
* 09:35 XioNoX: standardize management routers ACLs with Capirca - mr1-eqsin
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15583 and previous config saved to /var/cache/conftool/dbconfig/20210427-093501-root.json
* 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
* 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 09:33 moritzm: rolling restart of elastic in relforge* to pick up Java updates
* 09:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
* 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15582 and previous config saved to /var/cache/conftool/dbconfig/20210427-091957-root.json
* 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
* 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 09:17 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
* 09:16 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 09:11 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 09:11 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
* 09:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
* 09:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
* 09:06 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
* 09:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
* 09:05 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15581 and previous config saved to /var/cache/conftool/dbconfig/20210427-090454-root.json
* 09:04 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 09:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
* 09:01 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15580 and previous config saved to /var/cache/conftool/dbconfig/20210427-084950-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15579 and previous config saved to /var/cache/conftool/dbconfig/20210427-084651-marostegui.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15578 and previous config saved to /var/cache/conftool/dbconfig/20210427-084630-root.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114 into main and api', diff saved to https://phabricator.wikimedia.org/P15577 and previous config saved to /var/cache/conftool/dbconfig/20210427-083910-marostegui.json
* 08:36 XioNoX: standardize management routers ACLs with Capirca
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15576 and previous config saved to /var/cache/conftool/dbconfig/20210427-083145-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15575 and previous config saved to /var/cache/conftool/dbconfig/20210427-083126-root.json
* 08:24 hashar: Restarting CI Jenkins for plugins upgrade
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15574 and previous config saved to /var/cache/conftool/dbconfig/20210427-081911-marostegui.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15573 and previous config saved to /var/cache/conftool/dbconfig/20210427-081846-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15572 and previous config saved to /var/cache/conftool/dbconfig/20210427-081623-root.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15571 and previous config saved to /var/cache/conftool/dbconfig/20210427-081325-root.json
* 08:12 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
* 08:11 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
* 08:10 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
* 08:08 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
* 08:03 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 90%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15570 and previous config saved to /var/cache/conftool/dbconfig/20210427-080342-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15569 and previous config saved to /var/cache/conftool/dbconfig/20210427-080119-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15568 and previous config saved to /var/cache/conftool/dbconfig/20210427-075822-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15567 and previous config saved to /var/cache/conftool/dbconfig/20210427-075759-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15566 and previous config saved to /var/cache/conftool/dbconfig/20210427-075738-root.json
* 07:52 liw@deploy1002: Pruned MediaWiki: 1.36.0-wmf.38 (duration: 03m 17s)
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 80%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15565 and previous config saved to /var/cache/conftool/dbconfig/20210427-074839-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15564 and previous config saved to /var/cache/conftool/dbconfig/20210427-074318-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15563 and previous config saved to /var/cache/conftool/dbconfig/20210427-074234-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15562 and previous config saved to /var/cache/conftool/dbconfig/20210427-073335-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15561 and previous config saved to /var/cache/conftool/dbconfig/20210427-072814-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15560 and previous config saved to /var/cache/conftool/dbconfig/20210427-072731-root.json
* 07:26 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 07:24 liw@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.3 (duration: 30m 54s)
* 07:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 07:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 07:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
* 07:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 60%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15559 and previous config saved to /var/cache/conftool/dbconfig/20210427-071831-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15558 and previous config saved to /var/cache/conftool/dbconfig/20210427-071227-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15557 and previous config saved to /var/cache/conftool/dbconfig/20210427-070328-root.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 for schema change', diff saved to https://phabricator.wikimedia.org/P15556 and previous config saved to /var/cache/conftool/dbconfig/20210427-065628-marostegui.json
* 06:55 elukey: upgrade mariadb to 10.4.18-1 + reboot on db1108 - [[phab:T279281|T279281]]
* 06:54 liw@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.3
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 40%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15555 and previous config saved to /var/cache/conftool/dbconfig/20210427-064824-root.json
* 06:37 liw: version 1.37.0-wmf.3 was branched at {{Gerrit|20ab303fd1d883592b4d2ec2468dfaccad7a9e10}} for [[phab:T278347|T278347]]
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 30%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15554 and previous config saved to /var/cache/conftool/dbconfig/20210427-063320-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15553 and previous config saved to /var/cache/conftool/dbconfig/20210427-061817-root.json
* 06:11 elukey: powercycle elastic2043 - no ssh, no tty remote console available
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 20%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15552 and previous config saved to /var/cache/conftool/dbconfig/20210427-060313-root.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 15%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15551 and previous config saved to /var/cache/conftool/dbconfig/20210427-054809-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 10%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15550 and previous config saved to /var/cache/conftool/dbconfig/20210427-053306-root.json
* 05:30 XioNoX: push pfw fw policies - [[phab:T281137|T281137]]
* 05:27 legoktm: imported hyperkitty_1.3.4-2~bpo10+2 to apt.wm.o ([[phab:T281213|T281213]])
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15549 and previous config saved to /var/cache/conftool/dbconfig/20210427-052236-root.json
* 05:21 marostegui: Stop mysql on db1087 to clone db1167 (lag will appear on wikidata on wikireplicas) [[phab:T258361|T258361]]
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1114 temporarily as db1087 will be depooled', diff saved to https://phabricator.wikimedia.org/P15547 and previous config saved to /var/cache/conftool/dbconfig/20210427-052026-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 5%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15546 and previous config saved to /var/cache/conftool/dbconfig/20210427-051802-root.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15545 and previous config saved to /var/cache/conftool/dbconfig/20210427-050826-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15544 and previous config saved to /var/cache/conftool/dbconfig/20210427-050732-root.json
* 05:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1077.eqiad.wmnet
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1077.eqiad.wmnet
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15543 and previous config saved to /var/cache/conftool/dbconfig/20210427-045229-root.json
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15541 and previous config saved to /var/cache/conftool/dbconfig/20210427-044609-marostegui.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 to dbctl, depooled, [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15540 and previous config saved to /var/cache/conftool/dbconfig/20210427-044520-marostegui.json
* 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15539 and previous config saved to /var/cache/conftool/dbconfig/20210427-043725-root.json
* 04:25 legoktm: upgrading lists-next.wikimedia.org to mailman3-from-bullseye ([[phab:T280887|T280887]])
* 04:19 marostegui: Set phabricator to read-only [[phab:T279625|T279625]]
* 03:37 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 03:36 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@08ad17a]: 0.3.70 (duration: 08m 18s)
* 03:28 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.70` on canary `wdqs1003`; proceeding to rest of fleet
* 03:28 ryankemper@deploy1002: Started deploy [wdqs/wdqs@08ad17a]: 0.3.70
* 03:27 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.70`. Pre-deploy tests passing on canary `wdqs1003`
* 03:17 ryankemper: [[phab:T280382|T280382]] `wdqs1006` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to raid0: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 02:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:29 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph --task-id [[phab:T280382|T280382]]` on `ryankemper@cumin1001` tmux session `reimage`
* 01:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:21 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
== 2021-04-26 ==
* 23:28 mutante: renewing TLS cert for peopleweb.discovery.wmnet, adding *3 hosts
* 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
* 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
* 22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
* 22:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
* 22:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1006.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 20:48 twentyafterfour: restarting php-fpm on phab1001 to deploy phabricator hotfix {{Gerrit|d238db85b8d8072d99f31805aa4a8a7cf0c09941}}
* 20:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 20:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1003.eqiad.wmnet
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1003.eqiad.wmnet
* 19:45 legoktm: uploaded python3-falcon, python3-mimeparse, python3-mujson, openstack-pkg-tools to mailman3 component on apt.wm.o
* 18:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
* 18:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
* 18:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
* 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
* 18:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
* 18:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
* 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2d16f6251a67cf13cef02bbdcb3c9f5c1c505d16}}: elwiki: Update Growth experiments configuration ([[phab:T280172|T280172]]) (duration: 00m 58s)
* 18:06 urbanecm@deploy1002: Synchronized multiversion/MWScript.php: {{Gerrit|5ace4e1b806bcfc4ea059f9e9cae9aa94c0bdbd1}}: Fix error message if MWScript.php is run without arguments (duration: 00m 58s)
* 17:28 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 17:26 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 17:18 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 17:06 legoktm: imported postorius_1.3.4-2~bpo10+2 to apt.wm.o
* 16:49 mutante: gerrit - restarted apache (hard) to remove time out from gerrit:682502
* 16:40 mutante: gerrit1001 - reload apache2
* 16:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1025.eqiad.wmnet
* 16:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1025.eqiad.wmnet
* 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 15:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 15:21 elukey: restart zookeeper on conf2004 to pick up the -javaagent setting for the prometheus exporter
* 15:06 moritzm: installing jquery security updates on stretch
* 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:48 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:47 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:28 moritzm: installing ldap-replica1003/1004
* 14:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
* 14:03 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15537 and previous config saved to /var/cache/conftool/dbconfig/20210426-133922-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15536 and previous config saved to /var/cache/conftool/dbconfig/20210426-133905-root.json
* 13:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: for zookeeper migration
* 13:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: for zookeeper migration
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15535 and previous config saved to /var/cache/conftool/dbconfig/20210426-132533-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15534 and previous config saved to /var/cache/conftool/dbconfig/20210426-132417-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15533 and previous config saved to /var/cache/conftool/dbconfig/20210426-132402-root.json
* 13:14 moritzm: installing ldap-replica2005/2006
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15532 and previous config saved to /var/cache/conftool/dbconfig/20210426-131029-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15531 and previous config saved to /var/cache/conftool/dbconfig/20210426-130913-root.json
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15530 and previous config saved to /var/cache/conftool/dbconfig/20210426-130858-root.json
* 12:57 moritzm: installing gst-plugins-base1.0 security updates
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15529 and previous config saved to /var/cache/conftool/dbconfig/20210426-125526-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15528 and previous config saved to /var/cache/conftool/dbconfig/20210426-125409-root.json
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15527 and previous config saved to /var/cache/conftool/dbconfig/20210426-125354-root.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15526 and previous config saved to /var/cache/conftool/dbconfig/20210426-124141-marostegui.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15525 and previous config saved to /var/cache/conftool/dbconfig/20210426-124022-root.json
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15524 and previous config saved to /var/cache/conftool/dbconfig/20210426-123020-marostegui.json
* 12:28 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 12:27 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 12:24 Amir1: cleaning watchlist of QuickStatementsBot in wikidatawiki
* 12:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 12:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 12:00 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Enable writes on es4 [[phab:T279281|T279281]] (duration: 00m 56s)
* 11:57 marostegui: Restart es4 primary master - [[phab:T279281|T279281]]
* 11:55 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Disable writes on es4 [[phab:T279281|T279281]] (duration: 00m 56s)
* 11:51 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:49 hashar@deploy1002: Finished deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki (duration: 00m 13s)
* 11:49 hashar@deploy1002: Started deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki
* 11:46 Urbanecm: EU B&C done
* 11:45 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/Dialog.js: {{Gerrit|a347517f906b07b2503ae559c6cc714e1c50e4aa}}: Fix suggested values not being shown when the params type isnt specified ([[phab:T280688|T280688]]) (duration: 00m 57s)
* 11:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:681137{{!}}Revert "Set wgPageImagesAPIDefaultLicense to 'any' for wikidata"]] (duration: 00m 57s)
* 11:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
* 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2b5b640ad28bce1df20c2ca82654996d9cfc7630}}: Enable ContentTranslation as a default tool for 11 Wikipedias ([[phab:T279422|T279422]]) (duration: 00m 57s)
* 10:58 effie: restarting php-fpm in mw* clusters in codfw to pick up php7.2 update
* 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:682575{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:682575{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:38 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1004.wikimedia.org
* 10:37 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Setup wmgUseFooterCodeOfConductLink for later usage (duration: 00m 57s)
* 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 10:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 10:26 effie: upgrading php7.2 on mw* servers in codfw
* 10:25 marostegui: Deploy schema change on s4 codfw, lag will appear [[phab:T276292|T276292]]
* 10:24 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgUseFooterTechCodeOfConductLink instead of wmgUseFooterCodeOfConductLink (duration: 00m 57s)
* 10:24 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1004.wikimedia.org
* 10:22 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseFooterTechCodeOfConductLink (duration: 00m 59s)
* 10:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1003.wikimedia.org
* 10:18 moritzm: installing systemd updates from buster 10.9 point release
* 10:07 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1003.wikimedia.org
* 10:00 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2006.wikimedia.org
* 09:42 moritzm: installing clamav security updates on otrs1001
* 09:38 godog: reboot ms-be1062, kernel backtrace saved
* 09:26 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 09:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2006.wikimedia.org
* 09:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2005.wikimedia.org
* 09:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 09:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 09:13 jayme: imported etcd-mirror_0.0.6-2 to buster-wikimedia
* 09:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005.wikimedia.org
* 09:07 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2005failoid1002.wikimedia.org
* 09:04 jayme: imported etcd-mirror_0.0.6-1 to buster-wikimedia
* 08:55 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005failoid1002.wikimedia.org
* 08:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|f01a6dab70f74938dd51668809a181a8f551b6c8}}: GrowthExperiments: Enable community configuration on testwiki ([[phab:T274520|T274520]]) (duration: 00m 57s)
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|88da8226823e59d1d19db9aeca3b5a5140c0c60c}}: GrowthExperiments: Do not enable community configuration outside of beta wikis ([[phab:T274520|T274520]]) (duration: 00m 59s)
* 08:28 moritzm: update debmonitor to 0.2.9 on remaining hosts [[phab:T281090|T281090]]
* 08:13 moritzm: installing lxml security updates on stretch
* 07:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 07:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 07:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 07:32 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 07:24 moritzm: installing pear security updates
* 07:09 moritzm: removed rawdog from bullseye-wikimedia, needs Py2 [[phab:T280989|T280989]]
* 06:24 elukey: reboot an-coord1001 to pick up kernel security settings (after reimage)
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1158 to dbctl, depooled, [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15521 and previous config saved to /var/cache/conftool/dbconfig/20210426-054700-marostegui.json
* 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
* 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
* 03:43 kart_: Updated cxserver to 2021-04-21-044024-production ([[phab:T279045|T279045]])
* 03:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:37 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:32 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
== 2021-04-25 ==
* 15:23 Amir1: sudo -u list /var/lib/mailman/bin/change_pw -l wikica-l -p $(pwgen -c1 -s 12) ([[phab:T281066|T281066]])
== 2021-04-24 ==
* 22:24 bstorm: Rebooting labstore1007 from ilo after crash
== 2021-04-23 ==
* 21:36 foks: removing 1 file for legal compliance
* 20:15 mutante: [apt1001:~] $ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/dzahn/rawdog_2.23-2_all.deb ([[phab:T280989|T280989]])
* 19:41 mutante: [apt1001:~] $ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy - copy envoy package from buster to bullseye [[phab:T280989|T280989]]
* 19:09 ebernhardson: closing duplicate/wrong cluster indices in cloudelastic
* 17:02 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:32 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:59 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 14:59 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 14:25 moritzm: revert bullseye image to daily build from last week (to rule out potential reimage issue)
* 13:33 elukey: roll restart of all thanos-swift proxies to pick up new ML account - [[phab:T280773|T280773]]
* 12:50 jbond42: upload new debmonitor-client packages
* 11:50 moritzm: installing perf updates from Buster 10.9 point release
* 10:06 moritzm: installing Linux 4.19.181 updates from Buster 10.9 point release (no reboots, just updating the packages)
* 09:54 moritzm: installing xen security updates
* 09:49 moritzm: installing xorg-server security updates
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15512 and previous config saved to /var/cache/conftool/dbconfig/20210423-093723-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15511 and previous config saved to /var/cache/conftool/dbconfig/20210423-092220-root.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15510 and previous config saved to /var/cache/conftool/dbconfig/20210423-090716-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15509 and previous config saved to /var/cache/conftool/dbconfig/20210423-085212-root.json
* 08:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
* 08:21 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
* 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
* 08:13 moritzm: upgrading d-i image for bullseye to RC1 release [[phab:T275873|T275873]]
* 08:12 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
* 08:12 moritzm: upgrading d-i image for bullseye to RC1 release
* 08:12 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1019.eqiad.wmnet
* 07:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1019.eqiad.wmnet
* 07:56 jynus: deleting db1156 s2 database and reloading it from logical backups [[phab:T280492|T280492]]
* 07:22 Amir1: removing junk bounced email addresses from yahoo from all mailing lists
* 05:40 marostegui: Stop db1079 to clone db1158 (lag will appear on s7 on wiki replicas)
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to clone db1158 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15506 and previous config saved to /var/cache/conftool/dbconfig/20210423-053907-marostegui.json
== 2021-04-22 ==
* 17:26 marostegui: Stop mysql on tendril/dbtree database
* 16:33 volker-e@deploy1002: Finished deploy [design/style-guide@e914e8a]: Deploy design/style-guide: {{Gerrit|e914e8a}} icons: Add 'share' icon (#455) (duration: 00m 06s)
* 16:32 volker-e@deploy1002: Started deploy [design/style-guide@e914e8a]: Deploy design/style-guide: {{Gerrit|e914e8a}} icons: Add 'share' icon (#455)
* 13:23 marostegui: Tendril and dbtree are up but in a degraded state (slow response)
* 13:19 marostegui: Tendril and dbtree are down at the moment
* 12:46 Urbanecm: Start server-side upload for 2 video files ([[phab:T280763|T280763]], [[phab:T280524|T280524]])
* 12:31 marostegui: Restart mysql on db1115 (tendril/dbtree will fail)
* 04:55 eileen: civicrm revision changed from {{Gerrit|42ca3cf65a}} to {{Gerrit|33a63d5789}}, config revision is {{Gerrit|cf07e7ba0b}}
* 02:47 krinkle@deploy1002: Finished deploy [integration/docroot@010e445]: (no justification provided) (duration: 00m 09s)
* 02:47 krinkle@deploy1002: Started deploy [integration/docroot@010e445]: (no justification provided)
* 01:34 eileen: civicrm revision changed from {{Gerrit|35a8dd33ba}} to {{Gerrit|42ca3cf65a}}, config revision is {{Gerrit|cf07e7ba0b}}
* 00:28 legoktm: legoktm@deneb:/var/cache/pbuilder/aptcache$ sudo rm -rf * # Cleaned up 8GB more
* 00:27 legoktm: legoktm@deneb:/var/cache/apt/archives$ sudo rm -rf * # cleaned up 6GB
* 00:03 legoktm: subscribed all list admins to the listadmins@ mailing list ([[phab:T280716|T280716]])
== 2021-04-21 ==
* 23:58 eileen: tools revision changed from {{Gerrit|3d950fffbd}} to {{Gerrit|c26a8c0cb6}}
* 23:49 legoktm: made myself and Amir1 list admins for the listadmins@lists.wikimedia.org mailing list
* 20:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1017.eqiad.wmnet
* 20:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1017.eqiad.wmnet
* 20:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1016.eqiad.wmnet
* 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.eqiad.wmnet
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host planet1003.eqiad.wmnet
* 19:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:48 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:48 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:46 mutante: creating a ganeti VM to test bullseye install
* 19:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet1003.eqiad.wmnet
* 19:45 bstorm: manually kicking off a run of update-openstack-mirror on sodium to capture an upstream package update
* 19:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:46 Urbanecm: Morning B&C done
* 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikibaseMediaInfo/: {{Gerrit|f831d16e42e712832d683233a5b21ad59f7c73b3}}: Make the logistic regression image search default ([[phab:T271799|T271799]]) (duration: 00m 58s)
* 18:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f6d076a69607172475a86ba935a273e7519108d1}}: Update $wgGEHomepageNewAccountVariants ([[phab:T278123|T278123]]) (duration: 00m 58s)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1ae5ca5467fad7bfdae8aa94b241fe6c048ab8e5}}: Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere ([[phab:T279853|T279853]]) (duration: 00m 59s)
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e252de0482c60e87e06d866006bb9ceb186af6cf}}: eswiki: Push Growth features out of dark mode ([[phab:T278235|T278235]]) (duration: 01m 00s)
* 17:43 jynus: deploy grant changes on m5 backup sources (db1117 and db2078) [[phab:T278614|T278614]]
* 15:54 legoktm: [[phab:T280744|T280744]]: legoktm@lists1001:~$ sudo chmod 644 /etc/aliases
* 15:15 Urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php # [[phab:T279853|T279853]]
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15503 and previous config saved to /var/cache/conftool/dbconfig/20210421-151526-root.json
* 15:02 moritzm: installing jquery security updates on buster
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15502 and previous config saved to /var/cache/conftool/dbconfig/20210421-150023-root.json
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15501 and previous config saved to /var/cache/conftool/dbconfig/20210421-144519-root.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15500 and previous config saved to /var/cache/conftool/dbconfig/20210421-143015-root.json
* 14:25 jbond42: upload new version of debmonitor-client to apt
* 13:54 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=fawiki # [[phab:T279853|T279853]]
* 13:39 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
* 13:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiki # [[phab:T279853|T279853]]
* 13:01 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
* 12:21 moritzm: installing failoid2002
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 11:49 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:46 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 11:32 awight: EU backport window complete
* 11:31 moritzm: installing failoid1002
* 11:29 awight@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents: Backport: [[gerrit:681334{{!}}Send 0 edits userEditCountBucket for anons (T210106)]] (duration: 00m 59s)
* 10:41 jbond42: switch debmonitor-client to cfssl (second try)
* 10:37 jbond42: upload golang-cfssl packages for jessie and stretch
* 10:33 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid1002.eqiad.wmnet
* 10:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1002.eqiad.wmnet
* 10:23 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid1002.eqiad.wmnet
* 10:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1002.eqiad.wmnet
* 10:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid2002.codfw.wmnet
* 10:21 hnowlan: rebooting eventlog1002 for kernel update
* 10:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid2002.codfw.wmnet
* 09:56 jbond42: switch debmonitor-clients to use cfssl
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15496 and previous config saved to /var/cache/conftool/dbconfig/20210421-093109-root.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15495 and previous config saved to /var/cache/conftool/dbconfig/20210421-091605-root.json
* 09:08 elukey: upgrade hue on an-tool1009 to 4.9
* 09:05 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 09:05 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 09:03 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet,service=nginx
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15494 and previous config saved to /var/cache/conftool/dbconfig/20210421-090100-root.json
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet