Difference between revisions of "Server Admin Log"

imported>Stashbot
(tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: T281405 (duration: 01m 08s))
imported>Stashbot
(legoktm: regenerating pipermail redirects to skip those with duplicate message-ids (T280731))
(46 intermediate revisions by the same user not shown)
== 2021-04-29 ==
* 00:40 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: [[phab:T281405|T281405]] (duration: 01m 08s)
* 00:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 00:06 ryankemper: [[phab:T280382|T280382]] `wdqs1013.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`


== 2021-06-17 ==
* 21:49 legoktm: regenerating pipermail redirects to skip those with duplicate message-ids ([[phab:T280731|T280731]])
* 18:24 ryankemper: [[phab:T285106|T285106]] [WDQS] `ryankemper@wdqs2001:~$ sudo depool`
* 18:01 dancy: Deployed latest scap code to beta cluster
* 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase/client/includes/ClientHooks.php: Backport: [[gerrit:700036{{!}}client: Bring back using the client setting for langlink group (T284854)]] (duration: 00m 58s)
* 13:28 jbond: add prometheus-jmx-exporter to bullseye-wikimedia
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16604 and previous config saved to /var/cache/conftool/dbconfig/20210617-121146-root.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16603 and previous config saved to /var/cache/conftool/dbconfig/20210617-120109-root.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16602 and previous config saved to /var/cache/conftool/dbconfig/20210617-115643-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16601 and previous config saved to /var/cache/conftool/dbconfig/20210617-115319-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16600 and previous config saved to /var/cache/conftool/dbconfig/20210617-114605-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16599 and previous config saved to /var/cache/conftool/dbconfig/20210617-114139-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16598 and previous config saved to /var/cache/conftool/dbconfig/20210617-113816-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16597 and previous config saved to /var/cache/conftool/dbconfig/20210617-113101-root.json
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16596 and previous config saved to /var/cache/conftool/dbconfig/20210617-112635-root.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16595 and previous config saved to /var/cache/conftool/dbconfig/20210617-112431-marostegui.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16594 and previous config saved to /var/cache/conftool/dbconfig/20210617-112312-root.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16593 and previous config saved to /var/cache/conftool/dbconfig/20210617-111558-root.json
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16592 and previous config saved to /var/cache/conftool/dbconfig/20210617-111026-marostegui.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16591 and previous config saved to /var/cache/conftool/dbconfig/20210617-110808-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16590 and previous config saved to /var/cache/conftool/dbconfig/20210617-110656-root.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16589 and previous config saved to /var/cache/conftool/dbconfig/20210617-110200-marostegui.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16588 and previous config saved to /var/cache/conftool/dbconfig/20210617-105153-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16587 and previous config saved to /var/cache/conftool/dbconfig/20210617-103649-root.json
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16586 and previous config saved to /var/cache/conftool/dbconfig/20210617-102145-root.json
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P16585 and previous config saved to /var/cache/conftool/dbconfig/20210617-101827-marostegui.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16584 and previous config saved to /var/cache/conftool/dbconfig/20210617-100445-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16583 and previous config saved to /var/cache/conftool/dbconfig/20210617-094942-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16582 and previous config saved to /var/cache/conftool/dbconfig/20210617-093438-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16581 and previous config saved to /var/cache/conftool/dbconfig/20210617-092056-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16580 and previous config saved to /var/cache/conftool/dbconfig/20210617-091934-root.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P16579 and previous config saved to /var/cache/conftool/dbconfig/20210617-090947-marostegui.json
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16578 and previous config saved to /var/cache/conftool/dbconfig/20210617-090552-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16577 and previous config saved to /var/cache/conftool/dbconfig/20210617-085048-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16576 and previous config saved to /var/cache/conftool/dbconfig/20210617-084941-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16575 and previous config saved to /var/cache/conftool/dbconfig/20210617-083545-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16574 and previous config saved to /var/cache/conftool/dbconfig/20210617-083438-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16573 and previous config saved to /var/cache/conftool/dbconfig/20210617-083005-marostegui.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16572 and previous config saved to /var/cache/conftool/dbconfig/20210617-082939-marostegui.json
* 08:28 elukey: upload istioctl 1.6.14-1 to buster-wikimedia
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16571 and previous config saved to /var/cache/conftool/dbconfig/20210617-082437-root.json
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16570 and previous config saved to /var/cache/conftool/dbconfig/20210617-082409-marostegui.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16569 and previous config saved to /var/cache/conftool/dbconfig/20210617-081934-root.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16568 and previous config saved to /var/cache/conftool/dbconfig/20210617-080933-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16567 and previous config saved to /var/cache/conftool/dbconfig/20210617-080430-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16566 and previous config saved to /var/cache/conftool/dbconfig/20210617-075825-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16565 and previous config saved to /var/cache/conftool/dbconfig/20210617-075429-root.json
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16564 and previous config saved to /var/cache/conftool/dbconfig/20210617-073926-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16563 and previous config saved to /var/cache/conftool/dbconfig/20210617-073305-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16562 and previous config saved to /var/cache/conftool/dbconfig/20210617-073229-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16561 and previous config saved to /var/cache/conftool/dbconfig/20210617-071726-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16560 and previous config saved to /var/cache/conftool/dbconfig/20210617-070222-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16559 and previous config saved to /var/cache/conftool/dbconfig/20210617-064717-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16558 and previous config saved to /var/cache/conftool/dbconfig/20210617-063135-marostegui.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16557 and previous config saved to /var/cache/conftool/dbconfig/20210617-062514-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16556 and previous config saved to /var/cache/conftool/dbconfig/20210617-061010-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16555 and previous config saved to /var/cache/conftool/dbconfig/20210617-055507-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16554 and previous config saved to /var/cache/conftool/dbconfig/20210617-054003-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16553 and previous config saved to /var/cache/conftool/dbconfig/20210617-053455-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16552 and previous config saved to /var/cache/conftool/dbconfig/20210617-053105-root.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16551 and previous config saved to /var/cache/conftool/dbconfig/20210617-051601-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16550 and previous config saved to /var/cache/conftool/dbconfig/20210617-050057-root.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16549 and previous config saved to /var/cache/conftool/dbconfig/20210617-044554-root.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16548 and previous config saved to /var/cache/conftool/dbconfig/20210617-044146-marostegui.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16547 and previous config saved to /var/cache/conftool/dbconfig/20210617-044132-marostegui.json
* 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16546 and previous config saved to /var/cache/conftool/dbconfig/20210617-043130-marostegui.json
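The dbctl entries above follow a regular cycle: an instance is depooled for a schema change, then repooled in steps of 25%, 50%, 75% and 100%. The following is a minimal sketch, assuming log lines in the format shown above, of how that progression could be recovered per instance. The function name `repool_steps` and the regular expression are illustrative assumptions inferred from these entries; they are not part of dbctl or conftool.

```python
import re
from collections import defaultdict

# Pattern inferred from the log entries above (an assumption, not a format
# defined by dbctl). It matches messages such as:
#   'db1144:3315 (re)pooling @ 25%: Repool db1144:3315 after schema change'
REPOOL_RE = re.compile(
    r"'(?P<instance>db\d+(?::\d+)?) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return {instance: [percentages]} in the order the lines are given."""
    steps = defaultdict(list)
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            steps[match.group("instance")].append(int(match.group("pct")))
    return dict(steps)


# Abbreviated copies of entries from the log above (the trailing
# "diff saved to ..." part is cut off). The log lists the newest entry first.
sample = [
    "* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315 after schema change'",
    "* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change'",
    "* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after schema change'",
    "* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after schema change'",
    "* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after schema change'",
    "* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130'",
]

# Reverse the sample so the steps come out in chronological order.
progress = repool_steps(reversed(sample))
print(progress)
```

Because the log is newest-first, the sample is reversed before parsing. Depool entries do not match the pattern and are skipped.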


== 2021-06-16 ==
* 21:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:32 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 17:41 dancy: Reverted Scap release on beta
* 16:18 topranks: Resetting metric on Telia CCT IC-331929, cr1-codfw and cr3-eqsin.
* 15:22 dancy: testing upcoming Scap release on beta
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16545 and previous config saved to /var/cache/conftool/dbconfig/20210616-125329-root.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16544 and previous config saved to /var/cache/conftool/dbconfig/20210616-123826-root.json
* 12:34 kormat: deploying heartbeat service puppet change
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16543 and previous config saved to /var/cache/conftool/dbconfig/20210616-122322-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16541 and previous config saved to /var/cache/conftool/dbconfig/20210616-120818-root.json
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 12:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16540 and previous config saved to /var/cache/conftool/dbconfig/20210616-120015-marostegui.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16539 and previous config saved to /var/cache/conftool/dbconfig/20210616-112115-root.json
* 11:20 hnowlan: running `nodetool cleanup` on maps1005
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16538 and previous config saved to /var/cache/conftool/dbconfig/20210616-110612-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16537 and previous config saved to /var/cache/conftool/dbconfig/20210616-105108-root.json
* 10:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16536 and previous config saved to /var/cache/conftool/dbconfig/20210616-103604-root.json
* 10:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16535 and previous config saved to /var/cache/conftool/dbconfig/20210616-102349-marostegui.json
* 09:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
* 09:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 09:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 09:50 hnowlan: disabling puppet on maps1* to reparent maps1007 from new master maps1009
* 09:47 kormat: truncating all pc* tables on pc1010 [[phab:T282761|T282761]]
* 09:40 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1009 as pc3 primary [[phab:T282761|T282761]] (duration: 00m 59s)
* 09:04 kormat: Deploying wmfmariadbpy 0.7.1 [[phab:T284819|T284819]]
* 09:04 kormat: uploaded wmfmariadbpy 0.7.1 to apt.wm.o
* 08:24 Amir1: running "update flaggedrevs set fr_quality = 0 where fr_quality != 0;" on all wikis where flagged revs is enabled ([[phab:T279761|T279761]])
* 07:27 dcausse: cleanup old /var/log/airflow/scheduler logs to reclaim space on an-airflow1001
* 06:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:52 volans@cumin1001: START - Cookbook sre.dns.netbox
* 05:06 marostegui: Upgrade clouddb1014


== 2021-04-28 ==
* 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 23:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 23:06 dpifke@deploy1002: Finished deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 (duration: 00m 05s)
* 23:06 dpifke@deploy1002: Started deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886
* 22:44 dwisehaupt: civiproxy revision changed to {{Gerrit|99cecb924a}} - initial rollout of code for testing
* 22:26 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 22:26 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:18 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 22:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:44 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 21:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 21:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:39 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 21:37 ryankemper: [[phab:T280382|T280382]] `wdqs2007` is reachable again; glancing at `/srv/wdqs` its `wikidata.jnl` is `839G` when it should be `975G` so I'll re-do the wikidata journal transfer
* 21:32 ryankemper: [[phab:T280382|T280382]] [WDQS] `wdqs2007` ssh is unreachable; power cycling via `racadm>>racadm serveraction powercycle`
* 21:24 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (previous reimage timed out, instance appears to have rebooted)
* 21:07 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 21:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 21:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 20:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:57 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:13 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 07s)
* 19:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 18:21 legoktm: added mvolz as listadmin for services@ and reset admin pw ([[phab:T278516|T278516]])
* 17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Wikibase/client/includes/DataAccess/Scribunto/WikibaseLanguageIndependentLuaBindings.php: {{Gerrit|b392dba0d77904d7de819043e51d8c3fbf003873}}: Fix incorrect ItemId typehint in Lua bindings ([[phab:T281361|T281361]]) (duration: 01m 09s)
* 16:52 papaul: powerdown logstash2034 for relocation
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 16:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 16:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 16:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 16:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 16:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 16:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:20 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts conf[2001-2003].codfw.wmnet
* 15:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:03 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:00 moritzm: imported python-poolcounter 0.0.2-1+deb11u1 to apt.wikimedia.org [[phab:T275873|T275873]]
* 14:53 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[2001-2003].codfw.wmnet
* 14:44 moritzm: imported gitlab-ce 13.9.7-ce.0 to apt.wikimedia.org
* 14:40 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d] (duration: 04m 59s)
* 14:35 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d]
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d] (duration: 00m 06s)
* 14:34 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d]
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 03m 07s)
* 14:32 moritzm: installing iproute2 updates from buster point release
* 14:31 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 14:30 milimetric@deploy1002: deploy aborted: - (duration: 00m 00s)
* 14:30 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: -
* 14:30 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 12m 31s)
* 14:26 moritzm: installing net-snmp updates from buster point release
* 14:17 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 13:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 13:15 jayme: restarting pybal on lvs5001,lvs4005,lvs2007 - [[phab:T271573|T271573]]
* 13:14 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.3"
* 13:10 jayme: restarting pybal on lvs5002,lvs4006,lvs2008 - [[phab:T271573|T271573]]
* 13:04 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 13:03 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 13:02 moritzm: upgrading deployment servers to PHP 7.4.32
* 12:55 moritzm: upgrading snapshot hosts to PHP 7.4.32
* 12:48 jayme: restarting pybal on lvs2009 - [[phab:T271573|T271573]]
* 12:45 moritzm: upgrading labweb to PHP 7.4.32
* 12:43 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:42 jayme: restarting pybal on lvs5003,lvs4007 - [[phab:T271573|T271573]]
* 12:39 jayme: restarting pybal on lvs2010 - [[phab:T271573|T271573]]
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 12:28 apergos: manually edited /srv/deployment/dumps/dumps-cache/config on snapshots1011,12,13 to change deploy1001 to deploy1002 (where did it get the old value from? these are new installs!)
* 12:16 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:15 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 11:53 jayme: switching SRV record _etcd._tcp to new etcd cluster (for codfw, eqsin, ulsfo)
* 11:22 Urbanecm: EU B&C window done
* 11:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: {{Gerrit|8d0ae5e8fedefa911fc216bfc810d7a6169ea7e5}}: Separate reference preview settings in beta & non-beta ([[phab:T281235|T281235]]) (duration: 01m 08s)
* 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddbc378e41783356e28cd90bbefa08624ea2844c}}: Enable partial action blocks on testwiki ([[phab:T280528|T280528]]) (duration: 01m 07s)
* 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 10:44 jbond42: updated the check-raid nrpe script to python3
* 09:40 moritzm: restarting Tomcat on idp-test1001 to pick up Java security updates
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15618 and previous config saved to /var/cache/conftool/dbconfig/20210428-092103-root.json
* 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1001.wikimedia.org
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint1001.wikimedia.org
* 09:09 moritzm: restarting jenkins* on releases to pick up Java security updates
* 09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15617 and previous config saved to /var/cache/conftool/dbconfig/20210428-090559-root.json
* 08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15616 and previous config saved to /var/cache/conftool/dbconfig/20210428-085056-root.json
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 01m 08s)
* 08:41 urbanecm@deploy1002: sync-file aborted: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 00m 02s)
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15615 and previous config saved to /var/cache/conftool/dbconfig/20210428-083625-marostegui.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15614 and previous config saved to /var/cache/conftool/dbconfig/20210428-083552-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15613 and previous config saved to /var/cache/conftool/dbconfig/20210428-083458-root.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15612 and previous config saved to /var/cache/conftool/dbconfig/20210428-082625-root.json
* 08:25 effie: update php7.2 on jobrunners and parsoid servers && rolling php7.2-fpm restarts
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15611 and previous config saved to /var/cache/conftool/dbconfig/20210428-081121-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15610 and previous config saved to /var/cache/conftool/dbconfig/20210428-075618-root.json
* 07:52 effie: update php7.2 on api servers && rolling php7.2-fpm restarts
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15609 and previous config saved to /var/cache/conftool/dbconfig/20210428-074114-root.json
* 07:40 marostegui: Deploy schema change on db1098:3316 and db1098:3317 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 07:27 effie: update php7.2 on appservers && rolling php7.2-fpm restarts
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for schema change and kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15608 and previous config saved to /var/cache/conftool/dbconfig/20210428-072609-marostegui.json
* 07:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:14 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:12 elukey: add AAAA record for kafka-main200[3,4,5].codfw.wmnet
* 07:10 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:05 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:04 elukey: add AAAA record for kafka-main2002.codfw.wmnet
* 07:03 marostegui: Deploy schema change on db2089:3316 and db1098:3316 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:26 legoktm: created mailman3 superusers for Administrator (noc@), Ladsgroup and Legoktm
* 06:23 legoktm: legoktm@lists1001:~$ sudo mailman-web set_default_site --name lists.wikimedia.org --domain lists.wikimedia.org
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15607 and previous config saved to /var/cache/conftool/dbconfig/20210428-061426-root.json
* 06:00 marostegui: Stop MySQL on db2096 (x1 codfw) [[phab:T281135|T281135]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15606 and previous config saved to /var/cache/conftool/dbconfig/20210428-055922-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1167 in s8 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15605 and previous config saved to /var/cache/conftool/dbconfig/20210428-055144-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15604 and previous config saved to /var/cache/conftool/dbconfig/20210428-054419-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15603 and previous config saved to /var/cache/conftool/dbconfig/20210428-052915-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15602 and previous config saved to /var/cache/conftool/dbconfig/20210428-051526-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 (old s1 master) for schema change', diff saved to https://phabricator.wikimedia.org/P15601 and previous config saved to /var/cache/conftool/dbconfig/20210428-050754-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json
* 05:00 marostegui: Starting s1 eqiad failover from db1083 to db1163 - [[phab:T278214|T278214]]
* 04:14 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 04:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:08 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 04:08 marostegui: Start replication changes, connect everything to db1163 [[phab:T278214|T278214]]
* 04:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json
* 03:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:49 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
* 03:48 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet
* 03:33 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 02:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:28 robh@cumin1001: START - Cookbook sre.dns.netbox
* 01:06 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 00:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE


== 2021-04-27 ==
== 2021-06-15 ==
* 23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 17:54 dancy: testing upcoming Scap release on beta
* 23:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 17:21 mutante: new Wikimedia language "shi" added - Shilha /ˈʃɪlhə/ is a Berber language native to Shilha people. The endonym is Taclḥit /taʃlʜijt/, and in recent English publications the language is often rendered Tashelhiyt or Tashelhit.
* 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE
* 17:17 mutante: new Wikimedia language "dag" added - Dagbani (or Dagbane), also known as Dagbanli and Dagbanle, is a Gur language spoken in Ghana.
* 23:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 17:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
* 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 17:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
* 23:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 16:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
* 23:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 16:11 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
* 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 14:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet
* 14:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:55 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet
* 14:25 XioNoX: re-enable cr1-codfw:xe-5/1/2
* 20:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet
* 13:23 marostegui: Upgrade clouddb1018
* 20:42 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet
* 13:15 effie: enable puppet on canaries
* 20:32 bblack: re-pooling codfw public traffic - [[phab:T279457|T279457]]
* 13:10 effie: disable puppet on canaries to deploy 699908
* 20:11 jhuneidi@deploy1002: Synchronized php-1.37.0-wmf.3/includes/rcfeed/IRCColourfulRCFeedFormatter.php: Backport rcfeed: Remove reference assignment ([[phab:T281226|T281226]]) to 1.37.0-wmf.3 (duration: 01m 12s)
* 10:45 XioNoX: re-enable cr1-codfw:xe-5/1/2
* 20:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 09:42 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
* 20:06 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P16533 and previous config saved to /var/cache/conftool/dbconfig/20210615-092511-marostegui.json
* 19:44 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318, db2082', diff saved to https://phabricator.wikimedia.org/P16532 and previous config saved to /var/cache/conftool/dbconfig/20210615-092409-marostegui.json
* 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P16531 and previous config saved to /var/cache/conftool/dbconfig/20210615-090802-marostegui.json
* 19:35 papaul: powerdown ms-backup2001 for maintenance
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2083', diff saved to https://phabricator.wikimedia.org/P16530 and previous config saved to /var/cache/conftool/dbconfig/20210615-090650-marostegui.json
* 19:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2084', diff saved to https://phabricator.wikimedia.org/P16529 and previous config saved to /var/cache/conftool/dbconfig/20210615-090243-marostegui.json
* 19:07 papaul: powerdown logstash2035 for maintenance
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2081', diff saved to https://phabricator.wikimedia.org/P16528 and previous config saved to /var/cache/conftool/dbconfig/20210615-090206-marostegui.json
* 19:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P16527 and previous config saved to /var/cache/conftool/dbconfig/20210615-085953-marostegui.json
* 19:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P16526 and previous config saved to /var/cache/conftool/dbconfig/20210615-085938-marostegui.json
* 18:50 mutante: people1003 - destroying VM and recreating again from scratch to test if issue of no console and no access is repeatable
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 db2083 db2084 db2091', diff saved to https://phabricator.wikimedia.org/P16525 and previous config saved to /var/cache/conftool/dbconfig/20210615-083233-marostegui.json
* 18:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P16524 and previous config saved to /var/cache/conftool/dbconfig/20210615-082857-marostegui.json
* 18:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 06:10 XioNoX: roll OSPF link-protection to all routers - [[phab:T167306|T167306]]
* 18:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 02:30 eileen: civicrm revision changed from {{Gerrit|d9d61dad0b}} to {{Gerrit|acbcce94a2}}, config revision is {{Gerrit|2aed6ff89b}}
* 18:33 mutante: people1003 - rebooting, trying to get new VM to work
* 01:22 eileen: civicrm revision changed from {{Gerrit|28ace1b86f}} to {{Gerrit|d9d61dad0b}}, config revision is {{Gerrit|2aed6ff89b}}
* 18:33 Urbanecm: Morning B&C window done
* 00:37 eileen: civicrm revision changed from {{Gerrit|31d07115a0}} to {{Gerrit|28ace1b86f}}, config revision is {{Gerrit|2aed6ff89b}}
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|91a85f2}}: {{Gerrit|ac770bf}}: Enable language in header for office and testwiki users ([[phab:T280526|T280526]]) (duration: 01m 19s)
* 18:32 bblack: lvs2009 - restart pybal + re-run puppet agent - [[phab:T279457|T279457]]
* 18:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:20 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[56].codfw.wmnet
* 18:20 bblack: cp203[56] - repooling in etcd - [[phab:T279457|T279457]]
* 18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:17 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:17 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:16 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:12 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:11 bblack: dns2001 - restarting bird to repool, then re-enabling puppet - [[phab:T279457|T279457]]
* 18:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:02 ejegg: update payments-wiki from {{Gerrit|9a4eef1375}} to {{Gerrit|44570561f2}}
* 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 17:58 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 17:34 papaul: powerdown moss-fe2001 for maintenance
* 17:32 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:25 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:23 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:21 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:19 ryankemper: [[phab:T281215|T281215]] Banned `elastic2043` from codfw cirrussearch cluster
* 17:16 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:14 papaul: powerdown kafka-logging2003 for maintenance
* 17:14 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:09 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:52 papaul: powerdown elastic2045 for maintenance
* 16:49 papaul: powerdown ms-be2042 for maintenance
* 16:39 dcaro: reprepro updating packages on thirdparty/ceph-nautilus-buster
* 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 39 hosts with reason: upgrading openstack
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 39 hosts with reason: upgrading openstack
* 16:22 effie: upgrading scap 3.17.1-1 on mediawiki canaries - [[phab:T279695|T279695]]
* 16:18 effie: uploading scap_3.17.1-1
* 16:18 effie: uploading cap_3.17.1-1
* 15:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1026.eqiad.wmnet
* 14:48 moritzm: installing file/libmagic updates from buster point release
* 14:47 bblack: lvs2009 - disable puppet + stop pybal (internal services will move to lvs2010, please avoid LVS service definition changes for now!) - [[phab:T279457|T279457]]
* 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
* 14:36 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[56].codfw.wmnet
* 14:36 bblack: cp203[56] - depool all etcd services via confctl - [[phab:T279457|T279457]]
* 14:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
* 14:33 bblack: dns2001 - depooling for [[phab:T279457|T279457]] (disable puppet + stop bird)
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
* 14:31 moritzm: installing imagemagick security updates
* 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
* 14:20 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 14:19 moritzm: installing xen security updates
* 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 14:09 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 14:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 105 hosts with reason: upgrading openstack
* 14:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 105 hosts with reason: upgrading openstack
* 14:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: upgrading openstack
* 14:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: upgrading openstack
* 13:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 13:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 13:55 moritzm: imported jenkins 2.277.3 to thirdparty/ci
* 13:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 13:48 moritzm: uploaded openjdk-8 8u292-b10-0~deb10u1 (buster forward port of latest Java 8 security release)
* 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:45 akosiaris: switchover api-gateway, changeprop, cpjobqueue to use the new redis cluster servers (rdb2007-rdb2010)
* 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:30 hashar: Upgrading CI Jenkins from 2.263.3 to 2.277.2
* 13:23 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1020-1026].eqiad.wmnet
* 13:19 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:13 liw@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.3
* 13:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/includes/Config/WikiPageConfigValidation.php: {{Gerrit|fe2a0420fd884df7046c0c283bcb2e961e74e8e9}}: WikiPageConfigValidation: Mentor lists and help desk can be null ([[phab:T281229|T281229]]) (duration: 01m 06s)
* 13:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 13:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 13:06 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1020-1026].eqiad.wmnet
* 13:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be1019.eqiad.wmnet
* 12:55 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be1019.eqiad.wmnet
* 12:46 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:682815{{!}}Revert "URGENT: Disable GlobalUsage" (T281242)]] (duration: 01m 08s)
* 12:44 hashar: Restarted CI Jenkins for plugins upgrade
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15592 and previous config saved to /var/cache/conftool/dbconfig/20210427-122619-root.json
* 12:20 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GlobalUsage: Backport: [[gerrit:682814{{!}}Avoid reading primary unless absolutely necessary (T281238)]] (duration: 01m 09s)
* 12:12 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GlobalUsage: Backport: [[gerrit:682813{{!}}Avoid reading primary unless absolutely necessary (T281238)]] (duration: 01m 09s)
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15591 and previous config saved to /var/cache/conftool/dbconfig/20210427-121115-root.json
* 12:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 12:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15590 and previous config saved to /var/cache/conftool/dbconfig/20210427-115612-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15589 and previous config saved to /var/cache/conftool/dbconfig/20210427-114108-root.json
* 11:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 11:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove RW from commonswiki', diff saved to https://phabricator.wikimedia.org/P15588 and previous config saved to /var/cache/conftool/dbconfig/20210427-111016-marostegui.json
* 11:09 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Disable GlobalUsage (duration: 01m 08s)
* 10:40 volans@cumin1001: dbctl commit (dc=all): 'S4 RO, outage', diff saved to https://phabricator.wikimedia.org/P15585 and previous config saved to /var/cache/conftool/dbconfig/20210427-104057-volans.json
* 10:18 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 10:06 XioNoX: standardize management routers ACLs with Capirca - mr1-eqiad (last one)
* 10:01 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 02m 16s)
* 09:59 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 09:56 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 22s)
* 09:56 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15584 and previous config saved to /var/cache/conftool/dbconfig/20210427-093536-marostegui.json
* 09:35 XioNoX: standardize management routers ACLs with Capirca - mr1-eqsin
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15583 and previous config saved to /var/cache/conftool/dbconfig/20210427-093501-root.json
* 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
* 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 09:33 moritzm: rolling restart of elastic in relforge* to pick up Java updates
* 09:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
* 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15582 and previous config saved to /var/cache/conftool/dbconfig/20210427-091957-root.json
* 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
* 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 09:17 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
* 09:16 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 09:11 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 09:11 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
* 09:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
* 09:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
* 09:06 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
* 09:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
* 09:05 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15581 and previous config saved to /var/cache/conftool/dbconfig/20210427-090454-root.json
* 09:04 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 09:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
* 09:01 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15580 and previous config saved to /var/cache/conftool/dbconfig/20210427-084950-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15579 and previous config saved to /var/cache/conftool/dbconfig/20210427-084651-marostegui.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15578 and previous config saved to /var/cache/conftool/dbconfig/20210427-084630-root.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114 into main and api', diff saved to https://phabricator.wikimedia.org/P15577 and previous config saved to /var/cache/conftool/dbconfig/20210427-083910-marostegui.json
* 08:36 XioNoX: standardize management routers ACLs with Capirca
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15576 and previous config saved to /var/cache/conftool/dbconfig/20210427-083145-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15575 and previous config saved to /var/cache/conftool/dbconfig/20210427-083126-root.json
* 08:24 hashar: Restarting CI Jenkins for plugins upgrade
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15574 and previous config saved to /var/cache/conftool/dbconfig/20210427-081911-marostegui.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15573 and previous config saved to /var/cache/conftool/dbconfig/20210427-081846-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15572 and previous config saved to /var/cache/conftool/dbconfig/20210427-081623-root.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15571 and previous config saved to /var/cache/conftool/dbconfig/20210427-081325-root.json
* 08:12 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
* 08:11 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
* 08:10 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
* 08:08 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
* 08:03 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 90%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15570 and previous config saved to /var/cache/conftool/dbconfig/20210427-080342-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15569 and previous config saved to /var/cache/conftool/dbconfig/20210427-080119-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15568 and previous config saved to /var/cache/conftool/dbconfig/20210427-075822-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15567 and previous config saved to /var/cache/conftool/dbconfig/20210427-075759-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15566 and previous config saved to /var/cache/conftool/dbconfig/20210427-075738-root.json
* 07:52 liw@deploy1002: Pruned MediaWiki: 1.36.0-wmf.38 (duration: 03m 17s)
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 80%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15565 and previous config saved to /var/cache/conftool/dbconfig/20210427-074839-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15564 and previous config saved to /var/cache/conftool/dbconfig/20210427-074318-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15563 and previous config saved to /var/cache/conftool/dbconfig/20210427-074234-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15562 and previous config saved to /var/cache/conftool/dbconfig/20210427-073335-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15561 and previous config saved to /var/cache/conftool/dbconfig/20210427-072814-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15560 and previous config saved to /var/cache/conftool/dbconfig/20210427-072731-root.json
* 07:26 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 07:24 liw@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.3 (duration: 30m 54s)
* 07:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 07:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 07:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
* 07:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 60%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15559 and previous config saved to /var/cache/conftool/dbconfig/20210427-071831-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15558 and previous config saved to /var/cache/conftool/dbconfig/20210427-071227-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15557 and previous config saved to /var/cache/conftool/dbconfig/20210427-070328-root.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 for schema change', diff saved to https://phabricator.wikimedia.org/P15556 and previous config saved to /var/cache/conftool/dbconfig/20210427-065628-marostegui.json
* 06:55 elukey: upgrade mariadb to 10.4.18-1 + reboot on db1108 - [[phab:T279281|T279281]]
* 06:54 liw@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.3
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 40%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15555 and previous config saved to /var/cache/conftool/dbconfig/20210427-064824-root.json
* 06:37 liw: version 1.37.0-wmf.3 was branched at {{Gerrit|20ab303fd1d883592b4d2ec2468dfaccad7a9e10}} for [[phab:T278347|T278347]]
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 30%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15554 and previous config saved to /var/cache/conftool/dbconfig/20210427-063320-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15553 and previous config saved to /var/cache/conftool/dbconfig/20210427-061817-root.json
* 06:11 elukey: powercycle elastic2043 - no ssh, no tty remote console available
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 20%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15552 and previous config saved to /var/cache/conftool/dbconfig/20210427-060313-root.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 15%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15551 and previous config saved to /var/cache/conftool/dbconfig/20210427-054809-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 10%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15550 and previous config saved to /var/cache/conftool/dbconfig/20210427-053306-root.json
* 05:30 XioNoX: push pfw fw policies - [[phab:T281137|T281137]]
* 05:27 legoktm: imported hyperkitty_1.3.4-2~bpo10+2 to apt.wm.o ([[phab:T281213|T281213]])
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15549 and previous config saved to /var/cache/conftool/dbconfig/20210427-052236-root.json
* 05:21 marostegui: Stop mysql on db1087 to clone db1167 (lag will appear on wikidata on wikireplicas) [[phab:T258361|T258361]]
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1114 temporarily as db1087 will be depooled', diff saved to https://phabricator.wikimedia.org/P15547 and previous config saved to /var/cache/conftool/dbconfig/20210427-052026-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 5%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15546 and previous config saved to /var/cache/conftool/dbconfig/20210427-051802-root.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15545 and previous config saved to /var/cache/conftool/dbconfig/20210427-050826-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15544 and previous config saved to /var/cache/conftool/dbconfig/20210427-050732-root.json
* 05:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1077.eqiad.wmnet
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1077.eqiad.wmnet
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15543 and previous config saved to /var/cache/conftool/dbconfig/20210427-045229-root.json
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15541 and previous config saved to /var/cache/conftool/dbconfig/20210427-044609-marostegui.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 to dbctl, depooled, [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15540 and previous config saved to /var/cache/conftool/dbconfig/20210427-044520-marostegui.json
* 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15539 and previous config saved to /var/cache/conftool/dbconfig/20210427-043725-root.json
* 04:25 legoktm: upgrading lists-next.wikimedia.org to mailman3-from-bullseye ([[phab:T280887|T280887]])
* 04:19 marostegui: Set phabricator on read only [[phab:T279625|T279625]]
* 03:37 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 03:36 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@08ad17a]: 0.3.70 (duration: 08m 18s)
* 03:28 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.70` on canary `wdqs1003`; proceeding to rest of fleet
* 03:28 ryankemper@deploy1002: Started deploy [wdqs/wdqs@08ad17a]: 0.3.70
* 03:27 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.70`. Pre-deploy tests passing on canary `wdqs1003`
* 03:17 ryankemper: [[phab:T280382|T280382]] `wdqs1006` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to raid0: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 02:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:29 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph --task-id [[phab:T280382|T280382]]` on `ryankemper@cumin1001` tmux session `reimage`
* 01:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:21 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
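
The `dbctl commit` entries above record staged repools: a host is returned to service in steps (for example 25%, 50%, 75%, 100%), each logged as its own line. The following is a minimal sketch, not part of the log tooling, of how those steps could be pulled out of the log text for review. The regular expression and the function name `repool_steps` are assumptions made for illustration; the sample lines are copied from the entries above and trimmed after the commit message.

```python
import re
from collections import defaultdict

# Assumed pattern for entries such as:
#   * 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', ...
REPOOL_RE = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} with each host's steps oldest first.

    Hypothetical helper: assumes all lines belong to a single day, so that
    sorting the HH:MM strings gives chronological order.
    """
    steps = defaultdict(list)
    for line in lines:
        match = REPOOL_RE.match(line)
        if match:
            steps[match["host"]].append((match["time"], int(match["pct"])))
    # The log is written newest first, so sort each host's steps by time.
    return {host: sorted(entries) for host, entries in steps.items()}


# Sample entries from 2021-04-27, trimmed after the commit message.
sample = [
    "* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157'",
    "* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157'",
    "* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157'",
    "* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157'",
    "* 11:09 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Disable GlobalUsage (duration: 01m 08s)",
]

for host, entries in repool_steps(sample).items():
    print(host, entries)
```

Entries that are not repool steps, such as the `Synchronized` line in the sample, are skipped because they do not match the pattern. Multi-instance hosts logged as `db1170:3317` would be keyed by the full `host:port` string.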


== 2021-04-26 ==
== 2021-06-14 ==
* 23:28 mutante: renewing TLS cert for peopleweb.discovery.wmnet, adding *3 hosts
* 21:40 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@baeee47]: [[phab:T261407|T261407]] bulk_daemon: Deploy prioritized topics (duration: 00m 49s)
* 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
* 21:40 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@baeee47]: [[phab:T261407|T261407]] bulk_daemon: Deploy prioritized topics
* 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
* 19:27 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1003.eqiad.wmnet
* 22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
* 19:21 twentyafterfour_: applying hotfix for [[phab:T284397|T284397]] and restarting php7.3-fpm on phab1001
* 22:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
* 18:30 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1003.eqiad.wmnet
* 22:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1006.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 17:05 jforrester@deploy1002: Finished deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]] (duration: 00m 07s)
* 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 17:05 jforrester@deploy1002: Started deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]]
* 20:48 twentyafterfour: restarting php-fpm on phab1001 to deploy phabricator hotfix {{Gerrit|d238db85b8d8072d99f31805aa4a8a7cf0c09941}}
* 16:46 jforrester@deploy1002: Finished deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]] (duration: 00m 07s)
* 20:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 16:46 jforrester@deploy1002: Started deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]]
* 20:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1003.eqiad.wmnet
* 15:56 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1002.eqiad.wmnet
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1003.eqiad.wmnet
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16521 and previous config saved to /var/cache/conftool/dbconfig/20210614-155258-root.json
* 19:45 legoktm: uploaded python3-falcon, python3-mimeparse, python3-mujson, openstack-pkg-tools to mailman3 component on apt.wm.o
* 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16520 and previous config saved to /var/cache/conftool/dbconfig/20210614-153754-root.json
* 18:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
* 15:24 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 18:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16519 and previous config saved to /var/cache/conftool/dbconfig/20210614-152250-root.json
* 18:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1005.eqiad.wmnet
* 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16518 and previous config saved to /var/cache/conftool/dbconfig/20210614-150747-root.json
* 18:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
* 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1005.eqiad.wmnet
* 18:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
* 15:04 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1002.eqiad.wmnet
* 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2d16f6251a67cf13cef02bbdcb3c9f5c1c505d16}}: elwiki: Update Growth experiments configuration ([[phab:T280172|T280172]]) (duration: 00m 58s)
* 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1004.eqiad.wmnet
* 18:06 urbanecm@deploy1002: Synchronized multiversion/MWScript.php: {{Gerrit|5ace4e1b806bcfc4ea059f9e9cae9aa94c0bdbd1}}: Fix error message if MWScript.php is run without arguments (duration: 00m 58s)
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1004.eqiad.wmnet
* 17:28 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16517 and previous config saved to /var/cache/conftool/dbconfig/20210614-145243-root.json
* 17:26 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1003.eqiad.wmnet
* 17:18 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16516 and previous config saved to /var/cache/conftool/dbconfig/20210614-145039-root.json
* 17:06 legoktm: imported postorius_1.3.4-2~bpo10+2 to apt.wm.o
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1003.eqiad.wmnet
* 16:49 mutante: gerrit - restarted apache (hard) to remove time out from gerrit:682502
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16515 and previous config saved to /var/cache/conftool/dbconfig/20210614-144130-marostegui.json
* 16:40 mutante: gerrit1001 - reload apache2
* 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1002.eqiad.wmnet
* 16:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1025.eqiad.wmnet
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16514 and previous config saved to /var/cache/conftool/dbconfig/20210614-143536-root.json
* 16:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1025.eqiad.wmnet
* 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1002.eqiad.wmnet
* 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16513 and previous config saved to /var/cache/conftool/dbconfig/20210614-143224-root.json
* 15:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16512 and previous config saved to /var/cache/conftool/dbconfig/20210614-143211-root.json
* 15:21 elukey: restart zookeeper on conf2004 to pick up the -javaagent setting for the prometheus exporter
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1001.eqiad.wmnet
* 15:06 moritzm: installing jquery security updates on stretch
* 14:27 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice<nowiki>{</nowiki>BannerHistory,Impression<nowiki>}</nowiki> to EventGate on all wikis - [[phab:T271168|T271168]] (duration: 00m 57s)
* 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1001.eqiad.wmnet
* 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2007.codfw.wmnet
* 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16511 and previous config saved to /var/cache/conftool/dbconfig/20210614-142032-root.json
* 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16510 and previous config saved to /var/cache/conftool/dbconfig/20210614-142014-root.json
* 14:48 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16509 and previous config saved to /var/cache/conftool/dbconfig/20210614-141720-root.json
* 14:47 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16508 and previous config saved to /var/cache/conftool/dbconfig/20210614-141707-root.json
* 14:28 moritzm: installing ldap-replica1003/1004
* 14:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice<nowiki>{</nowiki>BannerHistory,Impression<nowiki>}</nowiki> to EventGate on testwiki - [[phab:T271168|T271168]] (duration: 00m 57s)
* 14:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
* 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2007.codfw.wmnet
* 14:03 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
* 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2006.codfw.wmnet
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15537 and previous config saved to /var/cache/conftool/dbconfig/20210426-133922-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16507 and previous config saved to /var/cache/conftool/dbconfig/20210614-140529-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15536 and previous config saved to /var/cache/conftool/dbconfig/20210426-133905-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16506 and previous config saved to /var/cache/conftool/dbconfig/20210614-140511-root.json
* 13:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: for zookeeper migration
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16505 and previous config saved to /var/cache/conftool/dbconfig/20210614-140217-root.json
* 13:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: for zookeeper migration
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16504 and previous config saved to /var/cache/conftool/dbconfig/20210614-140203-root.json
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15535 and previous config saved to /var/cache/conftool/dbconfig/20210426-132533-root.json
* 14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2006.codfw.wmnet
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15534 and previous config saved to /var/cache/conftool/dbconfig/20210426-132417-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16503 and previous config saved to /var/cache/conftool/dbconfig/20210614-135456-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15533 and previous config saved to /var/cache/conftool/dbconfig/20210426-132402-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16502 and previous config saved to /var/cache/conftool/dbconfig/20210614-135025-root.json
* 13:14 moritzm: installing ldap-replica2005/2006
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16501 and previous config saved to /var/cache/conftool/dbconfig/20210614-135007-root.json
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15532 and previous config saved to /var/cache/conftool/dbconfig/20210426-131029-root.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16500 and previous config saved to /var/cache/conftool/dbconfig/20210614-134713-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15531 and previous config saved to /var/cache/conftool/dbconfig/20210426-130913-root.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16499 and previous config saved to /var/cache/conftool/dbconfig/20210614-134700-root.json
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15530 and previous config saved to /var/cache/conftool/dbconfig/20210426-130858-root.json
* 13:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 12:57 moritzm: installing gst-plugins-base1.0 security updates
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16498 and previous config saved to /var/cache/conftool/dbconfig/20210614-133953-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15529 and previous config saved to /var/cache/conftool/dbconfig/20210426-125526-root.json
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16497 and previous config saved to /var/cache/conftool/dbconfig/20210614-133801-marostegui.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15528 and previous config saved to /var/cache/conftool/dbconfig/20210426-125409-root.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16496 and previous config saved to /var/cache/conftool/dbconfig/20210614-133503-root.json
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15527 and previous config saved to /var/cache/conftool/dbconfig/20210426-125354-root.json
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16495 and previous config saved to /var/cache/conftool/dbconfig/20210614-133442-root.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15526 and previous config saved to /var/cache/conftool/dbconfig/20210426-124141-marostegui.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16494 and previous config saved to /var/cache/conftool/dbconfig/20210614-133210-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15525 and previous config saved to /var/cache/conftool/dbconfig/20210426-124022-root.json
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16493 and previous config saved to /var/cache/conftool/dbconfig/20210614-133156-root.json
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15524 and previous config saved to /var/cache/conftool/dbconfig/20210426-123020-marostegui.json
* 13:29 effie: restart memcached on codfw
* 12:28 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16492 and previous config saved to /var/cache/conftool/dbconfig/20210614-132449-root.json
* 12:27 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312 db1170:3317 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16491 and previous config saved to /var/cache/conftool/dbconfig/20210614-132235-marostegui.json
* 12:24 Amir1: cleaning watchlist of QuickStatementsBot in wikidatawiki
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16490 and previous config saved to /var/cache/conftool/dbconfig/20210614-132000-root.json
* 12:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16489 and previous config saved to /var/cache/conftool/dbconfig/20210614-131938-root.json
* 12:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16488 and previous config saved to /var/cache/conftool/dbconfig/20210614-130946-root.json
* 12:00 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Enable writes on es4 [[phab:T279281|T279281]] (duration: 00m 56s)
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16487 and previous config saved to /var/cache/conftool/dbconfig/20210614-130723-marostegui.json
* 11:57 marostegui: Restart es4 primary master - [[phab:T279281|T279281]]
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16486 and previous config saved to /var/cache/conftool/dbconfig/20210614-130547-root.json
* 11:55 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Disable writes on es4 [[phab:T279281|T279281]] (duration: 00m 56s)
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16485 and previous config saved to /var/cache/conftool/dbconfig/20210614-130435-root.json
* 11:51 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16484 and previous config saved to /var/cache/conftool/dbconfig/20210614-125442-root.json
* 11:49 hashar@deploy1002: Finished deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki (duration: 00m 13s)
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16483 and previous config saved to /var/cache/conftool/dbconfig/20210614-125043-root.json
* 11:49 hashar@deploy1002: Started deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16482 and previous config saved to /var/cache/conftool/dbconfig/20210614-124931-root.json
* 11:46 Urbanecm: EU B&C done
* 12:37 XioNoX: configure OSPF link-protection on cr3/4-ulsfo - [[phab:T167306|T167306]]
* 11:45 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/Dialog.js: {{Gerrit|a347517f906b07b2503ae559c6cc714e1c50e4aa}}: Fix suggested values not being shown when the params type isnt specified ([[phab:T280688|T280688]]) (duration: 00m 57s)
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16481 and previous config saved to /var/cache/conftool/dbconfig/20210614-123539-root.json
* 11:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:681137{{!}}Revert "Set wgPageImagesAPIDefaultLicense to 'any' for wikidata"]] (duration: 00m 57s)
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1033 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16480 and previous config saved to /var/cache/conftool/dbconfig/20210614-123512-marostegui.json
* 11:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16479 and previous config saved to /var/cache/conftool/dbconfig/20210614-123427-root.json
* 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2b5b640ad28bce1df20c2ca82654996d9cfc7630}}: Enable ContentTranslation as a default tool for 11 Wikipedias ([[phab:T279422|T279422]]) (duration: 00m 57s)
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore es1028 original weight', diff saved to https://phabricator.wikimedia.org/P16478 and previous config saved to /var/cache/conftool/dbconfig/20210614-122322-marostegui.json
* 10:58 effie: restarting php-fpm in mw* clusters in codfw to pick up php7.2 update
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to es1028 while es1034 gets upgraded', diff saved to https://phabricator.wikimedia.org/P16477 and previous config saved to /var/cache/conftool/dbconfig/20210614-122242-marostegui.json
* 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:682575{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 12:22 dcausse: re-pooling wdqs1012
* 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:682575{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16476 and previous config saved to /var/cache/conftool/dbconfig/20210614-122212-marostegui.json
* 10:38 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1004.wikimedia.org
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16475 and previous config saved to /var/cache/conftool/dbconfig/20210614-122036-root.json
* 10:37 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Setup wmgUseFooterCodeOfConductLink for later usage (duration: 00m 57s)
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2005.codfw.wmnet
* 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 12:17 XioNoX: configure OSPF link-protection on cr3-ulsfo:xe-0/1/1 - [[phab:T167306|T167306]]
* 10:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2005.codfw.wmnet
* 10:26 effie: upgrading mw* servers php7.2 in codfw
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P16474 and previous config saved to /var/cache/conftool/dbconfig/20210614-121101-marostegui.json
* 10:25 marostegui: Deploy schema change on s4 codfw, lag will appear [[phab:T276292|T276292]]
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16473 and previous config saved to /var/cache/conftool/dbconfig/20210614-121031-marostegui.json
* 10:24 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgUseFooterTechCodeOfConductLink instead of wmgUseFooterCodeOfConductLink (duration: 00m 57s)
* 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2004.codfw.wmnet
* 10:24 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1004.wikimedia.org
* 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2004.codfw.wmnet
* 10:22 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseFooterTechCodeOfConductLink (duration: 00m 59s)
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16472 and previous config saved to /var/cache/conftool/dbconfig/20210614-120112-marostegui.json
* 10:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1003.wikimedia.org
* 11:28 effie: restart memcached on mc2019
* 10:18 moritzm: installing systemd updates from buster 10.9 point release
* 11:09 effie: restart memcached on codfw memcached gutter pool (mc-gp2* hosts)
* 10:07 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1003.wikimedia.org
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2003.codfw.wmnet
* 10:00 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 10:52 topranks: [[phab:T283163|T283163]]: Adding "metric-out minimum-igp" to all internal/Confed BGP groups on CR routers.
* 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2006.wikimedia.org
* 10:46 effie: enable puppet on mc*
* 09:42 moritzm: installing clamav security updates on otrs1001
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2003.codfw.wmnet
* 09:38 godog: reboot ms-be1062, kernel backtrace saved
* 10:39 effie: disable puppet on mc* hosts
* 09:26 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2001.codfw.wmnet
* 09:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2006.wikimedia.org
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2001.codfw.wmnet
* 09:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2005.wikimedia.org
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16471 and previous config saved to /var/cache/conftool/dbconfig/20210614-101839-root.json
* 09:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16469 and previous config saved to /var/cache/conftool/dbconfig/20210614-100336-root.json
* 09:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 09:56 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 (duration: 02m 37s)
* 09:13 jayme: imported etcd-mirror_0.0.6-2 to buster-wikimedia
* 09:54 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4
* 09:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005.wikimedia.org
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16467 and previous config saved to /var/cache/conftool/dbconfig/20210614-094832-root.json
* 09:07 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2005failoid1002.wikimedia.org
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16466 and previous config saved to /var/cache/conftool/dbconfig/20210614-093329-root.json
* 09:04 jayme: imported etcd-mirror_0.0.6-1 to buster-wikimedia
* 09:22 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 08:55 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005failoid1002.wikimedia.org
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P16465 and previous config saved to /var/cache/conftool/dbconfig/20210614-092234-marostegui.json
* 08:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|f01a6dab70f74938dd51668809a181a8f551b6c8}}: GrowthExperiments: Enable community configuration on testwiki ([[phab:T274520|T274520]]) (duration: 00m 57s)
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16464 and previous config saved to /var/cache/conftool/dbconfig/20210614-092125-root.json
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|88da8226823e59d1d19db9aeca3b5a5140c0c60c}}: GrowthExperiments: Do not enable community configuration outside of beta wikis ([[phab:T274520|T274520]]) (duration: 00m 59s)
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16463 and previous config saved to /var/cache/conftool/dbconfig/20210614-090622-root.json
* 08:28 moritzm: update debmonitor to 0.2.9 on remaining hosts [[phab:T281090|T281090]]
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16462 and previous config saved to /var/cache/conftool/dbconfig/20210614-085118-root.json
* 08:13 moritzm: installing lxml security updates on stretch
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16461 and previous config saved to /var/cache/conftool/dbconfig/20210614-083614-root.json
* 07:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P16460 and previous config saved to /var/cache/conftool/dbconfig/20210614-081239-marostegui.json
* 07:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16459 and previous config saved to /var/cache/conftool/dbconfig/20210614-081031-root.json
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2148', diff saved to https://phabricator.wikimedia.org/P16458 and previous config saved to /var/cache/conftool/dbconfig/20210614-080552-marostegui.json
* 07:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16456 and previous config saved to /var/cache/conftool/dbconfig/20210614-075528-root.json
* 07:32 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 07:51 marostegui: Depool clouddb1013 to upgrade mysql
* 07:24 moritzm: installing pear security updates
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16455 and previous config saved to /var/cache/conftool/dbconfig/20210614-074024-root.json
* 07:09 moritzm: removed rawdog from bullseye-wikimedia, needs Py2 [[phab:T280989|T280989]]
* 07:30 marostegui: Reboot db2148 [[phab:T284852|T284852]]
* 06:24 elukey: reboot an-coord1001 to pick up kernel security settings (after reimage)
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 [[phab:T284852|T284852]]', diff saved to https://phabricator.wikimedia.org/P16454 and previous config saved to /var/cache/conftool/dbconfig/20210614-072930-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1158 to dbctl, depooled, [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15521 and previous config saved to /var/cache/conftool/dbconfig/20210426-054700-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16453 and previous config saved to /var/cache/conftool/dbconfig/20210614-072520-root.json
* 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P16452 and previous config saved to /var/cache/conftool/dbconfig/20210614-071839-marostegui.json
* 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16451 and previous config saved to /var/cache/conftool/dbconfig/20210614-071742-root.json
* 03:43 kart_: Updated cxserver to 2021-04-21-044024-production ([[phab:T279045|T279045]])
* 07:15 dcausse: restart blazegraph and depool wdqs1012
* 03:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16450 and previous config saved to /var/cache/conftool/dbconfig/20210614-070238-root.json
* 03:37 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:01 moritzm: restarting mw canaries to pick up libwebp security updates
* 03:32 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16449 and previous config saved to /var/cache/conftool/dbconfig/20210614-064734-root.json
* 06:39 moritzm: installing libwebp security updates on buster
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16448 and previous config saved to /var/cache/conftool/dbconfig/20210614-063231-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P16447 and previous config saved to /var/cache/conftool/dbconfig/20210614-062554-marostegui.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16446 and previous config saved to /var/cache/conftool/dbconfig/20210614-061226-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16445 and previous config saved to /var/cache/conftool/dbconfig/20210614-060119-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16444 and previous config saved to /var/cache/conftool/dbconfig/20210614-055723-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16443 and previous config saved to /var/cache/conftool/dbconfig/20210614-054615-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16442 and previous config saved to /var/cache/conftool/dbconfig/20210614-054219-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16441 and previous config saved to /var/cache/conftool/dbconfig/20210614-053112-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16440 and previous config saved to /var/cache/conftool/dbconfig/20210614-052715-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P16439 and previous config saved to /var/cache/conftool/dbconfig/20210614-051930-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16438 and previous config saved to /var/cache/conftool/dbconfig/20210614-051608-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P16437 and previous config saved to /var/cache/conftool/dbconfig/20210614-051522-marostegui.json


== 2021-04-25 ==
== 2021-06-12 ==
* 15:23 Amir1: sudo -u list /var/lib/mailman/bin/change_pw -l wikica-l -p $(pwgen -c1 -s 12) ([[phab:T281066|T281066]])
* 13:49 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused
* 13:49 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused


== 2021-04-24 ==
== 2021-06-11 ==
* 22:24 bstorm: Rebooting labstore1007 from ilo after crash
* 23:37 mutante: removing firewall hole for mgmt networks to install* because it turned out it can't be used for firmware upgrades
* 22:08 brennen: gitlab.wikimedia.org currently up with recommended config applied; test data deleted; users can register but not create projects. brennen, dancy, and thcipriani currently marked as admins. may need to reset data again, but hopefully not.
* 21:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
* 21:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
* 21:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
* 20:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
* 20:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
* 20:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
* 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
* 19:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
* 16:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
* 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
* 15:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MediaSearch/extension.json: Make MediaSearch default search experience for all users (duration: 00m 57s)
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16432 and previous config saved to /var/cache/conftool/dbconfig/20210611-150018-root.json
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16431 and previous config saved to /var/cache/conftool/dbconfig/20210611-144514-root.json
* 14:44 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 05s)
* 14:44 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
* 14:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 05s)
* 14:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
* 14:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 14:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 14:35 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 14:35 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:34 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:34 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 14:33 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:33 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:32 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:31 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16430 and previous config saved to /var/cache/conftool/dbconfig/20210611-143010-root.json
* 14:22 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:22 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:20 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:20 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16429 and previous config saved to /var/cache/conftool/dbconfig/20210611-141506-root.json
* 13:53 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 13:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 13:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16428 and previous config saved to /var/cache/conftool/dbconfig/20210611-135248-marostegui.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1153', diff saved to https://phabricator.wikimedia.org/P16427 and previous config saved to /var/cache/conftool/dbconfig/20210611-135036-marostegui.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1153 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16426 and previous config saved to /var/cache/conftool/dbconfig/20210611-133527-marostegui.json
* 10:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 07:29 moritzm: restarting archiva to pick up OpenJDK security updates
* 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
* 07:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
* 06:56 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:56 elukey: rm -rf empty dir /etc/apache2/sites-enabled/.links2 on webperf1001 to avoid puppet changes at every run
* 05:47 elukey: run systemctl reset-failed ifup@en5.service on doh1001 - [[phab:T273026|T273026]]
* 01:10 eileen: process-control config revision is {{Gerrit|2aed6ff89b}}


== 2021-04-23 ==
== 2021-06-10 ==
* 21:36 foks: removing 1 file for legal compliance
* 23:29 derick@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Citoid/modules/ve/ve.ui.CitoidInspector.js: Backport: [[gerrit:699288{{!}}CitoidInspector: rename getParameterNames to getOrderedParameterNames (T284786)]] (duration: 00m 57s)
* 20:15 mutante: [apt1001:~] $ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/dzahn/rawdog_2.23-2_all.deb ([[phab:T280989|T280989]])
* 21:40 urbanecm: End of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # [[phab:T282699|T282699]]
* 19:41 mutante: [apt1001:~] $ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy - copy envoy package from buster to bullseye [[phab:T280989|T280989]]
* 21:36 urbanecm: Start of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # [[phab:T282699|T282699]]
* 19:09 ebernhardson: closing duplicate/wrong cluster indices in cloudelastic
* 21:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki discussiontools # [[phab:T282699|T282699]]
* 17:02 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 20:13 mutante: installed tftp client on install1003 for debugging
* 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:00 jhuneidi@deploy1002: Pruned MediaWiki: 1.37.0-wmf.5 (duration: 03m 33s)
* 16:32 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:31 ryankemper: [[phab:T265547|T265547]] Cleanup following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/698025: `sudo -E cumin -b 5 'P:analytics::cluster::elasticsearch' 'sudo rm -rfv /etc/mjolnir /srv/deployment/search/mjolnir'`
* 16:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikimediaMaintenance/dumpInterwiki.php: {{Gerrit|b21904e326e917f5ac6d7129a4d224380c6e4c21}}: Remove sep11 interwiki link from dumpinterwiki.php (duration: 01m 08s)
* 14:59 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 18:45 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 23s)
* 14:59 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 18:39 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 03s)
* 14:25 moritzm: revert back bullseye image to daily build from last week (to rule out potential reimage issue)
* 18:38 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: {{Gerrit|8aeab139879613782548b20fc11af5e66589e30a}}: Fire language change hook ([[phab:T280770|T280770]]) (duration: 01m 07s)
* 13:33 elukey: roll restart of all thanos-swift proxies to pick up new ML account - [[phab:T280773|T280773]]
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|d26968c1c3b3f3e115ff37a9a138d225cabba25a}}: wgWelcomeSurveyExperimentalGroups: Use new syntax in CS.php ([[phab:T284597|T284597]]; [[phab:T284735|T284735]]) (duration: 01m 08s)
* 12:50 jbond42: upload new debmonitor-client packages
* 17:11 moritzm: updating bullseye installer image to latest daily image (kernel ABI changed again) [[phab:T275873|T275873]]
* 11:50 moritzm: installing perf updates from Buster 10.9 point release
* 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:06 moritzm: installing Linux 4.19.181 updates from Buster 10.9 point release (no reboots, just updating the packages)
* 17:06 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:54 moritzm: installing xen security updates
* 17:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:49 moritzm: installing xorg-server security updates
* 16:53 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15512 and previous config saved to /var/cache/conftool/dbconfig/20210423-093723-root.json
* 16:51 moritzm: installing rails security updates
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15511 and previous config saved to /var/cache/conftool/dbconfig/20210423-092220-root.json
* 16:37 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: no-op for Beta {{Gerrit|I2a42c222003}} (duration: 01m 07s)
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15510 and previous config saved to /var/cache/conftool/dbconfig/20210423-090716-root.json
* 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15509 and previous config saved to /var/cache/conftool/dbconfig/20210423-085212-root.json
* 16:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 08:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
* 16:24 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 08:21 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
* 15:09 papaul: power down ms-be2038 for BBU replacement
* 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16417 and previous config saved to /var/cache/conftool/dbconfig/20210610-123201-root.json
* 08:13 moritzm: upgrading d-i image for bullseye to RC1 release [[phab:T275873|T275873]]
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16416 and previous config saved to /var/cache/conftool/dbconfig/20210610-121657-root.json
* 08:12 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16415 and previous config saved to /var/cache/conftool/dbconfig/20210610-120153-root.json
* 08:12 moritzm: upgrading d-i image for bullseye to RC1 release
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16414 and previous config saved to /var/cache/conftool/dbconfig/20210610-114650-root.json
* 08:12 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1019.eqiad.wmnet
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 40%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16413 and previous config saved to /var/cache/conftool/dbconfig/20210610-113146-root.json
* 07:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1019.eqiad.wmnet
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16412 and previous config saved to /var/cache/conftool/dbconfig/20210610-111643-root.json
* 07:56 jynus: deleting db1156 s2 database and reloading it from logical backups [[phab:T280492|T280492]]
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16411 and previous config saved to /var/cache/conftool/dbconfig/20210610-110139-root.json
* 07:22 Amir1: removing junk bounced email addresses from yahoo from all mailing lists
* 11:00 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 53s)
* 05:40 marostegui: Stop db1079 to clone db1158 (lag will appear on s7 on wiki replicas)
* 10:59 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to clone db1158 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15506 and previous config saved to /var/cache/conftool/dbconfig/20210423-053907-marostegui.json
* 10:47 topranks: [[phab:T283163|T283163]]: Adding "metric-out minimum-igp" to BGP group Confed_eqord on eqiad, codfw and eqdfw CRs.
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16410 and previous config saved to /var/cache/conftool/dbconfig/20210610-104635-root.json
* 10:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikiEditor/modules/jquery.wikiEditor.js: {{Gerrit|8a17c43c5470b84ba58239bb2cf947dbebf1979f}}: Fix call to renamed var ([[phab:T284716|T284716]]) (duration: 01m 25s)
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16409 and previous config saved to /var/cache/conftool/dbconfig/20210610-103132-root.json
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16408 and previous config saved to /var/cache/conftool/dbconfig/20210610-103032-marostegui.json
* 10:29 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:28 kormat: running optimize tables against pc1009 (pc3) [[phab:T282761|T282761]]
* 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:21 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16407 and previous config saved to /var/cache/conftool/dbconfig/20210610-101858-root.json
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16406 and previous config saved to /var/cache/conftool/dbconfig/20210610-100355-root.json
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16405 and previous config saved to /var/cache/conftool/dbconfig/20210610-094851-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16404 and previous config saved to /var/cache/conftool/dbconfig/20210610-093346-root.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16402 and previous config saved to /var/cache/conftool/dbconfig/20210610-093003-marostegui.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16401 and previous config saved to /var/cache/conftool/dbconfig/20210610-092246-marostegui.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 40%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16399 and previous config saved to /var/cache/conftool/dbconfig/20210610-091842-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16398 and previous config saved to /var/cache/conftool/dbconfig/20210610-090345-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 30%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16397 and previous config saved to /var/cache/conftool/dbconfig/20210610-090339-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16396 and previous config saved to /var/cache/conftool/dbconfig/20210610-084841-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 20%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16395 and previous config saved to /var/cache/conftool/dbconfig/20210610-084835-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16394 and previous config saved to /var/cache/conftool/dbconfig/20210610-083338-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16393 and previous config saved to /var/cache/conftool/dbconfig/20210610-083332-root.json
* 08:25 volans: uploaded spicerack_0.0.53 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16392 and previous config saved to /var/cache/conftool/dbconfig/20210610-081834-root.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 5%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16391 and previous config saved to /var/cache/conftool/dbconfig/20210610-081828-root.json
* 08:17 marostegui: Drop several grants from labswiki (wikitech) [[phab:T282074|T282074]]
* 07:57 jynus: reset-failed on cumin1001 after backup rerun
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P16389 and previous config saved to /var/cache/conftool/dbconfig/20210610-075702-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16388 and previous config saved to /var/cache/conftool/dbconfig/20210610-075247-marostegui.json
* 07:44 jynus: retrying s6 snapshots on eqiad, acking daemon failure
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16387 and previous config saved to /var/cache/conftool/dbconfig/20210610-073727-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16386 and previous config saved to /var/cache/conftool/dbconfig/20210610-072224-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16385 and previous config saved to /var/cache/conftool/dbconfig/20210610-070720-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16384 and previous config saved to /var/cache/conftool/dbconfig/20210610-065217-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16383 and previous config saved to /var/cache/conftool/dbconfig/20210610-064916-root.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16382 and previous config saved to /var/cache/conftool/dbconfig/20210610-063745-marostegui.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16381 and previous config saved to /var/cache/conftool/dbconfig/20210610-063412-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16380 and previous config saved to /var/cache/conftool/dbconfig/20210610-061909-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16379 and previous config saved to /var/cache/conftool/dbconfig/20210610-061806-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16378 and previous config saved to /var/cache/conftool/dbconfig/20210610-060405-root.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16377 and previous config saved to /var/cache/conftool/dbconfig/20210610-060302-root.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16376 and previous config saved to /var/cache/conftool/dbconfig/20210610-055327-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16375 and previous config saved to /var/cache/conftool/dbconfig/20210610-055037-root.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16374 and previous config saved to /var/cache/conftool/dbconfig/20210610-054802-root.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16373 and previous config saved to /var/cache/conftool/dbconfig/20210610-054759-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16372 and previous config saved to /var/cache/conftool/dbconfig/20210610-053534-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16371 and previous config saved to /var/cache/conftool/dbconfig/20210610-053259-root.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16370 and previous config saved to /var/cache/conftool/dbconfig/20210610-053255-root.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16369 and previous config saved to /var/cache/conftool/dbconfig/20210610-052421-marostegui.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16368 and previous config saved to /var/cache/conftool/dbconfig/20210610-052030-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16367 and previous config saved to /var/cache/conftool/dbconfig/20210610-052017-marostegui.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16366 and previous config saved to /var/cache/conftool/dbconfig/20210610-050526-root.json


== 2021-04-22 ==
== 2021-06-09 ==
* 17:26 marostegui: Stop mysql on tendril/dbtree database
* 22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1002.wikimedia.org
* 16:33 volker-e@deploy1002: Finished deploy [design/style-guide@e914e8a]: Deploy design/style-guide: {{Gerrit|e914e8a}} icons: Add 'share' icon (#455) (duration: 00m 06s)
* 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
* 16:32 volker-e@deploy1002: Started deploy [design/style-guide@e914e8a]: Deploy design/style-guide: {{Gerrit|e914e8a}} icons: Add 'share' icon (#455)
* 21:59 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host doh1002.wikimedia.org
* 13:23 marostegui: Tendril and dbtree are up but on a degraded status (slow response)
* 21:53 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
* 13:19 marostegui: Tendril and dbtree are down at the moment
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1001.wikimedia.org
* 12:46 Urbanecm: Start server-side upload for 2 video files ([[phab:T280763|T280763]], [[phab:T280524|T280524]])
* 21:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1001.wikimedia.org
* 12:31 marostegui: Restart mysql on db1115 (tendril/dbtree will fail)
* 21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/DiscussionTools/modules/dt-ve/CommentTargetWidget.less: Backport: [[gerrit:698681{{!}}Update surface styles for VE changes (T284567)]] (duration: 01m 14s)
* 04:55 eileen: civicrm revision changed from {{Gerrit|42ca3cf65a}} to {{Gerrit|33a63d5789}}, config revision is {{Gerrit|cf07e7ba0b}}
* 21:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/includes/language/LanguageConverter.php: Backport: [[gerrit:699014{{!}}Revert "Add type hint to constructor of LanguageConverter" (T284685)]] (duration: 01m 24s)
* 02:47 krinkle@deploy1002: Finished deploy [integration/docroot@010e445]: (no justification provided) (duration: 00m 09s)
* 21:08 mutante: rsyncing static-bugzilla HTML from miscweb1002 to deploy1002
* 02:47 krinkle@deploy1002: Started deploy [integration/docroot@010e445]: (no justification provided)
* 21:00 mutante: deploy1002 - creating temp dir /srv/miscweb to rsync static-bugzilla data to, coming from miscweb1002 [[phab:T281538|T281538]]
* 01:34 eileen: civicrm revision changed from {{Gerrit|35a8dd33ba}} to {{Gerrit|42ca3cf65a}}, config revision is {{Gerrit|cf07e7ba0b}}
* 20:36 mutante: deployed temp ferm change on deployment servers to let miscweb dump data, puppetized. scap pull from mwdebug1001 works, deployment good to go
* 00:28 legoktm: legoktm@deneb:/var/cache/pbuilder/aptcache$ sudo rm -rf * # Cleaned up 8GB more
* 19:08 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]] (duration: 01m 07s)
* 00:27 legoktm: legoktm@deneb:/var/cache/apt/archives$ sudo rm -rf * # cleaned up 6GB
* 19:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 00:03 legoktm: subscribed all list admins to the listadmins@ mailing list ([[phab:T280716|T280716]])
* 18:07 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php (foreachwiki)
* 17:52 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php --wiki rmywiki
* 17:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudmetrics1002.eqiad.wmnet
* 17:32 aborrero@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudmetrics1002.eqiad.wmnet
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 17:16 jayme: updated python3-docker-report to 0.0.12 on chartmuseum2001.codfw.wmnet,chartmuseum1001.eqiad.wmnet,deneb.codfw.wmnet,registry[2003-2008].codfw.wmnet,registry[1003-1004].eqiad.wmnet
* 16:35 jayme: import docker-report 0.0.12 into buster-wikimedia
* 15:37 hnowlan: rebuilding maps2009 as buster master
* 15:08 vgutierrez: restarting acme-chief on acmechief1001
* 15:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 15:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 15:01 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 55s)
* 15:00 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:57 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 04s)
* 14:57 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:51 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 15s)
* 14:50 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:45 moritzm: installing postgresql 9.6 security updates on stretch
* 14:37 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on all wikis - [[phab:T282562|T282562]] (duration: 01m 06s)
* 14:33 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on all wikis - [[phab:T282855|T282855]] (duration: 01m 06s)
* 14:23 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on testwiki - [[phab:T282855|T282855]] (duration: 01m 07s)
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16358 and previous config saved to /var/cache/conftool/dbconfig/20210609-141807-root.json
* 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=0; selector: name=maps2009.codfw.wmnet
* 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 13:59 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on testwiki - [[phab:T282562|T282562]] (duration: 01m 08s)
* 13:56 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki1001 - [[phab:T282469|T282469]]
* 13:54 XioNoX: Add Routinator 3000 0.9.0 to the APT repo - [[phab:T282469|T282469]]
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16356 and previous config saved to /var/cache/conftool/dbconfig/20210609-134800-root.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16355 and previous config saved to /var/cache/conftool/dbconfig/20210609-133257-root.json
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16354 and previous config saved to /var/cache/conftool/dbconfig/20210609-132958-marostegui.json
* 13:12 moritzm: installing nginx security updates
* 13:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 02m 26s)
* 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 00m 10s)
* 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 01m 14s)
* 13:05 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16351 and previous config saved to /var/cache/conftool/dbconfig/20210609-130114-root.json
* 12:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
* 12:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1 (duration: 00m 53s)
* 12:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16350 and previous config saved to /var/cache/conftool/dbconfig/20210609-124610-root.json
* 12:43 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 28s)
* 12:42 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:42 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 08s)
* 12:41 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:41 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 47s)
* 12:40 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:39 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 41s)
* 12:39 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16349 and previous config saved to /var/cache/conftool/dbconfig/20210609-123615-root.json
* 12:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
* 12:33 godog: lists1001:rm /var/lib/prometheus/node.d/mailman_queues.prom
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16348 and previous config saved to /var/cache/conftool/dbconfig/20210609-123106-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16347 and previous config saved to /var/cache/conftool/dbconfig/20210609-122111-root.json
* 12:18 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 03m 38s)
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16345 and previous config saved to /var/cache/conftool/dbconfig/20210609-121603-root.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16344 and previous config saved to /var/cache/conftool/dbconfig/20210609-121501-marostegui.json
* 12:14 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:13 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 53s)
* 12:12 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 44s)
* 12:09 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 12:09 hnowlan: running `nodetool decommission` on maps2009
* 12:06 hnowlan: stopped tilerator on maps2009
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16343 and previous config saved to /var/cache/conftool/dbconfig/20210609-120608-root.json
* 12:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
* 12:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
* 12:04 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
* 12:03 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 06s)
* 12:03 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 12:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ac43baa}}: {{Gerrit|d185728}}: WelcomeSurveyExperimentalGroups: Use new syntax ([[phab:T284599|T284599]]) (duration: 01m 19s)
* 11:59 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 54s)
* 11:58 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:54 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 41s)
* 11:54 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:53 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 03m 11s)
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16342 and previous config saved to /var/cache/conftool/dbconfig/20210609-115104-root.json
* 11:50 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:49 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 02m 16s)
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16341 and previous config saved to /var/cache/conftool/dbconfig/20210609-114944-marostegui.json
* 11:47 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 05s)
* 11:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:46 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 53s)
* 11:45 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: redeploy HEAD~1 (duration: 01m 55s)
* 11:38 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: redeploy HEAD~1
* 11:36 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1 (duration: 00m 54s)
* 11:35 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1
* 11:34 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 02m 23s)
* 11:32 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
* 11:32 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 00m 59s)
* 11:31 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
* 11:27 jbond: drop keep_env from sudo config - #[[phab:T275852|T275852]]
* 11:22 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 43s)
* 11:22 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 11:21 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 15s)
* 11:20 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 11:11 awight: EU deployment window complete
* 11:10 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698855{{!}}Set wgAutoConfirmCount to 10 for enwikisource (T284627)]] (duration: 02m 04s)
* 10:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
* 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 53s)
* 10:14 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 05m 41s)
* 10:07 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:06 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 38s)
* 10:06 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 [[phab:T283235|T283235]]', diff saved to https://phabricator.wikimedia.org/P16337 and previous config saved to /var/cache/conftool/dbconfig/20210609-100423-marostegui.json
* 10:00 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 48s)
* 09:59 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 09:58 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on schema* after switch towards nginx-light [[phab:T164456|T164456]]
* 07:54 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:16 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:26 XioNoX: Add 185.71.138.0/24 to network::external and diffscan - [[phab:T252132|T252132]]
* 06:12 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16334 and previous config saved to /var/cache/conftool/dbconfig/20210609-053213-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16333 and previous config saved to /var/cache/conftool/dbconfig/20210609-051710-root.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16332 and previous config saved to /var/cache/conftool/dbconfig/20210609-050206-root.json
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16331 and previous config saved to /var/cache/conftool/dbconfig/20210609-044703-root.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 to remove rev_page_id index [[phab:T163532|T163532]]', diff saved to https://phabricator.wikimedia.org/P16330 and previous config saved to /var/cache/conftool/dbconfig/20210609-044428-marostegui.json
* 04:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 03:30 eileen: civicrm revision changed from {{Gerrit|eac772e9c9}} to {{Gerrit|31d07115a0}}, config revision is {{Gerrit|931a941a5e}}
* 03:01 Amir1: mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary  ([[phab:T284444|T284444]])
* 02:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:56 Amir1: clean up of the rest of mbox files (except arbcom) ([[phab:T282303|T282303]])
* 02:55 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 02:49 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "xfer categories following reimage" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 02:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:39 ryankemper: [[phab:T280382|T280382]] Re-enabled puppet on `wdqs1010`
* 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 00:37 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698654{{!}}Enable Wikisource OCR on select Wikisources (T283898)]] (duration: 01m 31s)
* 00:00 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 00:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer


== 2021-04-21 ==
== 2021-06-08 ==
* 23:58 eileen: tools revision changed from {{Gerrit|3d950fffbd}} to {{Gerrit|c26a8c0cb6}}
* 22:36 krinkle@deploy1002: Finished deploy [integration/docroot@d4c9e08]: (no justification provided) (duration: 00m 08s)
* 23:49 legoktm: made myself and Amir1 list admins for the listadmins@lists.wikimedia.org mailing list
* 22:36 krinkle@deploy1002: Started deploy [integration/docroot@d4c9e08]: (no justification provided)
* 20:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1017.eqiad.wmnet
* 22:21 ryankemper: [[phab:T284479|T284479]] Block put back in place. We're back to expected traffic levels. We'll need a more granular mitigation in place before we can lift this block going forward.
* 20:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1017.eqiad.wmnet
* 22:15 ryankemper: [[phab:T284479|T284479]] Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 19 'A:cp-text' 'run-puppet-agent -q'`
* 20:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1016.eqiad.wmnet
* 22:14 ryankemper: [[phab:T284479|T284479]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698850, running puppet on `cp3052.esams.wmnet`
* 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.eqiad.wmnet
* 22:10 ryankemper: [[phab:T284479|T284479]] Yup more than enough evidence of a strong upward spike now. Proceeding to revert
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host planet1003.eqiad.wmnet
* 22:10 ryankemper: [[phab:T284479|T284479]] Already starting to see a large upward spike in requests. Doing a quick sanity check to make sure this is out of the ordinary but I'll likely be putting the block back in place shortly
* 19:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:09 ryankemper: [[phab:T284479|T284479]] Puppet run complete across all of `cp-text`. Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-1h&to=now over the next few minutes to see if we see a large spike in `full_text` and `entity_full_text` queries
* 19:48 robh@cumin1001: START - Cookbook sre.dns.netbox
* 22:03 ryankemper: [[phab:T284479|T284479]] Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 15 'A:cp-text' 'run-puppet-agent -q'`
* 19:48 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:01 ryankemper: [[phab:T284479|T284479]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698849, running puppet on `cp3052.esams.wmnet`
* 19:46 mutante: creating a ganeti VM to test bullseye install
* 21:59 ryankemper: [[phab:T284479|T284479]] Prior context: We put a block on a range of Google App Engine IPs yesterday to protect CirrusSearch from a bad actor; now we're going to try lifting the block and seeing if we're still getting slammed with traffic
* 19:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet1003.eqiad.wmnet
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
* 19:45 bstorm: manually kicking off a run of update-openstack-mirror on sodium to capture an upstream package update
* 21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
* 19:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:29 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1009.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 18:46 Urbanecm: Morning B&C done
* 21:27 ryankemper: [[phab:T280382|T280382]] Disabled puppet on `wdqs1010` out of abundance of caution; will re-enable after wdqs1009 is reimaged and xfer back is complete
* 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikibaseMediaInfo/: {{Gerrit|f831d16e42e712832d683233a5b21ad59f7c73b3}}: Make the logistic regression image search default ([[phab:T271799|T271799]]) (duration: 00m 58s)
* 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f6d076a69607172475a86ba935a273e7519108d1}}: Update $wgGEHomepageNewAccountVariants ([[phab:T278123|T278123]]) (duration: 00m 58s)
* 20:38 bblack: authdns1001: update gdnsd to 3.7.0-2~wmf1
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1ae5ca5467fad7bfdae8aa94b241fe6c048ab8e5}}: Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere ([[phab:T279853|T279853]]) (duration: 00m 59s)
* 20:18 bblack: authdns2001: update gdnsd to 3.7.0-2~wmf1
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e252de0482c60e87e06d866006bb9ceb186af6cf}}: eswiki: Push Growth features out of dark mode ([[phab:T278235|T278235]]) (duration: 01m 00s)
* 19:55 bblack: dns[1235]002: update gdnsd to 3.7.0-2~wmf1
* 17:43 jynus: deploy grant changes on m5 backup sources (db1117 and db2078) [[phab:T278614|T278614]]
* 19:53 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 15:54 legoktm: [[phab:T280744|T280744]]: legoktm@lists1001:~$ sudo chmod 644 /etc/aliases
* 19:46 bblack: dns[1235]001: update gdnsd to 3.7.0-2~wmf1
* 15:15 Urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php # [[phab:T279853|T279853]]
* 19:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15503 and previous config saved to /var/cache/conftool/dbconfig/20210421-151526-root.json
* 19:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 15:02 moritzm: installing jquery security updates on buster
* 19:36 ryankemper: [[phab:T280382|T280382]] Cancelling the data-transfer run to restart it; realized that the cookbook will start up the `wdqs-updater` again so will locally hack the cookbook on `cumin1001` to prevent that
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15502 and previous config saved to /var/cache/conftool/dbconfig/20210421-150023-root.json
* 19:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Echo/modules/nojs/mw.echo.alert.monobook.less: Backport: [[gerrit:698848{{!}}Fix MonoBook orange banner hover styles (T284496)]] (duration: 01m 08s)
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15501 and previous config saved to /var/cache/conftool/dbconfig/20210421-144519-root.json
* 19:26 bblack: dns400[12]: update gdnsd to 3.7.0-3~wmf1
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15500 and previous config saved to /var/cache/conftool/dbconfig/20210421-143015-root.json
* 19:25 bblack: apt: update gdnsd package to gdnsd-3.7.0-2~wmf1 (fix systemd reload issues)
* 14:25 jbond42: upload new version of debmonitor-client to apt
* 19:20 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 13:54 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=fawiki # [[phab:T279853|T279853]]
* 19:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:39 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
* 19:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 13:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiki # [[phab:T279853|T279853]]
* 19:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:01 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
* 19:18 ryankemper: [[phab:T280382|T280382]] `sudo systemctl stop wdqs-updater wdqs-blazegraph` on `wdqs1010` in preparation for transfer
* 12:21 moritzm: installing failoid2002
* 19:08 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (all caught up on lag)
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 18:47 bblack: dns4001: update gdnsd to 3.7.0-1~wmf1
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 18:43 bblack: apt: update gdnsd package to gdnsd-3.7.0-1~wmf1
* 11:49 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:49 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:46 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 17:36 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:32 awight: EU backport window complete
* 17:25 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:31 moritzm: installing failoid1002
* 17:10 elukey: fix dbstore1007's ip address in analytics-in4 on cr<nowiki>{</nowiki>1,2<nowiki>}</nowiki>-eqiad
* 11:29 awight@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents: Backport: [[gerrit:681334{{!}}Send 0 edits userEditCountBucket for anons (T210106)]] (duration: 00m 59s)
* 17:06 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]] (duration: 34m 12s)
* 10:41 jbond42: switch debmonitor-client to cfssl (second try)
* 16:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 10:37 jbond42: upload golang-cfssl packages for jessie and stretch
* 16:27 papaul: powerdown moss-fe2002 for relocation
* 10:33 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid1002.eqiad.wmnet
* 16:06 papaul: powerdown ms-backup2002 for relocation
* 10:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1002.eqiad.wmnet
* 16:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:23 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid1002.eqiad.wmnet
* 15:40 papaul: powerdown ms-be2061 for relocation
* 10:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1002.eqiad.wmnet
* 15:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
* 10:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid2002.codfw.wmnet
* 15:33 papaul: powerdown thanos-fe2003 for relocation
* 10:21 hnowlan: rebooting eventlog1002 for kernel update
* 15:23 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 4/4 (pc1009) ref P16060, [[phab:T280605|T280605]], [[phab:T282761|T282761]].
* 10:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid2002.codfw.wmnet
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 [[phab:T282761|T282761]]
* 09:56 jbond42: switch debmonitor-clients to use cfssl
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 [[phab:T282761|T282761]]
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15496 and previous config saved to /var/cache/conftool/dbconfig/20210421-093109-root.json
* 15:13 papaul: powerdown cp2034 for relocation
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15495 and previous config saved to /var/cache/conftool/dbconfig/20210421-091605-root.json
* 15:04 papaul: powerdown cp2033 for relocation
* 09:08 elukey: upgrade hue on an-tool1009 to 4.9
* 14:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
* 09:05 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 14:43 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on testreduce1001/scandium after switch towards nginx-light  [[phab:T164456|T164456]]
* 09:05 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 14:08 marostegui: Restart sanitarium hosts (db2094, db2095, db1154, db1155) to pick up new filters [[phab:T284106|T284106]]
* 09:03 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet,service=nginx
* 14:05 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc3 master [[phab:T282761|T282761]] (duration: 00m 57s)
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15494 and previous config saved to /var/cache/conftool/dbconfig/20210421-090100-root.json
* 14:05 kormat: setting pc1010 as pc3 primary [[phab:T282761|T282761]]
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
* 13:51 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 42s)
* 08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 13:51 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 13:48 otto@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 13:41 otto@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 13:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 47s)
* 08:56 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 13:39 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 08:55 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 13:36 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 03s)
* 08:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
* 13:35 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 08:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
* 13:33 otto@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
* 08:53 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 13:22 otto@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
* 08:52 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 12:15 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1008 as pc2 master [[phab:T282761|T282761]] (duration: 00m 57s)
* 08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
* 12:14 kormat: setting pc1008 back as pc2 primary [[phab:T282761|T282761]]
* 08:50 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 10s)
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef49422b162ab0161bc39da857b3230175ac4492}}: enwiki: Disable indexing on the Book namespace ([[phab:T283522|T283522]]) (duration: 00m 56s)
* 08:50 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 11:46 urbanecm: Start server-side upload for 1 file ([[phab:T283470|T283470]])
* 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
* 11:45 moritzm: installing nginx security updates on buster
* 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
* 11:43 urbanecm: Start server-side upload for 2 files ([[phab:T283645|T283645]], [[phab:T283583|T283583]])
* 08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
* 11:39 urbanecm: EU B&C deployment done
* 08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
* 11:38 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16329 and previous config saved to /var/cache/conftool/dbconfig/20210608-113857-kormat.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15493 and previous config saved to /var/cache/conftool/dbconfig/20210421-084555-root.json
* 11:38 moritzm: installing ruby-nokogiri security updates
* 08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
* 11:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/WikimediaEvents/: {{Gerrit|b0b46530b731d2a5f17b0aa04a4cf99df175e23d}}: universalLanguageSelector: Add missing properties ([[phab:T280770|T280770]]) (duration: 00m 56s)
* 08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
* 11:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: {{Gerrit|5df13eeae3b52b98eaf3fdb99ddfa5a0f7b2b1e4}}: Pass context to compact_language_links.open hook ([[phab:T280770|T280770]]) (duration: 00m 57s)
* 08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
* 11:23 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16328 and previous config saved to /var/cache/conftool/dbconfig/20210608-112354-kormat.json
* 08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
* 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 3/3) (duration: 00m 58s)
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
* 11:13 urbanecm@deploy1002: Synchronized wmf-config/config/lvwiki.yaml: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 2/3) (duration: 00m 56s)
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
* 11:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 1/3) (duration: 00m 57s)
* 08:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
* 11:10 urbanecm: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=lvwiki growthexperiments # [[phab:T278191|T278191]]
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
* 11:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16327 and previous config saved to /var/cache/conftool/dbconfig/20210608-110850-kormat.json
* 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|abd401074247d1f1dd2722c2d4d06747b066d547}}: enwiki: Deploy Growth features to 2% of new accounts ([[phab:T281896|T281896]]) (duration: 00m 57s)
* 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
* 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
* 08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
* 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
* 08:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
* 10:53 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16326 and previous config saved to /var/cache/conftool/dbconfig/20210608-105346-kormat.json
* 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
* 10:50 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 53s)
* 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
* 10:49 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
* 07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
* 10:16 liw: testing upcoming Scap release on beta
* 07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
* 10:01 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki2001 - [[phab:T282469|T282469]]
* 07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
* 09:58 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 54s)
* 07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
* 09:57 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
* 07:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
* 09:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
* 09:04 jayme: removing docker-images from registry: releng/ci-jessie, releng/ci-src-setup, releng/composer-php56, releng/composer-test-php56, releng/npm, releng/npm-test, releng/npm-test-3d2png, releng/npm-test-graphoid, releng/npm-test-librdkafka, releng/npm-test-maps-service, releng/php56, releng/quibble-jessie, releng/quibble-jessie-hhvm, releng/quibble-jessie-php56 - [[phab:T251918|T251918]]
* 07:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
* 08:31 dcausse: depooling wdqs1006 (lag)
* 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
* 08:29 dcausse: restarting blazegraph on wdqs1006
* 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
* 08:13 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
* 07:49 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
* 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
* 07:41 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
* 07:40 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
* 07:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
* 07:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:49 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16324 and previous config saved to /var/cache/conftool/dbconfig/20210608-072937-root.json
* 06:49 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16323 and previous config saved to /var/cache/conftool/dbconfig/20210608-071433-root.json
* 06:42 elukey: upload hue_4.9.0-2+deb10u1 to buster-wikimedia
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16322 and previous config saved to /var/cache/conftool/dbconfig/20210608-065930-root.json
* 06:11 marostegui: Stop MySQL on db1074 to clone db1156 (there will be lag in s2 in wiki replicas) [[phab:T258361|T258361]]
* 06:52 tgr: [[phab:T283606|T283606]]: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=<nowiki>{</nowiki>ar,bn,cs,vi<nowiki>}</nowiki>wiki --verbose --search-index with gerrit:696307 applied
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone db1156 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15491 and previous config saved to /var/cache/conftool/dbconfig/20210421-061019-marostegui.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16321 and previous config saved to /var/cache/conftool/dbconfig/20210608-064426-root.json
* 06:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for upgrade', diff saved to https://phabricator.wikimedia.org/P16320 and previous config saved to /var/cache/conftool/dbconfig/20210608-064055-marostegui.json
* 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
* 06:27 elukey: clean some airflow logs on an-airflow1001 as a one-off to free space (had a chat with the Search team first)
* 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
* 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1086.eqiad.wmnet
* 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1086.eqiad.wmnet
* 05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 00:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 04:54 marostegui: Repool clouddb1019:3314
* 00:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 04:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:15 ryankemper: [WDQS] Pooled `wdqs1003`
* 02:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:14 ryankemper: [WDQS] Pooled `wdqs2008`
* 02:38 ryankemper: [[phab:T284445|T284445]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "repairing overinflated blazegraph journal" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs`
* 00:07 ryankemper: `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1006.eqiad.wmnet`
* 02:37 ryankemper: [[phab:T284445|T284445]] after manually stopping blazegraph/wdqs-updater, `sudo rm -fv /srv/wdqs/wikidata.jnl` on `wdqs1012` (clearing old overinflated journal file away before xferring new one)
* 00:04 ryankemper: [WDQS] pooled `wdqs1004`
* 02:34 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo depool` (catching up on ~7h of lag)


== 2021-04-20 ==
== 2021-06-07 ==
* 23:46 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 57s)
* 21:26 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 23:44 urbanecm@deploy1002: Synchronized wmf-config/config/urwiki.yaml: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 57s)
* 21:12 sbassett: Deployed security patch for [[phab:T284364|T284364]]
* 23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 58s)
* 19:30 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] We'll keep monitoring, but for now this incident is resolved. Our current request volume matches what we'd expect. If we're accidentally banning any innocent requests, they must be a very small percentage of the total; otherwise we'd see significantly lower volume than expected.
* 23:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=urwiki GrowthExperiments # [[phab:T280067|T280067]]
* 19:25 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] Seeing the expected drop in `entity_full_text` requests here: https://grafana-rw.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-12h&to=now (as a result, we're no longer rejecting any requests)
* 23:38 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 3/3) (duration: 00m 56s)
* 19:21 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] We're working on rolling out https://gerrit.wikimedia.org/r/698607, which will ban search API requests that match the Google App Engine IP range `2600:1900::0/28` AND whose user agent includes `HeadlessChrome`
* 23:36 urbanecm@deploy1002: Synchronized wmf-config/config/elwiki.yaml: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 2/3) (duration: 00m 57s)
* 19:19 cdanis: [[phab:T284479|T284479]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin -b16 'A:cp-text' "run-puppet-agent"
* 23:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 1/3) (duration: 00m 57s)
* 19:07 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve [[phab:T284462|T284462]] (duration: 04m 53s)
* 23:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
* 19:02 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve [[phab:T284462|T284462]]
* 23:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=elwiki GrowthExperiments # [[phab:T280172|T280172]]
* 19:01 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve (duration: 02m 01s)
* 23:31 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 3/3) (duration: 00m 57s)
* 18:59 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve
* 23:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist growthexperiments sql.php --cluster=extension1 /srv/mediawiki/php-1.37.0-wmf.1/extensions/GrowthExperiments/maintenance/schemas/mysql/growthexperiments_mentee_data.sql # [[phab:T279587|T279587]]
* 18:57 herron: prometheus3001: moved /srv back to vda1 filesystem [[phab:T243057|T243057]]
* 23:28 urbanecm@deploy1002: Synchronized wmf-config/config/cawiki.yaml: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 2/3) (duration: 00m 57s)
* 18:26 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php-1.37.0-wmf.7]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=skwiki --phab=[[phab:T284149|T284149]]
* 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 1/3) (duration: 00m 57s)
* 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/includes/WelcomeSurvey.php: {{Gerrit|368b5d9}}: {{Gerrit|0e79aee}}: WelcomeSurvey backports ([[phab:T284127|T284127]], [[phab:T284257|T284257]]; 2/2) (duration: 00m 57s)
* 23:24 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=cawiki GrowthExperiments # [[phab:T280673|T280673]]
* 18:22 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/extension.json: {{Gerrit|368b5d9}}: {{Gerrit|0e79aee}}: WelcomeSurvey backports ([[phab:T284127|T284127]], [[phab:T284257|T284257]]; 1/2) (duration: 00m 56s)
* 23:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|7089728}}: {{Gerrit|b2482fb}}: initWikiConfig GE backports ([[phab:T284072|T284072]]) (duration: 00m 58s)
* 23:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 3/3) (duration: 00m 56s)
* 23:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
* 18:14 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 2/3) (duration: 00m 56s)
* 23:03 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
* 18:14 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 22:14 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 ottomata: rolling restart of kafka jumbo brokers - [[phab:T283067|T283067]]
* 22:10 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/config/skwiki.yaml: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 1/3) (duration: 00m 59s)
* 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 18:12 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 18:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=skwiki growthexperiments # [[phab:T284149|T284149]]
* 20:52 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ruwiki # [[phab:T279853|T279853]]
* 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5de2f8b27b016a2cd8f424d8e40318edde5e5704}}: Set WelcomeSurveyEnableWithHomepage ([[phab:T281896|T281896]], [[phab:T284257|T284257]]) (duration: 00m 59s)
* 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1020.wikimedia.org
* 17:53 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 20:41 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=viwiki # [[phab:T279853|T279853]]
* 17:53 ottomata: rolling restart of kafka jumbo mirror makers - [[phab:T283067|T283067]]
* 20:36 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1020.wikimedia.org
* 17:17 ryankemper: [Cirrussearch] We're seeing ~10% of current requests being rejected by poolcounter, due to ~2x expected `eqiad.full_text` query volume and ~30x expected `eqiad.entity_full_text` query volume
* 20:36 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ukwiki # [[phab:T279853|T279853]]
* 16:56 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph locked up)
* 20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1017-1019].wikimedia.org
* 16:51 razzi: run homer '*.eqiad.wmnet' diff
* 20:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=tewiki # [[phab:T279853|T279853]]
* 16:49 ottomata: restarting mysqld analytics-meta replica on db1108 to apply config change - [[phab:T272973|T272973]]
* 20:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=svwiki # [[phab:T279853|T279853]]
* 16:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6 (duration: 04m 29s)
* 20:30 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=srwiki # [[phab:T279853|T279853]]
* 16:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6
* 20:29 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=rowiki # [[phab:T279853|T279853]]
* 16:09 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6 (duration: 00m 35s)
* 20:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hywiki # [[phab:T279853|T279853]]
* 16:09 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6
* 20:22 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=huwiki # [[phab:T279853|T279853]]
* 14:57 moritzm: installing remaining lz4 security updates on buster
* 20:21 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hrwiki # [[phab:T279853|T279853]]
* 14:35 moritzm: installing isc-dhcp security updates
* 20:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hewiki # [[phab:T279853|T279853]]
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113 (s5,s6) after upgrade', diff saved to https://phabricator.wikimedia.org/P16315 and previous config saved to /var/cache/conftool/dbconfig/20210607-141722-marostegui.json
* 20:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiktionary # [[phab:T279853|T279853]]
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) for upgrade', diff saved to https://phabricator.wikimedia.org/P16314 and previous config saved to /var/cache/conftool/dbconfig/20210607-141307-marostegui.json
* 20:16 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1017-1019].wikimedia.org
* 13:35 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3) (duration: 00m 52s)
* 20:15 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=euwiki # [[phab:T279853|T279853]]
* 13:34 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3)
* 20:13 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=bnwiki # [[phab:T279853|T279853]]
* 13:34 moritzm: installing libxml2 security updates on stretch
* 20:08 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:32 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 14s)
* 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
* 13:31 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 19:58 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:28 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 54s)
* 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
* 13:27 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 19:28 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1016.wikimedia.org
* 12:41 moritzm: removing now obsolete Java 8 packages from gerrit* [[phab:T268225|T268225]]
* 18:34 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=idwiki # [[phab:T279853|T279853]]
* 12:36 moritzm: removing now obsolete Java 8 packages from contint* [[phab:T268225|T268225]]
* 18:33 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.wikimedia.org
* 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/: {{Gerrit|4d1969d}}: {{Gerrit|1fbb8e9}}: MentorStore: Set wasPosted to true in command line mode ([[phab:T275773|T275773]]) (duration: 00m 59s)
* 12:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 17:26 XioNoX: boot cr1-codfw:fpc1 - [[phab:T277341|T277341]]
* 12:25 moritzm: installing nginx security updates on buster
* 17:16 papaul: Adding an MPC7E to cr1-codfw
* 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --add-prefix=BROKEN --fix # [[phab:T284442|T284442]]
* 16:32 arturo: merging change to core route firewall https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681316 ([[phab:T272587|T272587]])
* 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki # [[phab:T284442|T284442]]
* 16:15 andrewbogott: updating core routers config with https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681315
* 11:09 Lucas_WMDE: EU backport+config window done
* 15:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host eventlog1003.eqiad.wmnet
* 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697824{{!}}Add 2021 namespaces for wikimania wiki (T284235)]] (duration: 00m 56s)
* 15:22 urbanecm@deploy1002: Synchronized docroot/noc/conf/debug.json: {{Gerrit|dc6647b9c674429c0811116e0caca7639b766e77}}: remove mwdebug1003 from list of debug servers ([[phab:T267248|T267248]]) (duration: 00m 58s)
* 10:48 volans: reset netbox-next DB with the latest prod dump
* 15:20 urbanecm@deploy1002: Synchronized debug.json: {{Gerrit|dc6647b9c674429c0811116e0caca7639b766e77}}: remove mwdebug1003 from list of debug servers ([[phab:T267248|T267248]]) (duration: 00m 57s)
* 10:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:698472{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 15:14 hnowlan@cumin1001: START - Cookbook sre.ganeti.makevm for new host eventlog1003.eqiad.wmnet
* 10:41 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:698472{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 15:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 15:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 10:38 godog: downgrade grafana to 7.4.2 on grafana2001 - [[phab:T282863|T282863]]
* 14:59 volker-e@deploy1002: Finished deploy [design/style-guide@c4d8314]: Deploy design/style-guide: {{Gerrit|c4d8314}} “Components”: Fix “Buttons” active states (#460) (duration: 00m 07s)
* 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 14:58 volker-e@deploy1002: Started deploy [design/style-guide@c4d8314]: Deploy design/style-guide: {{Gerrit|c4d8314}} “Components”: Fix “Buttons” active states (#460)
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 14:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 14:38 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 14:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 14:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:28 kormat: reimaging db1157 [[phab:T283131|T283131]]
* 14:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:24 moritzm: remove now obsolete nginx mods and dependencies on htmldumper1001 [[phab:T164456|T164456]]
* 14:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 14:30 moritzm: installing exim updates from Buster point release
* 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 14:25 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a] (duration: 04m 56s)
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 14:25 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 depooling: reimage to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16311 and previous config saved to /var/cache/conftool/dbconfig/20210607-100822-kormat.json
* 14:24 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 14:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:20 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a]
* 09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:43 moritzm: upgrading bullseye hosts to latest packages in testing
* 14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 14:17 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a] (duration: 00m 07s)
* 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 14:17 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a]
* 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a] (duration: 00m 03s)
* 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 14:16 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a]
* 09:03 moritzm: installing imagemagick security updates on stretch
* 14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 06:05 marostegui: Upgrade mysql on dbstore1003 [[phab:T283235|T283235]]
* 14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 05:57 marostegui: Stop dbstore1004 to clone dbstore1007 [[phab:T283125|T283125]]
* 14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a] (duration: 00m 03s)
* 05:37 marostegui: Depool clouddb1020 (s5, s8) for upgrade
* 14:15 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a]
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
* 14:15 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a] (duration: 00m 03s)
* 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
* 14:14 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a]
* 04:48 marostegui: Depool clouddb1019:3314 (long running alter table)
* 14:14 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a] (duration: 14m 50s)
* 14:11 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:04 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:01 jiji@cumin1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet,cluster=videoscaler
* 13:59 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a]
* 13:42 moritzm: upgrading mw1276 to PHP 7.2.34
* 13:40 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:40 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 13s)
* 13:40 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 13:38 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:36 otto@deploy1002: Finished deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint (duration: 05m 17s)
* 13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:33 moritzm: upgrading mw1261 to PHP 7.2.34
* 13:31 otto@deploy1002: Started deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint
* 13:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:26 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:22 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:21 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 13:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/includes/actions/RollbackAction.php: {{Gerrit|ccbfcf28a2f507ed40dcf7af748c30f581b5079f}}: Do not mark rollbacks as bot edits ([[phab:T280655|T280655]]) (duration: 00m 57s)
* 13:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 13:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:07 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
* 13:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
* 12:58 moritzm: reimaging cumin2002 to bullseye [[phab:T276589|T276589]]
* 12:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 12:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:49 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:47 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:42 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf1 to component/php72
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 to check its tables [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15483 and previous config saved to /var/cache/conftool/dbconfig/20210420-124118-marostegui.json
* 12:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 12:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:18 CFisch_WMDE: European mid-day backport window done
* 12:05 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:681321{{!}}Add NS_PROJECT alias for azwiki (T280577)]] (duration: 00m 57s)
* 12:04 moritzm: drain ganeti5003
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:54 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: [[gerrit:681153{{!}}CommentFormatter: Add ext-discussiontools-section class instead of overwriting (T280433)]] (duration: 00m 57s)
* 11:47 moritzm: failover ganeti master in eqsin to ganeti5001
* 11:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:38 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/VisualEditor/modules/ve-mw/ui/pages/ve.ui.MWParameterPage.js: Backport: [[gerrit:679462{{!}}Add filtering for the suggested values combo box (T271898)]] (duration: 00m 58s)
* 11:15 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:676930{{!}}Add default import sources (T214139)]] (duration: 00m 58s)
* 11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:07 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:49 _joe_: temporarily installing some python packages on deploy1002 for testing
* 10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 10:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 10:20 moritzm: drain ganeti5001
* 10:11 hnowlan: opening access to cassandra on new AQS hosts (aqs101*) to analytics-in4 filter
* 10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1001.eqiad.wmnet
* 10:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host aphlict1001.eqiad.wmnet
* 09:42 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
* 09:42 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
* 09:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 09:40 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 08:50 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 08:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
* 08:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
* 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
* 08:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
* 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
* 08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
* 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
* 08:09 dcaro: reprepro updating thirdparty/ceph-octopus repo
* 08:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
* 08:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
* 08:05 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 08:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1086 from dbctl [[phab:T278229|T278229]]', diff saved to https://phabricator.wikimedia.org/P15482 and previous config saved to /var/cache/conftool/dbconfig/20210420-075949-marostegui.json
* 07:38 XioNoX: BGP: prioritize directly connected peers - [[phab:T280054|T280054]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15480 and previous config saved to /var/cache/conftool/dbconfig/20210420-073808-root.json
* 07:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
* 07:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15479 and previous config saved to /var/cache/conftool/dbconfig/20210420-072305-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15478 and previous config saved to /var/cache/conftool/dbconfig/20210420-070801-root.json
* 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15477 and previous config saved to /var/cache/conftool/dbconfig/20210420-065257-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
* 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
* 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE


== 2021-04-19 ==
== 2021-06-05 ==
* 22:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 16:16 Amir1: deleting all private archives of mm2. All are inaccessible now ([[phab:T282303|T282303]])
* 22:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 15:21 Amir1: delete mbox files of group D and E in mm2 ([[phab:T282303|T282303]])
* 22:37 Trey314159: reindexing wikidata on cloudelastic finished/failed ([[phab:T274200|T274200]])
* 14:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 Trey314159: reindexing commons and wikidata on elastic@eqiad finished/failed ([[phab:T274200|T274200]])
* 00:21 mutante: backup1001 - systemctl bacula-dir works again (restoring backup for non-existing host)
* 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
* 00:18 mutante: backup1001 systemctl reload bacula-dir  fails
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
* 21:03 sbassett: Deployed security patch for [[phab:T280226|T280226]]
* 19:56 dcausse: repool wdqs1005
* 19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
* 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
* 18:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
* 18:56 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/tests: Factor out rollback logic from WikiPage - /tests (duration: 00m 59s)
* 18:55 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/maintenance: Factor out rollback logic from WikiPage - /maintenance (duration: 00m 57s)
* 18:51 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/includes/: Factor out rollback logic from WikiPage - /includes (duration: 01m 01s)
* 18:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
* 18:47 jiji@cumin1001: conftool action : set/pooled=yes; selector: cluster=thumbor,name=thumbor2001.codfw.wmnet
* 18:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2002.codfw.wmnet
* 18:39 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T274436|T274436]] Math: Enable RESTBase-less Wikidata math validation (duration: 00m 56s)
* 18:34 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2002.codfw.wmnet
* 18:21 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T249745|T249745]] [EventBus] Make eventgate-main timeout consistent with envoy (duration: 00m 56s)
* 18:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/: {{Gerrit|66d137b75a7073c7162c443cc8c6ec6f3be714e0}}: Remove <header> tags around headings for compat with MobileFrontend ([[phab:T280433|T280433]]) (duration: 00m 59s)
* 18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2001.codfw.wmnet
* 18:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/includes/Mentorship/Store/DatabaseMentorStore.php: {{Gerrit|0233507470377f6ac45768e345cd2e359e5d0e57}}: DatabaseMentorStore: Fix deprecation warning in upsert query ([[phab:T280525|T280525]]) (duration: 00m 57s)
* 17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2001.codfw.wmnet
* 17:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1004.eqiad.wmnet
* 17:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1004.eqiad.wmnet
* 17:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1003.eqiad.wmnet
* 17:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1003.eqiad.wmnet
* 17:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1002.eqiad.wmnet
* 16:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
* 16:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1001.eqiad.wmnet
* 16:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
* 16:25 hoo: Updated the Wikidata property suggester with data from the 2021-04-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15474 and previous config saved to /var/cache/conftool/dbconfig/20210419-161134-root.json
* 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 90%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15473 and previous config saved to /var/cache/conftool/dbconfig/20210419-155631-root.json
* 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 80%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15472 and previous config saved to /var/cache/conftool/dbconfig/20210419-154127-root.json
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 70%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15471 and previous config saved to /var/cache/conftool/dbconfig/20210419-152623-root.json
* 15:24 volans: reverted debmonitor-client to 0.2.0-1 on apt.w.o for jessie-wikimedia
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 60%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15470 and previous config saved to /var/cache/conftool/dbconfig/20210419-151119-root.json
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15469 and previous config saved to /var/cache/conftool/dbconfig/20210419-145616-root.json
* 14:53 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename RelatedArticles wmg variables to wg (duration: 00m 56s)
* 14:53 jbond42: update debmonitor-client - [[phab:T280484|T280484]]
* 14:52 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove RelatedArticles extension function and wmg to wg mapping (duration: 00m 56s)
* 14:48 reedy@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Use namespaced PoolCounter Client (duration: 00m 57s)
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 [[phab:T278229|T278229]]', diff saved to https://phabricator.wikimedia.org/P15468 and previous config saved to /var/cache/conftool/dbconfig/20210419-144422-marostegui.json
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 40%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15467 and previous config saved to /var/cache/conftool/dbconfig/20210419-144112-root.json
* 14:41 volans: uploaded debmonitor-client 0.2.8 to apt.w.o for jessie, stretch, buster, bullseye
* 14:29 hnowlan: imported envoyproxy_1.16.3-1 debs to envoy-future component
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 30%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15466 and previous config saved to /var/cache/conftool/dbconfig/20210419-142608-root.json
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 20%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15465 and previous config saved to /var/cache/conftool/dbconfig/20210419-141105-root.json
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 15%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15464 and previous config saved to /var/cache/conftool/dbconfig/20210419-135601-root.json
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15463 and previous config saved to /var/cache/conftool/dbconfig/20210419-134057-root.json
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15462 and previous config saved to /var/cache/conftool/dbconfig/20210419-132554-root.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15461 and previous config saved to /var/cache/conftool/dbconfig/20210419-131936-marostegui.json
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15460 and previous config saved to /var/cache/conftool/dbconfig/20210419-131501-marostegui.json
* 12:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bd076306c0ae0428ff13743f499b2a02d42b6eab}}: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD everywhere ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15459 and previous config saved to /var/cache/conftool/dbconfig/20210419-125600-marostegui.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15458 and previous config saved to /var/cache/conftool/dbconfig/20210419-125407-marostegui.json
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1182 to dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15457 and previous config saved to /var/cache/conftool/dbconfig/20210419-125301-marostegui.json
* 12:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef0f68e2a9c1c638911bb06c47ba6e8ef88ee393}}: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_NEW ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 12:38 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # [[phab:T279853|T279853]]
* 12:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
* 12:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3e3cce192f1e99cbcae739f234271411d10974ac}}: cswiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD ([[phab:T279853|T279853]]) (duration: 00m 58s)
* 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
* 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
* 12:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
* 11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
* 11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 11:33 moritzm: imported debdeploy 0.0.99.13-1+deb11u1 to bullseye-wikimedia [[phab:T275873|T275873]]
* 11:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki --force # [[phab:T279853|T279853]]
* 11:11 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki # [[phab:T279853|T279853]]
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|03f8ed819091624f5ae4a8d7ed3631dc322fabcd}}: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 11:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:680871{{!}}Disable legacy javascript variable for the rest of wikis (T72470)]] (duration: 00m 57s)
* 11:02 moritzm: import prometheus-rsyslog-exporter for bullseye-wikimedia/main
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:681008{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:681008{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:34 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 10:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 10:24 hnowlan: imported 1.16.3 into envoy-future
* 10:22 moritzm: reimaging theemin to bullseye
* 10:15 dcausse: depooling wdqs1005
* 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 10:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 10:05 arturo: aborrero@apt1001:~ $ sudo -i reprepro --component thirdparty/kubeadm-k8s-1-18 update buster-wikimedia
* 10:04 arturo: aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished (remove old buster-wikimedia{{!}}thirdparty/kubeadm-k8s-1-15,16 repos and packages)
* 09:56 ema: cp3051: varnish-frontend-restart to apply exp policy settings changes starting from empty cache [[phab:T275809|T275809]]
* 09:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
* 09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
* 09:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 09:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15454 and previous config saved to /var/cache/conftool/dbconfig/20210419-092251-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15453 and previous config saved to /var/cache/conftool/dbconfig/20210419-092234-marostegui.json
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15452 and previous config saved to /var/cache/conftool/dbconfig/20210419-091535-root.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 90%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15451 and previous config saved to /var/cache/conftool/dbconfig/20210419-090747-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15450 and previous config saved to /var/cache/conftool/dbconfig/20210419-090031-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 80%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15449 and previous config saved to /var/cache/conftool/dbconfig/20210419-085243-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15448 and previous config saved to /var/cache/conftool/dbconfig/20210419-084834-marostegui.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15447 and previous config saved to /var/cache/conftool/dbconfig/20210419-084528-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15446 and previous config saved to /var/cache/conftool/dbconfig/20210419-084523-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 70%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15445 and previous config saved to /var/cache/conftool/dbconfig/20210419-083740-root.json
* 08:35 ema: restart debmonitor-client.service on cp4030, dns5002, an-worker1106 [[phab:T280484|T280484]]
* 08:34 marostegui: Testing log
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15444 and previous config saved to /var/cache/conftool/dbconfig/20210419-083021-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15443 and previous config saved to /var/cache/conftool/dbconfig/20210419-083018-root.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15442 and previous config saved to /var/cache/conftool/dbconfig/20210419-082559-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 60%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15441 and previous config saved to /var/cache/conftool/dbconfig/20210419-082236-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15440 and previous config saved to /var/cache/conftool/dbconfig/20210419-082000-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15439 and previous config saved to /var/cache/conftool/dbconfig/20210419-081517-root.json
* 08:07 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
* 08:07 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15438 and previous config saved to /var/cache/conftool/dbconfig/20210419-080732-root.json
* 08:07 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15437 and previous config saved to /var/cache/conftool/dbconfig/20210419-080456-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15436 and previous config saved to /var/cache/conftool/dbconfig/20210419-080454-root.json
* 08:03 moritzm: installing python-bleach security updates
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15435 and previous config saved to /var/cache/conftool/dbconfig/20210419-080013-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 40%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15434 and previous config saved to /var/cache/conftool/dbconfig/20210419-075229-root.json
* 07:51 moritzm: upgrade mwdebug2002 to PHP 7.2.34
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15433 and previous config saved to /var/cache/conftool/dbconfig/20210419-074953-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15432 and previous config saved to /var/cache/conftool/dbconfig/20210419-074950-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15431 and previous config saved to /var/cache/conftool/dbconfig/20210419-074510-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15430 and previous config saved to /var/cache/conftool/dbconfig/20210419-074155-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 30%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15429 and previous config saved to /var/cache/conftool/dbconfig/20210419-073725-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15428 and previous config saved to /var/cache/conftool/dbconfig/20210419-073449-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15427 and previous config saved to /var/cache/conftool/dbconfig/20210419-073446-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15426 and previous config saved to /var/cache/conftool/dbconfig/20210419-073425-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 20%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15425 and previous config saved to /var/cache/conftool/dbconfig/20210419-072221-root.json
* 07:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:19 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15424 and previous config saved to /var/cache/conftool/dbconfig/20210419-071943-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15423 and previous config saved to /var/cache/conftool/dbconfig/20210419-071921-root.json
* 07:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15422 and previous config saved to /var/cache/conftool/dbconfig/20210419-071701-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 15%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15421 and previous config saved to /var/cache/conftool/dbconfig/20210419-070718-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15420 and previous config saved to /var/cache/conftool/dbconfig/20210419-070439-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15419 and previous config saved to /var/cache/conftool/dbconfig/20210419-070418-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15418 and previous config saved to /var/cache/conftool/dbconfig/20210419-070035-marostegui.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15417 and previous config saved to /var/cache/conftool/dbconfig/20210419-065627-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15416 and previous config saved to /var/cache/conftool/dbconfig/20210419-065213-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15415 and previous config saved to /var/cache/conftool/dbconfig/20210419-064914-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15414 and previous config saved to /var/cache/conftool/dbconfig/20210419-064600-marostegui.json
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15413 and previous config saved to /var/cache/conftool/dbconfig/20210419-064123-root.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15412 and previous config saved to /var/cache/conftool/dbconfig/20210419-062620-root.json
* 06:17 _joe_: upgrading envoy everywhere in eqiad [[phab:T280317|T280317]]
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15411 and previous config saved to /var/cache/conftool/dbconfig/20210419-061116-root.json
* 06:10 _joe_: upgrading envoy everywhere in codfw [[phab:T280317|T280317]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15410 and previous config saved to /var/cache/conftool/dbconfig/20210419-060321-marostegui.json
* 06:01 _joe_: rolling out further envoy upgrades [[phab:T280317|T280317]]
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 10%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15409 and previous config saved to /var/cache/conftool/dbconfig/20210419-055613-root.json
* 05:53 marostegui: Stop sanitarium master on s2 (lag will show up on clouddb* labsdb* hosts) [[phab:T272008|T272008]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15408 and previous config saved to /var/cache/conftool/dbconfig/20210419-055240-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P15407 and previous config saved to /var/cache/conftool/dbconfig/20210419-054831-marostegui.json
* 05:42 marostegui: Stop sanitarium master on s1 (lag will show up on clouddb* labsdb* hosts) [[phab:T272008|T272008]]
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15406 and previous config saved to /var/cache/conftool/dbconfig/20210419-054158-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15405 and previous config saved to /var/cache/conftool/dbconfig/20210419-053730-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15404 and previous config saved to /var/cache/conftool/dbconfig/20210419-053127-marostegui.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1179 to dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15403 and previous config saved to /var/cache/conftool/dbconfig/20210419-053050-marostegui.json
* 05:05 marostegui: Restart m2 database master [[phab:T280251|T280251]]
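
The `(re)pooling @ N%` entries above follow a regular pattern, so the ramp-up of a single host can be pulled out of the log mechanically. The following is a minimal sketch, assuming entries are formatted exactly as logged on this page; the function name `pooling_steps` and the regular expression are hypothetical helpers for illustration and are not part of dbctl or conftool.

```python
import re

# Matches log entries such as:
# * 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 40%: ...'
# The host group also accepts multi-instance names like db1096:3315.
STEP_RE = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<host>[\w:]+) \(re\)pooling @ (?P<pct>\d+)%"
)


def pooling_steps(lines, host):
    """Return (time, percentage) pairs for one host, oldest first.

    Sorting on the HH:MM string is only valid within a single day's section.
    """
    steps = []
    for line in lines:
        match = STEP_RE.match(line)
        if match and match.group("host") == host:
            steps.append((match.group("time"), int(match.group("pct"))))
    return sorted(steps)


# Shortened copies of entries from the 2021-04-19 section (tails trimmed).
sample = [
    "* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 40%: Slowly pool db1182'",
    "* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Slowly pool db1182'",
    "* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Slowly pool db1182'",
    "* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079'",
    "* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182'",
]

for time, pct in pooling_steps(sample, "db1182"):
    print(f"{time} db1182 at {pct}%")
```

Entries that do not carry a percentage (for example `Depool db1182` or `Add db1182 to dbctl`) are skipped by design, so the result only reflects the gradual repooling steps.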


== 2021-04-18 ==
== 2021-06-04 ==
* 06:40 Amir1: cleaning watchlist of User:Mr._Ibrahem in wikidatawiki (in main ns only)
* 22:08 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4001.wikimedia.org
* 21:51 cwhite@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4001.wikimedia.org
* 20:59 bblack: repool cp1087 - [[phab:T278729|T278729]]
* 20:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
* 20:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
* 19:06 bblack: depool cp1087 - [[phab:T278729|T278729]]
* 18:21 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:36 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
* 17:33 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
* 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
* 17:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 17:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 15:25 topranks: Adding 1:1 NAT configuration for fran2001 / analytics.codfw.wikimedia.org to pfw3-codfw (backup site)
* 14:47 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I434d9cfa29d84f}} (duration: 00m 56s)
* 14:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/extension.json: {{Gerrit|Iea41ab8599ffae}} (duration: 00m 56s)
* 14:44 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/includes/: {{Gerrit|Iea41ab8599ffae}} (duration: 00m 59s)
* 14:41 krinkle@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
* 13:39 Krinkle: mwmaint1002: Running purge_parsercache_now.php on pc1008, server 3/4, ref [[phab:T282761|T282761]]
* 13:33 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:46 marostegui: Upgrade mysql on clouddb1016 [[phab:T283235|T283235]]
* 12:27 marostegui: Upgrade mysql on clouddb1015 [[phab:T283235|T283235]]
* 11:20 jbond: upload debmonitor-client_0.3.0-1+deb10u3_all.deb to apt
* 10:59 topranks: Running homer for Gerrit 698162: Set up BGP peering to doh5001 in eqsin, triggering DoH /24 announcement there.
* 09:47 ema: pool cp1087 [[phab:T278729|T278729]]
* 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
* 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16304 and previous config saved to /var/cache/conftool/dbconfig/20210604-091742-root.json
* 09:06 ema: reboot cp1087 [[phab:T278729|T278729]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16303 and previous config saved to /var/cache/conftool/dbconfig/20210604-090239-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16302 and previous config saved to /var/cache/conftool/dbconfig/20210604-084735-root.json
* 08:33 marostegui: Upgrade db1110 [[phab:T283235|T283235]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16301 and previous config saved to /var/cache/conftool/dbconfig/20210604-083232-root.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16300 and previous config saved to /var/cache/conftool/dbconfig/20210604-082956-marostegui.json
* 08:20 godog: upgrade karma to 0.86-1
* 07:38 jynus: stop and upgrade db1150 [[phab:T283235|T283235]]
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16299 and previous config saved to /var/cache/conftool/dbconfig/20210604-073326-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16298 and previous config saved to /var/cache/conftool/dbconfig/20210604-073318-root.json
* 07:29 moritzm: cleanup now unused nginx mods and former deps on install* and puppetdb* servers after switch towards nginx-light (various X11 libs and libxslt) [[phab:T164456|T164456]]
* 07:24 moritzm: cleanup now unused nginx mods and former deps on install* servers after switch towards nginx-light (various X11 libs and libxslt)
* 07:19 urbanecm: Password reset for SUL User:Dominic_Mayers ([[phab:T282656|T282656]])
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16297 and previous config saved to /var/cache/conftool/dbconfig/20210604-071823-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16296 and previous config saved to /var/cache/conftool/dbconfig/20210604-071815-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16295 and previous config saved to /var/cache/conftool/dbconfig/20210604-070319-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16294 and previous config saved to /var/cache/conftool/dbconfig/20210604-070311-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16293 and previous config saved to /var/cache/conftool/dbconfig/20210604-064815-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16292 and previous config saved to /var/cache/conftool/dbconfig/20210604-064807-root.json
* 06:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:42 marostegui: Upgrade mysql on db1096:3315 db1096:3316
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 db1096:3315', diff saved to https://phabricator.wikimedia.org/P16291 and previous config saved to /var/cache/conftool/dbconfig/20210604-064242-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16290 and previous config saved to /var/cache/conftool/dbconfig/20210604-055521-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16289 and previous config saved to /var/cache/conftool/dbconfig/20210604-054017-root.json
* 05:26 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16288 and previous config saved to /var/cache/conftool/dbconfig/20210604-052514-root.json
* 05:24 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 05:23 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 05:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:17 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 05:16 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16287 and previous config saved to /var/cache/conftool/dbconfig/20210604-051010-root.json
* 04:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
* 04:41 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
* 04:25 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2002.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 04:22 ryankemper: [[phab:T280382|T280382]] `wdqs2001.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:33 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "repair overinflated wikidata jnl" --blazegraph_instance blazegraph`
* 02:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:30 ryankemper: [[phab:T280382|T280382]] `wdqs1005.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 02:25 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo pool` (caught up on lag)
* 02:09 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 02:06 ebernhardson: post-deploy restart airflow-(webserver{{!}}scheduler) on an-airflow1001
* 02:05 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift (duration: 04m 40s)
* 02:00 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift
* 01:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 00:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:08 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T280886|T280886]] (duration: 00m 57s)
* 00:07 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 00:06 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 00:05 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:05 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)


== 2021-04-17 ==
== 2021-06-03 ==
* 16:16 Amir1: cleaning SuccuBot's watchlist in wikidatawiki
* 23:41 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T280886|T280886]] (duration: 00m 56s)
* 00:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
* 23:40 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T280886|T280886]] (duration: 00m 57s)
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
* 23:33 mutante: installing OS on fresh VM doh5001
* 00:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1402.eqiad.wmnet
* 23:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
* 00:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
* 23:28 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
* 23:09 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:694686{{!}}Restrict changetags to sysops and bots on meta]] [[phab:T283625|T283625]] (duration: 00m 58s)
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1402.eqiad.wmnet
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 00:14 ryankemper: [[phab:T267927|T267927]] `sudo run-puppet-agent` and `sudo pool` on `wdqs2003`
* 22:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 22:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 22:36 ryankemper: [[phab:T280382|T280382]] Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after
* 00:08 ryankemper: [[phab:T267927|T267927]] Reload of `wdqs2003` complete
* 22:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:07 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:35 ryankemper: [[phab:T280382|T280382]] `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE
* 22:28 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:54 shdubsh: restart kafka on kafka-logging to take new retention config
* 20:47 sbassett: Deployed security patch for [[phab:T282932|T282932]]
* 20:37 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader[12]001
* 20:35 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s)
* 20:34 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 20:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:34 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container
* 20:34 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 20:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 19:58 mutante: [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts
* 19:56 mutante: [mwmaint1002:~] $ sudo systemctl start daily_account_consistency_check.service
* 19:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
* 19:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
* 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s)
* 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org
* 19:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs
* 19:33 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images - [[phab:T251918|T251918]] -  icinga-wm> RECOVERY - Check systemd state on deneb is OK
* 19:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:32 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
* 19:28 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 19:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 19:27 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 19:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org
* 19:14 mutante: install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers [[phab:T164456|T164456]]
* 19:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
* 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
* 19:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
* 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
* 18:52 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s)
* 18:51 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter
* 18:46 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 18:46 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 18:39 ryankemper: [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on)
* 18:37 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547)
* 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
* 18:28 mutante: temp. disabling puppet on install* servers. switching nginx to light variant ([[phab:T164456|T164456]])
* 18:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s)
* 18:16 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter
* 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
* 17:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
* 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
* 17:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
* 17:37 brennen: gitlab1001: re-running install-gitlab-server.sh
* 17:16 urandom: remove dropped Cassandra keyspace snapshots -- [[phab:T258414|T258414]]
* 16:55 ejegg: updated payments-wiki from {{Gerrit|6fac77f60e}} to {{Gerrit|7be0534b91}}
* 16:23 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:49 topranks: Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs.
* 15:27 papaul: PDU replacement complete
* 15:25 moritzm: upgrading gitlab to 13.11.5
* 15:08 papaul: disconnect ps2-d8-codfw for replacement
* 14:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:54 topranks: Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002.
* 14:23 moritzm: installing nginx security updates on buster
* 14:12 moritzm: installing postgresql-9.6 security updates
* 13:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16285 and previous config saved to /var/cache/conftool/dbconfig/20210603-130059-root.json
* 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16284 and previous config saved to /var/cache/conftool/dbconfig/20210603-124556-root.json
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16283 and previous config saved to /var/cache/conftool/dbconfig/20210603-123243-root.json
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16282 and previous config saved to /var/cache/conftool/dbconfig/20210603-123052-root.json
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16281 and previous config saved to /var/cache/conftool/dbconfig/20210603-121739-root.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16280 and previous config saved to /var/cache/conftool/dbconfig/20210603-121548-root.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16279 and previous config saved to /var/cache/conftool/dbconfig/20210603-121205-marostegui.json
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16278 and previous config saved to /var/cache/conftool/dbconfig/20210603-121133-root.json
* 12:06 moritzm: restarting FPM on mw canaries to pick up lz4 update
* 12:03 moritzm: installing lz4 security updates on buster
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16277 and previous config saved to /var/cache/conftool/dbconfig/20210603-120235-root.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16276 and previous config saved to /var/cache/conftool/dbconfig/20210603-115628-root.json
* 11:53 moritzm: installing curl security updates on stretch
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16275 and previous config saved to /var/cache/conftool/dbconfig/20210603-114731-root.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16274 and previous config saved to /var/cache/conftool/dbconfig/20210603-114503-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16273 and previous config saved to /var/cache/conftool/dbconfig/20210603-114325-marostegui.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16272 and previous config saved to /var/cache/conftool/dbconfig/20210603-114124-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16271 and previous config saved to /var/cache/conftool/dbconfig/20210603-113000-root.json
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16270 and previous config saved to /var/cache/conftool/dbconfig/20210603-112620-root.json
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16269 and previous config saved to /var/cache/conftool/dbconfig/20210603-112243-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16268 and previous config saved to /var/cache/conftool/dbconfig/20210603-111456-root.json
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e84096857c8a2f753e077aa6c3e37b910b9e1fcd}}: jawiki: extended confirmed should be 120 days since first edit, not registration ([[phab:T284212|T284212]]) (duration: 00m 58s)
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16267 and previous config saved to /var/cache/conftool/dbconfig/20210603-110906-root.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16266 and previous config saved to /var/cache/conftool/dbconfig/20210603-105953-root.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16265 and previous config saved to /var/cache/conftool/dbconfig/20210603-105536-marostegui.json
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16264 and previous config saved to /var/cache/conftool/dbconfig/20210603-105402-root.json
* 10:52 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:41 godog: test librenms/AM paging
* 10:40 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16263 and previous config saved to /var/cache/conftool/dbconfig/20210603-103858-root.json
* 10:28 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16262 and previous config saved to /var/cache/conftool/dbconfig/20210603-102354-root.json
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache [[phab:T282761|T282761]]
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache [[phab:T282761|T282761]]
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16261 and previous config saved to /var/cache/conftool/dbconfig/20210603-101950-marostegui.json
* 10:13 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc2 primary [[phab:T282761|T282761]] (duration: 00m 58s)
* 09:38 marostegui: Deploy schema change on s3 codfw master (with replication) - [[phab:T282373|T282373]] [[phab:T282372|T282372]] [[phab:T282371|T282371]]
* 09:37 moritzm: upgrading eqiad to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) [[phab:T235162|T235162]]
* 08:55 moritzm: uploading gitlab-ce 13.11.5-ce to apt.wikimedia.org thirdparty/gitlab
* 08:43 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:37 moritzm: upgrading codfw to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) [[phab:T235162|T235162]]
* 08:23 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:09 moritzm: upgrading esams/eqsin to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range)
* 07:52 ryankemper: [WDQS] Pooled `wdqs1008` and `wdqs2006` (all caught up on lag)
* 07:48 moritzm: uploaded debmonitor-client 0.3.0-1+deb10u2 to apt.wikimedia.org
* 06:24 ryankemper: [WDQS] De-pooled `wdqs1008` and `wdqs2006` (~1 hour of lag to catch up on)
* 06:23 ryankemper: [[phab:T280382|T280382]] `wdqs2006.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 06:23 ryankemper: [[phab:T280382|T280382]] `wdqs1008.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:20 marostegui: Deploy schema change on db1121, lag will appear on s4 (commonswiki) wiki replicas - [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P16259 and previous config saved to /var/cache/conftool/dbconfig/20210603-051853-marostegui.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16258 and previous config saved to /var/cache/conftool/dbconfig/20210603-051402-root.json
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16257 and previous config saved to /var/cache/conftool/dbconfig/20210603-045859-root.json
* 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16256 and previous config saved to /var/cache/conftool/dbconfig/20210603-044355-root.json
* 04:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:36 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 04:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 04:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:30 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 04:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 04:29 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 04:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16255 and previous config saved to /var/cache/conftool/dbconfig/20210603-042851-root.json
* 02:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
* 02:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
* 02:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
* 02:07 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1008.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 02:07 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
* 02:05 ryankemper: [[phab:T280382|T280382]] `wdqs1003.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 02:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:51 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2006.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 01:47 ryankemper: [[phab:T280382|T280382]] `wdqs2003.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 01:43 ryankemper: [WDQS] Pooled `wdqs1004` (caught up on lag)
* 01:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Gadgets: Backport: [[gerrit:697816{{!}}Reduce message parse in GadgetHooks::getPreferences (second time) (T58633 T278650)]], Try II (duration: 00m 57s)
* 00:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/user/UserOptionsManager.php: Backport: [[gerrit:697818{{!}}user: Accept options-messages for multiselect user options (T58633 T278650)]] (duration: 00m 57s)
* 00:35 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:18 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)


== 2021-04-16 ==
== 2021-06-02 ==
* 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwdebug1003.eqiad.wmnet
* 23:57 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
* 23:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE
* 23:56 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
* 23:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mwdebug1003.eqiad.wmnet
* 23:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwdebug1003.eqiad.wmnet
* 23:47 ryankemper: [[phab:T280382|T280382]] `wdqs1004.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 23:47 mutante: decom'ing mwdebug1003, stretch VM created in [[phab:T267248|T267248]]
* 23:41 ladsgroup@deploy1002: scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 23:39 mutante: reimaging last 3 remaining stretch appservers with buster, mw1307, mw1402, mw1403
* 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
* 23:28 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 23:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
* 23:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 21:08 ejegg: updated fundraising python tools from {{Gerrit|ef54260b0d}} to {{Gerrit|3d950fffbd}}
* 23:26 ryankemper: [[phab:T280382|T280382]] `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid10`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
* 20:40 Trey314159: reindexing wikidata on cloudelastic... AGAIN ([[phab:T274200|T274200]])
* 23:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:48 ryankemper: [[phab:T267927|T267927]] Transferring from `wdqs2008`->`wdqs2003` to resolve the data corruption on `wdqs2003`
* 23:18 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes: Backport: [[gerrit:697817{{!}}Allow html form field option 'options-messages' to get parsed (T58633)]] (duration: 01m 01s)
* 17:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 22:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 17:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
* 22:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 17:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
* 22:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697855{{!}}Enable wgVectorConsolidateUserLinks on the beta cluster (T266536)]] (duration: 00m 57s)
* 17:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
* 22:39 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage_2`
* 17:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
* 22:34 ryankemper: [[phab:T280382|T280382]] Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 2 'P<nowiki>{</nowiki>apt*<nowiki>}</nowiki>' 'sudo rm -rfv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
* 17:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
* 22:30 ryankemper: [[phab:T280382|T280382]] Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'P<nowiki>{</nowiki>install*<nowiki>}</nowiki>' 'sudo rm -fv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
* 17:35 mutante: depooling mwdebug1003 (stretch VM, will be removed); mwdebug1001/1002 (buster) are unchanged
* 22:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
* 17:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1003.eqiad.wmnet
* 22:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
* 17:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
* 22:19 Amir1: setting charset of all tables in wikitech to binary ([[phab:T284108|T284108]] [[phab:T269348|T269348]])
* 17:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
* 22:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage_2`
* 17:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
* 22:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:03 ryankemper: [[phab:T267927|T267927]] Pooled `wdqs1007`, `wdqs2003`, `wdqs1008`, `wdqs2004`
* 22:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 17:00 ryankemper: [[phab:T267927|T267927]] Following data transfers complete: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
* 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2007.codfw.wmnet
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:59 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:59 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 21:56 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1004.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 16:09 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 21:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:57 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 21:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
* 15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3002.wikimedia.org
* 15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 21:37 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
* 15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 21:30 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 15:22 urbanecm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 21:28 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
* 14:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
* 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
* 14:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
* 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs2007.codfw.wmnet
* 14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
* 21:17 ryankemper: `ryankemper@wdqs1013:~$ sudo depool`  (catching up on 17.9h lag)
* 14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
* 21:12 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
* 14:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
* 21:10 ryankemper: [[phab:T280382|T280382]] [[phab:T281437|T281437]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs2007.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 14:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
* 21:10 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 14:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
* 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3001.wikimedia.org
* 14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts doh3001.wikimedia.org
* 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
* 20:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3002.wikimedia.org
* 14:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
* 20:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
* 14:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2020.codfw.wmnet
* 20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
* 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
* 19:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
* 13:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2019.codfw.wmnet
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e9c981d5173b1d611458f6c70b34d73476b7bbde}}: Revert "enwiktionary: Raise AF emergency disable treshold+count" ([[phab:T283460|T283460]]) (duration: 00m 58s)
* 12:59 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
* 18:11 urbanecm: Deployed security patch for [[phab:T281972|T281972]]
* 12:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4bf76fc09bc06f76ce842d42b77fe6b036943b69}}: Make DiscussionTools replytool available for everyone on wikitech ([[phab:T283119|T283119]]) (duration: 00m 58s)
* 12:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
* 17:33 legoktm: disabled Kadirselcuk gerrit account, +1 spam (and blocked elsewhere)
* 12:47 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:55 legoktm: restarted apache2 on lists1001 for https://gerrit.wikimedia.org/r/697805
* 12:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
* 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:41 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
* 16:19 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 12:37 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:10 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cescout1001.eqiad.wmnet
* 12:25 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 12:22 jayme: updated envoyproxy to 1.15.4-1 on 'A:mw-canary or A:restbase-canary'
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts cescout1001.eqiad.wmnet
* 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
* 13:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 11:02 moritzm: imported ferm 2.5.1-1+wmf1 to bullseye-wikimedia/main [[phab:T275873|T275873]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
* 12:05 jbond: enable puppet fleet wide. post changing puppetdb to use nginx-light #[[phab:T164456|T164456]]
* 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
* 11:54 jbond: disable puppet fleet wide. changing puppetdb to use nginx-light #[[phab:T164456|T164456]]
* 10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
* 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/includes/actions/InfoAction.php: {{Gerrit|85feaa15d9bbda130541adb6302f31c4372e6519}}: InfoAction: Cast wgNamespaceProtection to array ([[phab:T283751|T283751]]) (duration: 01m 00s)
* 10:44 arturo: merging homer change to cr-eqiad ([[phab:T279342|T279342]])
* 11:08 jbond: update mod_auth_cas [[phab:T264605|T264605]]
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f12e368481b6836eefa070ad5dcf52af3f39d479}}: Investigate MediaSearch usability on other wikis ([[phab:T278984|T278984]]) (duration: 00m 57s)
* 10:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
* 11:04 jbond: upload libapache2-mod-auth-cas_1.2-1 for buster and stretch - #[[phab:T264605|T264605]]
* 10:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
* 11:01 jbond: upload libapache2-mod-auth-cas_1.2-1+wmf11u1_amd64.deb - #[[phab:T264605|T264605]]
* 10:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
* 10:44 topranks: Commit pfw policy {{Gerrit|1622570851}} to pfw3-codfw and pfw3-eqiad to support new host fran2001 ([[phab:T282056|T282056]])
* 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:08 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
* 10:17 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 10:08 jayme: updated envoyproxy to 1.15.4-1 on mw1325.eqiad.wmnet,restbase1026.eqiad.wmnet
* 10:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbstore1006.eqiad.wmnet
* 10:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:51 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1006.eqiad.wmnet
* 10:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2011.codfw.wmnet
* 09:14 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280396]] ([[:phab:T284118{{!}}request]])' 'OTRS' 'VRT' 'Quiddity (WMF)' # [[phab:T284118|T284118]]
* 10:03 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:12 moritzm: removed eight inactive addresses from ops@ list
* 10:00 jayme: updated envoyproxy to 1.15.4-1 on mwdebug1001.eqiad.wmnet
* 07:44 moritzm: installing squid security updates
* 09:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2011.codfw.wmnet
* 06:54 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
* 09:55 jayme: imported envoyproxy_1.15.4-1 to stretch-wikimedia - [[phab:T280317|T280317]]
* 06:51 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
* 09:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2010.codfw.wmnet
* 06:38 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15384 and previous config saved to /var/cache/conftool/dbconfig/20210416-093446-root.json
* 06:34 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 09:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2010.codfw.wmnet
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16249 and previous config saved to /var/cache/conftool/dbconfig/20210602-050234-root.json [REPLAY FROM 2021-06-02 05:02:34]
* 09:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2009.codfw.wmnet
* 05:36 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2009.codfw.wmnet
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071', diff saved to https://phabricator.wikimedia.org/P16248 and previous config saved to /var/cache/conftool/dbconfig/20210602-045736-marostegui.json [REPLAY FROM 2021-06-02 04:57:36]
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15383 and previous config saved to /var/cache/conftool/dbconfig/20210416-091942-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2071', diff saved to https://phabricator.wikimedia.org/P16247 and previous config saved to /var/cache/conftool/dbconfig/20210602-045717-marostegui.json [REPLAY FROM 2021-06-02 04:57:17]
* 09:13 jayme: imported envoyproxy_1.15.4-1 to buster-wikimedia - [[phab:T280317|T280317]]
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16246 and previous config saved to /var/cache/conftool/dbconfig/20210602-044730-root.json [REPLAY FROM 2021-06-02 04:47:31]
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16245 and previous config saved to /var/cache/conftool/dbconfig/20210602-043227-root.json [REPLAY FROM 2021-06-02 04:32:27]
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15380 and previous config saved to /var/cache/conftool/dbconfig/20210416-090438-root.json
* 05:32 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15374 and previous config saved to /var/cache/conftool/dbconfig/20210416-084935-root.json
* 05:31 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697671{{!}}Fix pageterms API call for Special:Nearby in Wikidata (T281639)]] (duration: 00m 56s) [REPLAY FROM 2021-06-01 21:44:06]
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15373 and previous config saved to /var/cache/conftool/dbconfig/20210416-083431-root.json
* 05:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [REPLAY FROM 2021-06-01 19:42:38]
* 07:53 elukey: run reprepro --delete clearvanished on apt1001 to clear all cloudera packages
* 05:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox [REPLAY FROM 2021-06-01 19:29:26]
* 07:41 ema: cp-upload_ulsfo: rolling varnish-frontend-restart to apply exp policy settings changes starting from empty caches [[phab:T275809|T275809]]
* 05:28 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1183.eqiad.wmnet
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P15372 and previous config saved to /var/cache/conftool/dbconfig/20210416-071936-marostegui.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16251 and previous config saved to /var/cache/conftool/dbconfig/20210602-051919-marostegui.json
* 06:58 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
* 05:18 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1183.eqiad.wmnet
* 06:52 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16250 and previous config saved to /var/cache/conftool/dbconfig/20210602-051738-root.json
* 06:48 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
* off: restart tcpircbot-logmsgbot on alert1001 - [[phab:T284123|T284123]]
* 06:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
* 04:56 marostegui: Test
* 06:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
* 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2095.codfw.wmnet with reason: REIMAGE
* 06:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2095.codfw.wmnet with reason: REIMAGE
* 05:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics-tool1001.eqiad.wmnet
* 05:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2094.codfw.wmnet with reason: REIMAGE
* 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2094.codfw.wmnet with reason: REIMAGE
* 05:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics-tool1001.eqiad.wmnet
* 03:31 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
* 03:26 ryankemper: [[phab:T267927|T267927]] Pooled `wdqs2001`
* 03:22 ryankemper: [[phab:T267927|T267927]] Pooled `wdqs1006` and `wdqs2002`
* 03:09 ryankemper: [[phab:T267927|T267927]] kicked off next round of `data-transfer`s: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
* 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 03:05 ryankemper: [[phab:T267927|T267927]] Last round of `data-transfer`s finished successfully, proceeding to next round
* 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:30 Krinkle: Delete old data at doc1001:/srv/doc/cover/PasswordBlacklist (ref [[phab:T254799|T254799]])
* 00:09 jforrester@deploy1002: Finished deploy [integration/docroot@63b6fb6]: Sync with CI updates (no-op) (duration: 00m 08s)
* 00:09 jforrester@deploy1002: Started deploy [integration/docroot@63b6fb6]: Sync with CI updates (no-op)
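
Many of the entries above come in `START` / `END` pairs logged by cookbooks. The following is a minimal sketch of how such pairs could be matched to get a rough run duration. It assumes a newest-first list of entries from a single day, in the format seen above. The regular expression and the `pair_cookbook_runs` helper are hypothetical and are not part of any Wikimedia tooling. Runs that share the same user, cookbook and trailing text (for example several parallel `sre.wdqs.data-transfer` runs) cannot be told apart and are paired arbitrarily.

```python
import re
from datetime import datetime

# Hypothetical pattern for entries such as:
# "* 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host ..."
# "* 14:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ..."
ENTRY = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<state>START|END) (?:\((?P<result>\w+)\) )?- "
    r"Cookbook (?P<cookbook>\S+)(?: \(exit_code=(?P<code>\d+)\))?(?P<rest>.*)$"
)


def pair_cookbook_runs(lines):
    """Pair each END entry with the latest unmatched START sharing the same key."""
    open_runs = {}  # key -> list of start times still waiting for an END
    runs = []
    # The log is newest-first, so walk it backwards to see START before END.
    for line in reversed(lines):
        m = ENTRY.match(line)
        if m is None:
            continue  # not a cookbook START/END entry
        key = (m["who"], m["cookbook"], m["rest"].strip())
        when = datetime.strptime(m["time"], "%H:%M")
        if m["state"] == "START":
            open_runs.setdefault(key, []).append(when)
        elif open_runs.get(key):
            started = open_runs[key].pop()
            runs.append({
                "who": m["who"],
                "cookbook": m["cookbook"],
                "target": m["rest"].strip(),
                "result": m["result"],
                "exit_code": int(m["code"]) if m["code"] else None,
                # Minute resolution only; runs crossing midnight are not handled.
                "minutes": int((when - started).total_seconds() // 60),
            })
    return runs


if __name__ == "__main__":
    sample = [
        "* 14:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2020.codfw.wmnet",
        "* 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet",
    ]
    for run in pair_cookbook_runs(sample):
        print(run["cookbook"], run["target"], run["result"], run["minutes"], "min")
```

Because log timestamps have minute resolution, the computed durations are only approximate, and an `END` with no visible `START` in the input is skipped.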


== 2021-04-15 ==
== 2021-06-01 ==
* 23:37 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/skins/Vector/skin.json: Backport: [[gerrit:679842{{!}}Adjust floating override (T280260)]] (duration: 00m 56s)
* 21:09 andrewbogott: dropping a bunch of tables from the labswiki db as per [[phab:T284108|T284108]]
* 23:35 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/skins/Vector/resources/skins.vector.styles.legacy/layouts/screen.less: Backport: [[gerrit:679842{{!}}Adjust floating override (T280260)]] (duration: 00m 56s)
* 17:23 Amir1: starting deletion of mbox files on lists1001 for mailman2, first reading-web-team.mbox, then smallest lists ([[phab:T282303|T282303]])
* 23:31 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: [[gerrit:679845{{!}}searchSatisfaction: Default userEditBucket back to 0 edits (T280294)]] (duration: 00m 57s)
* 16:31 moritzm: updating debmonitor clients to 0.3.0 (along with cleanup of sysuser UID allocation)
* 23:17 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:679947{{!}}Create Draft namespace on itwiki (T280289)]] (duration: 00m 56s)
* 15:38 legoktm: stopped mailman2 service on lists1001 ([[phab:T52864|T52864]])
* 23:09 jforrester@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:678342{{!}}[wikitech] Update logo to mirror the new MediaWiki logo (T279087)]] (duration: 00m 56s)
* 15:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 23:08 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: [[gerrit:678342{{!}}[wikitech] Update logo to mirror the new MediaWiki logo (T279087)]] (duration: 00m 56s)
* 15:16 ryankemper: [[phab:T283223|T283223]] `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id [[phab:T283223|T283223]]` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
* 23:07 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [[gerrit:678342{{!}}[wikitech] Update logo to mirror the new MediaWiki logo (T279087)]] (duration: 00m 57s)
* 15:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 23:06 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: [[gerrit:678342{{!}}[wikitech] Update logo to mirror the new MediaWiki logo (T279087)]] (duration: 00m 57s)
* 14:59 topranks: Restoring Lumen CCT {{Gerrit|442550293}} to normal metric / bring back into service ([[phab:T274234|T274234]])
* 22:56 ryankemper: [[phab:T267927|T267927]] WDQS kicked off next round of `data-transfer`s: `wdqs1004`->`wdqs1006`, `wdqs2001`->`wdqs2002`, `wdqs2008`->`wdqs1003`
* 13:56 marostegui: Stop mysql on db2079 (codfw master) - [[phab:T283743|T283743]]
* 22:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 13:53 topranks: Draining Lumen CCT {{Gerrit|442550293}} to do some comparative bandwidth tests from eqiad to codfw ([[phab:T274234|T274234]])
* 22:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f757748a14ac8c205f6a5fac0611216c01ceb1c}}: cawiki: Fix help panel links ([[phab:T280673|T280673]]) (duration: 00m 58s)
* 22:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 13:48 otto@deploy1002: Finished deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - [[phab:T272973|T272973]] (duration: 02m 58s)
* 22:48 ryankemper: [[phab:T267927|T267927]] pooled `wdqs1005` (all caught up on lag)
* 13:45 otto@deploy1002: Started deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - [[phab:T272973|T272973]]
* 22:46 ryankemper: [[phab:T280108|T280108]] [[phab:T267927|T267927]] Manually re-enabled and ran puppet on `wdqs1005` (had closed the tmux pane which terminated the cookbook without letting it do its final cleanup)
* 13:43 topranks: Restoring Telia CT IC-307235 to normal metric / bring back into service ([[phab:T274234|T274234]])
* 22:33 ryankemper: [[phab:T280108|T280108]] [[phab:T267927|T267927]] Data transfers completed successfully; small issue with new `wait_for_updater` logic is preventing termination so I ctrl+c'd manually
* 13:08 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
* 22:32 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 13:06 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
* 20:03 herron: migrating kafka-logging broker logstash1012 to kafka-logging1003 [[phab:T279342|T279342]]
* 12:12 dcausse: re-pooling wdqs1005 (caught up on lag)
* 19:56 Trey314159: reindexing wikidata on cloudelastic finished/failed ([[phab:T274200|T274200]])
* 12:06 moritzm: installing djvulibre security updates
* 19:43 Trey314159: reindexing wikidata on cloudelastic ([[phab:T274200|T274200]])
* 11:16 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 19:42 Trey314159: reindexing commons and wikidata on elastic@eqiad ([[phab:T274200|T274200]])
* 11:14 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 19:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e4989d2b19e07d2a816cd7f6afae077f86aca54e}}: Enable "Diff" RSS feed on meta ([[phab:T283380|T283380]]) (duration: 00m 58s)
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 11:04 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.1  refs [[phab:T278345|T278345]]
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
* 18:49 andrew@deploy1002: Finished deploy [horizon/deploy@ec37c43]: test deploy of trove dashboard to codfw1dev (duration: 01m 58s)
* 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
* 18:47 andrew@deploy1002: Started deploy [horizon/deploy@ec37c43]: test deploy of trove dashboard to codfw1dev
* 10:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:39 jdrewniak@deploy1002: Synchronized private/readme.php: Config: [[gerrit:679614{{!}}Add $wgWMEVectorPrefDiffSalt to private/readme (T261842)]] (duration: 01m 08s)
* 09:37 topranks: Draining Telia CT IC-307235 to do some comparative bandwidth tests from eqiad to codfw ([[phab:T274234|T274234]])
* 18:32 jdrewniak@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:679613{{!}}Add mediawiki.pref_diff stream to wgEventLoggingStreamNames/wgEventStreams (T261842)]] (duration: 01m 18s)
* 08:04 hashar: Restarted Gerrit on gerrit1001 for Java 11 upgrade # [[phab:T268225|T268225]]
* 17:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:02 hashar: Restarted Gerrit on gerrit2001 for Java 11 upgrade # [[phab:T268225|T268225]]
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 07:26 dcausse: depooling wdqs1005 (lag)
* 16:42 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:14 moritzm: installing nginx security updates
* 16:34 crusnov@cumin1001: START - Cookbook sre.dns.netbox
* 05:56 legoktm: restarting mailman3 on lists1001
* 16:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
* 05:37 legoktm: uploaded django-allauth_0.44.0+ds-1~bpo10+1 mailman3_3.3.3-1~bpo10+4 to apt.wm.o
* 16:21 ryankemper: [[phab:T280108|T280108]] [[phab:T267927|T267927]] Current wdqs transfers in progress: `wdqs1004`->`wdqs1005`, `wdqs2008`->`wdqs2001`
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16242 and previous config saved to /var/cache/conftool/dbconfig/20210601-053137-marostegui.json
* 16:21 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16241 and previous config saved to /var/cache/conftool/dbconfig/20210601-052349-root.json
* 16:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16240 and previous config saved to /var/cache/conftool/dbconfig/20210601-050845-root.json
* 16:17 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16239 and previous config saved to /var/cache/conftool/dbconfig/20210601-045341-root.json
* 16:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16238 and previous config saved to /var/cache/conftool/dbconfig/20210601-043837-root.json
* 16:17 ryankemper: [[phab:T280108|T280108]] [[phab:T267927|T267927]] Merged https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/679702 and ran puppet-agent on `cumin2001` before next round of wdqs `data-transfer`s
* 00:46 legoktm@deploy1002: Synchronized logos/config.yaml: Revert "Use eswiki 20th anniversary logos" ([[phab:T280908|T280908]]) (duration: 01m 07s)
* 16:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
* 00:43 legoktm@deploy1002: Synchronized wmf-config/logos.php: Revert "Use eswiki 20th anniversary logos" ([[phab:T280908|T280908]]) (duration: 01m 00s)
* 16:08 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
* 16:02 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
* 15:26 otto@deploy1002: Finished deploy [analytics/refinery@497f6a5] (hadoop-test): (no justification provided) (duration: 04m 44s)
* 15:21 otto@deploy1002: Started deploy [analytics/refinery@497f6a5] (hadoop-test): (no justification provided)
* 15:09 elukey@deploy1002: Finished deploy [analytics/refinery@497f6a5]: Regular analytics weekly train (duration: 13m 12s)
* 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1002.wikimedia.org
* 15:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns1002.wikimedia.org
* 14:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1001.wikimedia.org
* 14:56 elukey@deploy1002: Started deploy [analytics/refinery@497f6a5]: Regular analytics weekly train
* 14:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns1001.wikimedia.org
* 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5002.wikimedia.org
* 14:47 jayme: imported etcd-mirror_0.0.5-1 to buster-wikimedia
* 14:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns5002.wikimedia.org
* 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5001.wikimedia.org
* 14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1048.eqiad.wmnet with reason: REIMAGE
* 14:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1047.eqiad.wmnet with reason: REIMAGE
* 14:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1048.eqiad.wmnet with reason: REIMAGE
* 14:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns5001.wikimedia.org
* 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1046.eqiad.wmnet with reason: REIMAGE
* 14:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1047.eqiad.wmnet with reason: REIMAGE
* 14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2002.wikimedia.org
* 14:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1046.eqiad.wmnet with reason: REIMAGE
* 14:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
* 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2001.wikimedia.org
* 14:19 ppchelko@deploy1002: Finished deploy [restbase/deploy@4755f50]: [[phab:T271983|T271983]], try again (duration: 07m 45s)
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns2001.wikimedia.org
* 14:17 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
* 14:12 ppchelko@deploy1002: Started deploy [restbase/deploy@4755f50]: [[phab:T271983|T271983]], try again
* 14:11 ppchelko@deploy1002: Finished deploy [restbase/deploy@4755f50]: [[phab:T271983|T271983]] (duration: 11m 15s)
* 14:09 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
* 14:00 ppchelko@deploy1002: Started deploy [restbase/deploy@4755f50]: [[phab:T271983|T271983]]
* 13:56 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=wtp104[5-7].eqiad.wmnet
* 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
* 13:54 andrewbogott: upgrading packages and mediawiki on wikitech-static
* 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4002.wikimedia.org
* 13:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
* 13:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4002.wikimedia.org
* 13:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
* 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
* 13:32 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
* 13:25 ariel@cumin1001: EN