
Server Admin Log: Difference between revisions

imported>Stashbot
(eileen: civicrm revision changed from 4220fc8177 to a4caad22b1, config revision is f08249ecf9)
imported>Stashbot
(pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet'])
 
(527 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== 2021-01-18 ==
== 2022-08-18 ==
* 21:33 eileen: civicrm revision changed from {{Gerrit|4220fc8177}} to {{Gerrit|a4caad22b1}}, config revision is {{Gerrit|f08249ecf9}}
* 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2311.codfw.wmnet
* 00:48 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2310.codfw.wmnet
* 00:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2309.codfw.wmnet
* 00:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2307.codfw.wmnet
* 00:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2309.codfw.wmnet
* 00:39 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2307.codfw.wmnet
* 00:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2310.codfw.wmnet
* 00:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2311.codfw.wmnet
* 00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2023.mgmt.codfw.wmnet with reboot policy FORCED
* 20:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2311.codfw.wmnet with reason: REIMAGE
* 00:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2023.mgmt.codfw.wmnet with reboot policy FORCED
* 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2309.codfw.wmnet with reason: REIMAGE
* 00:06 eileen___: civicrm upgraded from {{Gerrit|97638e58}} to {{Gerrit|edfe2f16}}
* 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2310.codfw.wmnet with reason: REIMAGE
* 00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2311.codfw.wmnet with reason: REIMAGE
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2307.codfw.wmnet with reason: REIMAGE
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2310.codfw.wmnet with reason: REIMAGE
* 20:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2309.codfw.wmnet with reason: REIMAGE
* 20:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2307.codfw.wmnet with reason: REIMAGE
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2305.codfw.wmnet
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2303.codfw.wmnet
* 20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2277.codfw.wmnet
* 20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2276.codfw.wmnet
* 20:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2303.codfw.wmnet
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2305.codfw.wmnet
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2277.codfw.wmnet
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2276.codfw.wmnet
* 19:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2305.codfw.wmnet with reason: REIMAGE
* 19:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2303.codfw.wmnet with reason: REIMAGE
* 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2305.codfw.wmnet with reason: REIMAGE
* 19:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2303.codfw.wmnet with reason: REIMAGE
* 19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2277.codfw.wmnet with reason: REIMAGE
* 19:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2276.codfw.wmnet with reason: REIMAGE
* 19:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2277.codfw.wmnet with reason: REIMAGE
* 19:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2276.codfw.wmnet with reason: REIMAGE
* 18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2275.codfw.wmnet
* 18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2274.codfw.wmnet
* 18:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2273.codfw.wmnet
* 18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2271.codfw.wmnet
* 18:36 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1136,1138].eqiad.wmnet
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2274.codfw.wmnet
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2275.codfw.wmnet
* 18:34 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1136,1138].eqiad.wmnet
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2273.codfw.wmnet
* 18:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2271.codfw.wmnet
* 18:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
* 18:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
* 18:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1132.eqiad.wmnet
* 18:20 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1132.eqiad.wmnet
* 18:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1130.eqiad.wmnet
* 18:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
* 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1130.eqiad.wmnet
* 18:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
* 18:14 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1128.eqiad.wmnet
* 18:12 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1128.eqiad.wmnet
* 17:51 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1124-1127].eqiad.wmnet
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2274.codfw.wmnet with reason: REIMAGE
* 17:49 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1124-1127].eqiad.wmnet
* 17:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2275.codfw.wmnet with reason: REIMAGE
* 17:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2273.codfw.wmnet with reason: REIMAGE
* 17:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1121-1123].eqiad.wmnet
* 17:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2271.codfw.wmnet with reason: REIMAGE
* 17:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2275.codfw.wmnet with reason: REIMAGE
* 17:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1121-1123].eqiad.wmnet
* 17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2274.codfw.wmnet with reason: REIMAGE
* 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2273.codfw.wmnet with reason: REIMAGE
* 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2271.codfw.wmnet with reason: REIMAGE
* 17:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1120.eqiad.wmnet
* 17:42 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1120.eqiad.wmnet
* 17:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1118.eqiad.wmnet
* 17:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1118.eqiad.wmnet
* 17:32 mutante: reimaging mw2271,mw2273,mw2274,mw227 (codfw only)
* 16:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
* 16:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1135.eqiad.wmnet with reason: REIMAGE
* 16:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1135.eqiad.wmnet with reason: REIMAGE
* 15:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: REIMAGE
* 15:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: REIMAGE
* 15:48 moritzm: installing wavpack security updates
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
* 15:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
* 15:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
* 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 14:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 14:43 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:31 kormat@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:30 arturo: updating packages in buster-wikimedia/thirdparty/ceph-nautilus-buster ([[phab:T272296|T272296]])
* 14:26 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:18 kormat@cumin1001: START - Cookbook sre.hosts.decommission
* 13:34 moritzm: uploaded wmf-sre-laptop 0.3.2 to apt.wikimedia.org
* 13:26 volans: installed spicerack 0.0.48-1+deb10u1 on cumin hosts
* 13:12 marostegui: Upgrade db2071 to 10.4.17 - [[phab:T268457|T268457]]
* 13:08 XioNoX: add NAT rule on pfw3-eqiad - [[phab:T272066|T272066]]
* 12:56 XioNoX: add NAT rule on pfw3-codfw - [[phab:T272066|T272066]]
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2008.codfw.wmnet
* 12:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
* 12:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2008.codfw.wmnet
* 12:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
* 12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
* 12:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
* 12:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
* 12:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2007.codfw.wmnet
* 12:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
* 12:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2007.codfw.wmnet
* 12:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2006.codfw.wmnet
* 12:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
* 12:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2006.codfw.wmnet
* 12:08 volans: uploaded spicerack_0.0.48 to apt.wikimedia.org buster-wikimedia
* 12:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2005.codfw.wmnet
* 12:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
* 12:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2005.codfw.wmnet
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
* 11:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1008.eqiad.wmnet
* 11:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
* 11:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1008.eqiad.wmnet
* 11:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1007.eqiad.wmnet
* 11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1007.eqiad.wmnet
* 11:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
* 11:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
* 11:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1006.eqiad.wmnet
* 11:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1006.eqiad.wmnet
* 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
* 11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1005.eqiad.wmnet
* 11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1005.eqiad.wmnet
* 11:10 hashar: Restarting Gerrit main instance on gerrit1001.wikimedia.org
* 11:08 hashar: Restarting Gerrit replica on gerrit2001.wikimedia.org
* 10:58 moritzm: installing python2.7 security updates on Stretch
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13799 and previous config saved to /var/cache/conftool/dbconfig/20210118-102959-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13798 and previous config saved to /var/cache/conftool/dbconfig/20210118-101456-root.json
* 10:00 _joe_: restarting pybal on lvs1016, not talking to its etcd server
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13797 and previous config saved to /var/cache/conftool/dbconfig/20210118-095952-root.json
* 09:51 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13796 and previous config saved to /var/cache/conftool/dbconfig/20210118-094449-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13795 and previous config saved to /var/cache/conftool/dbconfig/20210118-092546-marostegui.json
* 09:24 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13794 and previous config saved to /var/cache/conftool/dbconfig/20210118-092429-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105:3311 from vslow', diff saved to https://phabricator.wikimedia.org/P13793 and previous config saved to /var/cache/conftool/dbconfig/20210118-092003-marostegui.json
* 09:13 moritzm: installing openssl 1.1 security updates on stretch
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13791 and previous config saved to /var/cache/conftool/dbconfig/20210118-090926-root.json
* 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:01 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13790 and previous config saved to /var/cache/conftool/dbconfig/20210118-085422-root.json
* 08:46 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:42 kormat@cumin1001: START - Cookbook sre.hosts.decommission
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13788 and previous config saved to /var/cache/conftool/dbconfig/20210118-083919-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 to stop replication, place db1105:3311 temporarily in vslow [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13787 and previous config saved to /var/cache/conftool/dbconfig/20210118-081740-marostegui.json
* 08:15 moritzm: installing remaining openssl 1.0 security updates on stretch
* 08:13 elukey: clean up old archiva debs and upload 2.2.4-3 to buster-wikimedia - [[phab:T272082|T272082]]
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13786 and previous config saved to /var/cache/conftool/dbconfig/20210118-080122-root.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13785 and previous config saved to /var/cache/conftool/dbconfig/20210118-074618-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13784 and previous config saved to /var/cache/conftool/dbconfig/20210118-073115-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13783 and previous config saved to /var/cache/conftool/dbconfig/20210118-071611-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13782 and previous config saved to /var/cache/conftool/dbconfig/20210118-065312-marostegui.json
* 06:35 marostegui: Reboot dbproxy2001, dbproxy2002, dbproxy2003 for kernel upgrade
* 06:22 marostegui: Reboot db1154 and db1155 for kernel upgrade


== 2021-01-16 ==
== 2022-08-17 ==
* 12:18 elukey: elukey@cumin1001:~$ sudo cumin 'A:mw-app-canary and A:mw-eqiad' 'run-puppet-agent' -b 10 - [[phab:T272215|T272215]]
* 23:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 12:10 elukey: elukey@cumin1001:~$ sudo cumin 'A:mw-eqiad' 'run-puppet-agent' -b 10 - [[phab:T272215|T272215]]
* 23:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 11:23 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 23:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes2023
* 23:57 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes2023
* 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:23 mutante: phab2002 - chmod -R phd /srv/repos  {{!}} find /srv/repos/ -gid 498 -exec chown phd:phd <nowiki>{</nowiki><nowiki>}</nowiki> \; [[phab:T313360|T313360]]
* 23:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
* 23:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-stretch2001.codfw.wmnet']
* 23:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2001.codfw.wmnet']
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:34 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.25  refs [[phab:T314186|T314186]] (duration: 03m 17s)
* 22:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:31 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.25  refs [[phab:T314186|T314186]]
* 22:16 eileen___: civicrm upgraded from {{Gerrit|4be0724d}} to {{Gerrit|97638e58}}
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:16 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/FlaggedRevs: Backport: [[gerrit:824171{{!}}Remove indexExists check for page_name_title index]] (duration: 03m 12s)
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:13 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/FlaggedRevs/frontend/FlaggedRevsUIHooks.php: Backport: [[gerrit:824169{{!}}Do not attempt to create a FlaggableWikiPage when the title can't exist (T315479)]] (duration: 03m 26s)
* 21:08 ejegg: updated civicrm from {{Gerrit|c228e3d7}} to {{Gerrit|4be0724d}}
* 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2005.codfw.wmnet with OS bullseye
* 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2005.codfw.wmnet with reason: host reimage
* 20:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2005.codfw.wmnet with reason: host reimage
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:36 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822656{{!}}InitialiseSettings: Add wmgUsePhonos (default => false) (T314294)]] (duration: 03m 29s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 samtar@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:821249{{!}}extension-list: Add Phonos (T314294)]] (duration: 03m 17s)
* 20:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2005.codfw.wmnet with OS bullseye
* 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS bullseye
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1ddc661e6e73b60542e31d2128c2add3e2307b74}}: QuickSurveys: Disable extension on JA wiki ([[phab:T311015|T311015]]) (duration: 03m 19s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2cf80d1e038b33f7f99d56ca8e30ce37cb726ef2}}: QuickSurveys: Remove research incentive survey from BN wiki ([[phab:T314333|T314333]]) (duration: 03m 24s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage
* 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage
* 19:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:11 demon@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.23  refs [[phab:T314186|T314186]] (duration: 03m 15s)
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:07 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23  refs [[phab:T314186|T314186]]
* 19:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1073.eqiad.wmnet with OS bullseye
* 19:01 demon@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.25  refs [[phab:T314186|T314186]] (duration: 03m 24s)
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:58 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.25  refs [[phab:T314186|T314186]]
* 18:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging2004.codfw.wmnet with OS bullseye
* 18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 18:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1073.eqiad.wmnet with reason: host reimage
* 18:40 urandom: disabling reserved space on codfw nodes (RESTBase), /dev/md2 (aka /srv/cassandra/instance-data) -- [[phab:T314941|T314941]]
* 18:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
* 18:38 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1073.eqiad.wmnet with reason: host reimage
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
* 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32469 and previous config saved to /var/cache/conftool/dbconfig/20220817-183223-ladsgroup.json
* 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32468 and previous config saved to /var/cache/conftool/dbconfig/20220817-183202-ladsgroup.json
* 18:25 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1073.eqiad.wmnet with OS bullseye
* 18:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32467 and previous config saved to /var/cache/conftool/dbconfig/20220817-181656-ladsgroup.json
* 18:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1056.eqiad.wmnet with OS bullseye
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32466 and previous config saved to /var/cache/conftool/dbconfig/20220817-180150-ladsgroup.json
* 18:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
* 17:48 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging2004.codfw.wmnet with OS bullseye
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32465 and previous config saved to /var/cache/conftool/dbconfig/20220817-174644-ladsgroup.json
* 17:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1056.eqiad.wmnet with reason: host reimage
* 17:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2005
* 17:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2005
* 17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1056.eqiad.wmnet with reason: host reimage
* 17:33 ladsgroup@deploy1002: Synchronized portals: Migrate wikinews.org to the modern portals (duration: 03m 32s)
* 17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004
* 17:30 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004
* 17:29 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikinews.org to the modern portals (duration: 03m 29s)
* 17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1056.eqiad.wmnet with OS bullseye
* 17:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
* 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2004.codfw.wmnet with OS bullseye
* 16:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
* 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
* 16:54 sbassett@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable StopForumSpam on candidate wikis (CS.php) - [[phab:T273220|T273220]] (duration: 03m 26s)
* 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host graphite2004.codfw.wmnet with OS bullseye
* 16:50 sbassett@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable StopForumSpam on candidate wikis (IS.php) - [[phab:T273220|T273220]] (duration: 03m 20s)
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32463 and previous config saved to /var/cache/conftool/dbconfig/20220817-162655-root.json
* 16:24 cwhite: restart logmsgbot [[phab:T257861|T257861]]
* 16:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 16:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1079.eqiad.wmnet with OS bullseye
* 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32462 and previous config saved to /var/cache/conftool/dbconfig/20220817-161151-root.json
* 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32461 and previous config saved to /var/cache/conftool/dbconfig/20220817-155653-root.json
* 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32460 and previous config saved to /var/cache/conftool/dbconfig/20220817-155646-root.json
* 15:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:54 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:823716{{!}}jawiki: Restrict abusefilter log access (2) (T315199)]] (duration: 03m 47s)
* 15:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1079.eqiad.wmnet with reason: host reimage
* 15:52 jbond: push out update for linux-image-amd64 on bullseye
* 15:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1079.eqiad.wmnet with reason: host reimage
* 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:50 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:823715{{!}}jawiki: Restrict abusefilter log access (1) (T315199)]] (duration: 03m 25s)
* 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:43 TheresNoTime: finished deploying [[gerrit:824224{{!}}RESTBase is not enabled on closed wikis (T315383)]]
* 15:42 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=1)
* 15:42 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 15:42 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:824224{{!}}RESTBase is not enabled on closed wikis (T315383)]] (duration: 03m 27s)
* 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32458 and previous config saved to /var/cache/conftool/dbconfig/20220817-154148-root.json
* 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P32457 and previous config saved to /var/cache/conftool/dbconfig/20220817-154142-root.json
* 15:41 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 15:41 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 15:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1079.eqiad.wmnet with OS bullseye
* 15:37 jbond: install net-snmp updates
* 15:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32455 and previous config saved to /var/cache/conftool/dbconfig/20220817-152643-root.json
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32454 and previous config saved to /var/cache/conftool/dbconfig/20220817-152637-root.json
* 15:24 TheresNoTime: deploying [[gerrit:824224{{!}}RESTBase is not enabled on closed wikis (T315383)]] out of window
* 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32453 and previous config saved to /var/cache/conftool/dbconfig/20220817-151139-root.json
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32452 and previous config saved to /var/cache/conftool/dbconfig/20220817-151132-root.json
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32450 and previous config saved to /var/cache/conftool/dbconfig/20220817-145634-root.json
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32449 and previous config saved to /var/cache/conftool/dbconfig/20220817-145628-root.json
* 14:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host graphite2004.codfw.wmnet with OS bullseye
* 14:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch2002.mgmt.codfw.wmnet with reboot policy FORCED
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32447 and previous config saved to /var/cache/conftool/dbconfig/20220817-144129-root.json
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32446 and previous config saved to /var/cache/conftool/dbconfig/20220817-144123-root.json
* 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
* 14:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
* 14:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-stretch2002.mgmt.codfw.wmnet with reboot policy FORCED
* 14:18 marostegui: Redact new wikis guwwiktionary pcmwiki bjnwiktionary [[phab:T312214|T312214]] [[phab:T310879|T310879]] [[phab:T309056|T309056]]
* 14:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch2001.mgmt.codfw.wmnet with reboot policy FORCED
* 14:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host graphite2004.codfw.wmnet with OS bullseye
* 14:01 taavi: UTC afternoon deploys done
* 14:00 taavi@deploy1002: Finished scap: Backport for [[gerrit:823697]] Add wgDiscussionToolsEnablePermalinksBackend config (duration: 19m 24s)
* 13:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-stretch2001.mgmt.codfw.wmnet with reboot policy FORCED
* 13:51 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-stretch2002
* 13:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-stretch2002
* 13:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-stretch2001
* 13:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-stretch2001
* 13:41 taavi@deploy1002: Started scap: Backport for [[gerrit:823697]] Add wgDiscussionToolsEnablePermalinksBackend config
* 13:38 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:824128{{!}}Enable Realtime Preview on Group 1 (T314182)]] (duration: 03m 26s)
* 13:32 taavi@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/DiscussionTools/includes/Hooks/DataUpdatesHooks.php: Backport: [[gerrit:823640{{!}}Add try…catch in failing deferred update (T315383)]] (duration: 03m 18s)
* 13:27 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: lots of DiscussionTools and other changes (duration: 03m 11s)
* 13:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics@141f179]: (no justification provided) (duration: 00m 10s)
* 13:19 mforns@deploy1002: Started deploy [airflow-dags/analytics@141f179]: (no justification provided)
* 12:39 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache ([[phab:T310776|T310776]], [[phab:T312209|T312209]], [[phab:T309054|T309054]]) (duration: 03m 30s)
* 12:30 urbanecm@deploy1002: Synchronized dblists-index.php: Creating bjnwiktionary ([[phab:T312209|T312209]]) (duration: 03m 32s)
* 12:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating bjnwiktionary ([[phab:T312209|T312209]]) (duration: 03m 13s)
* 12:23 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating bjnwiktionary ([[phab:T312209|T312209]]) (duration: 03m 19s)
* 12:20 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating bjnwiktionary ([[phab:T312209|T312209]]) (duration: 03m 27s)
* 12:17 jbond: remove prometheus-ipmi-exporter from stretch
* 12:16 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating bjnwiktionary ([[phab:T312209|T312209]])
* 12:15 jbond: copy prometheus-ipmi-exporter package from buster to stretch
* 12:12 urbanecm@deploy1002: Synchronized dblists: Creating bjnwiktionary ([[phab:T312209|T312209]]) (duration: 03m 33s)
* 12:09 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating bjnwiktionary ([[phab:T312209|T312209]]) (duration: 03m 29s)
* 12:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiktionary ([[phab:T309054|T309054]]) (duration: 03m 34s)
* 12:01 jbond: copy prometheus-ipmi-exporter package from bullseye to buster
* 11:58 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiktionary ([[phab:T309054|T309054]]) (duration: 03m 43s)
* 11:54 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiktionary ([[phab:T309054|T309054]]) (duration: 03m 25s)
* 11:51 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiktionary ([[phab:T309054|T309054]])
* 11:47 urbanecm@deploy1002: Synchronized dblists: Creating guwwiktionary ([[phab:T309054|T309054]]) (duration: 03m 11s)
* 11:44 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiktionary ([[phab:T309054|T309054]]) (duration: 03m 08s)
* 11:38 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1001.eqiad.wmnet sretest1002.eqiad.wmnet on all recursors
* 11:38 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache sretest1001.eqiad.wmnet sretest1002.eqiad.wmnet on all recursors
* 11:38 urbanecm@deploy1002: Synchronized langlist: Creating pcmwiki ([[phab:T310776|T310776]]) (duration: 03m 42s)
* 11:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating pcmwiki ([[phab:T310776|T310776]]) (duration: 03m 18s)
* 11:31 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating pcmwiki ([[phab:T310776|T310776]]) (duration: 03m 24s)
* 11:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating pcmwiki ([[phab:T310776|T310776]]) (duration: 03m 13s)
* 11:24 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating pcmwiki ([[phab:T310776|T310776]])
* 11:20 urbanecm@deploy1002: Synchronized dblists: Creating pcmwiki ([[phab:T310776|T310776]]) (duration: 03m 13s)
* 11:17 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating pcmwiki ([[phab:T310776|T310776]]) (duration: 03m 22s)
* 11:11 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 11:11 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32444 and previous config saved to /var/cache/conftool/dbconfig/20220817-092244-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32443 and previous config saved to /var/cache/conftool/dbconfig/20220817-092125-root.json
* 09:10 hashar: Upgraded Gerrit from 3.4.4 to 3.4.5 # [[phab:T315408|T315408]]
* 09:09 hashar@deploy1002: Finished deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit1001 # [[phab:T315408|T315408]] (duration: 00m 09s)
* 09:09 hashar@deploy1002: Started deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit1001 # [[phab:T315408|T315408]]
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32442 and previous config saved to /var/cache/conftool/dbconfig/20220817-090739-root.json
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32441 and previous config saved to /var/cache/conftool/dbconfig/20220817-090620-root.json
* 09:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit 2002 # [[phab:T315408|T315408]] (duration: 00m 11s)
* 09:03 hashar@deploy1002: Started deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit 2002 # [[phab:T315408|T315408]]
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32440 and previous config saved to /var/cache/conftool/dbconfig/20220817-085235-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32439 and previous config saved to /var/cache/conftool/dbconfig/20220817-085224-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32438 and previous config saved to /var/cache/conftool/dbconfig/20220817-085136-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32437 and previous config saved to /var/cache/conftool/dbconfig/20220817-085115-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32436 and previous config saved to /var/cache/conftool/dbconfig/20220817-083730-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32435 and previous config saved to /var/cache/conftool/dbconfig/20220817-083719-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32434 and previous config saved to /var/cache/conftool/dbconfig/20220817-083631-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32433 and previous config saved to /var/cache/conftool/dbconfig/20220817-083611-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32432 and previous config saved to /var/cache/conftool/dbconfig/20220817-082226-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32431 and previous config saved to /var/cache/conftool/dbconfig/20220817-082215-root.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32430 and previous config saved to /var/cache/conftool/dbconfig/20220817-082127-root.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32429 and previous config saved to /var/cache/conftool/dbconfig/20220817-082106-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32428 and previous config saved to /var/cache/conftool/dbconfig/20220817-080721-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32427 and previous config saved to /var/cache/conftool/dbconfig/20220817-080710-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32426 and previous config saved to /var/cache/conftool/dbconfig/20220817-080622-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32425 and previous config saved to /var/cache/conftool/dbconfig/20220817-080602-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 2%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32424 and previous config saved to /var/cache/conftool/dbconfig/20220817-075216-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32423 and previous config saved to /var/cache/conftool/dbconfig/20220817-075206-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32422 and previous config saved to /var/cache/conftool/dbconfig/20220817-075118-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32421 and previous config saved to /var/cache/conftool/dbconfig/20220817-075057-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32420 and previous config saved to /var/cache/conftool/dbconfig/20220817-073712-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32419 and previous config saved to /var/cache/conftool/dbconfig/20220817-073701-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32418 and previous config saved to /var/cache/conftool/dbconfig/20220817-073613-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32417 and previous config saved to /var/cache/conftool/dbconfig/20220817-073553-root.json
* 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32416 and previous config saved to /var/cache/conftool/dbconfig/20220817-073141-ladsgroup.json
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32415 and previous config saved to /var/cache/conftool/dbconfig/20220817-073052-ladsgroup.json
* 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32414 and previous config saved to /var/cache/conftool/dbconfig/20220817-071546-ladsgroup.json
* 07:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32413 and previous config saved to /var/cache/conftool/dbconfig/20220817-070040-ladsgroup.json
* 06:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32412 and previous config saved to /var/cache/conftool/dbconfig/20220817-064534-ladsgroup.json
* 06:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
* 06:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
* 06:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
* 06:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
* 06:36 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
* 06:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
* 06:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
* 06:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 06:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 06:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 06:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
* 06:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
* 06:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
* 06:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
* 06:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
* 06:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
* 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
* 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
* 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
* 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
* 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
* 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
* 06:00 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
* 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
* 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
* 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
* 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
* 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
* 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
* 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
* 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
* 05:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
* 05:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 05:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS bullseye
* 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
* 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
* 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
* 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
* 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
* 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
* 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
* 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
* 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
* 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
* 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 05:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
* 05:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
* 05:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1030.eqiad.wmnet with OS bullseye
* 05:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
* 05:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
* 05:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
* 05:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
* 04:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 04:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bullseye
* 04:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
* 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
* 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
* 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
* 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
* 04:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 04:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
* 04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
* 04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
* 04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
* 04:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
* 04:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 04:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1030.eqiad.wmnet with OS bullseye
* 04:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1030.eqiad.wmnet with OS bullseye
* 04:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
* 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
* 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
* 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
* 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
* 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
* 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
* 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
* 04:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
* 04:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1030.eqiad.wmnet with OS bullseye
* 04:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[1051-1052].eqiad.wmnet
* 02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:45 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
* 02:32 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1051-1052].eqiad.wmnet
* 02:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic[1051-1052].eqiad.wmnet
* 02:16 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1051-1052].eqiad.wmnet
* 02:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[1049-1050].eqiad.wmnet
* 02:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:59 sbassett: Re-deployed security fix for [[phab:T309894|T309894]] to wmf.25
* 01:54 sbassett: Re-deployed security fix for [[phab:T309894|T309894]] to wmf.23
* 01:49 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
* 01:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2005']
* 01:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
* 01:12 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1049-1050].eqiad.wmnet
* 01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
* 00:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2004']
* 00:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2004']
* 00:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging2004']
* 00:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2004']
* 00:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
* 00:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED


== 2021-01-15 ==
== 2022-08-16 ==
* 23:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
* 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
* 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
* 23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2004.mgmt.codfw.wmnet with reboot policy FORCED
* 21:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5002.eqsin.wmnet with reason: REIMAGE
* 23:44 mutante: phab1001 - repeated rsync of /srv/repos to phab2002, then chown -R phd /srv/repos/ (without setting the group) - this way UID is fixed and privs match exactly phab1001 - [[phab:T313360|T313360]]
* 21:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5002.eqsin.wmnet with reason: REIMAGE
* 23:37 mutante: phab2002 - chown -R phd:www-data /srv/repos/ (because of UID mismatch) [[phab:T313360|T313360]]
* 20:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|66e6be391ecfde7ca0604146ab978987ce472b5c}}: Set anniversary logo for frwiki (3/3; [[phab:T272075|T272075]]) (duration: 00m 55s)
* 23:32 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2004.mgmt.codfw.wmnet with reboot policy FORCED
* 20:37 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-tagline-fr-20.svg: {{Gerrit|66e6be391ecfde7ca0604146ab978987ce472b5c}}: Set anniversary logo for frwiki (2/3; [[phab:T272075|T272075]]) (duration: 00m 55s)
* 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['graphite2004']
* 20:36 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-fr-20.svg: {{Gerrit|66e6be391ecfde7ca0604146ab978987ce472b5c}}: Set anniversary logo for frwiki (1/3; [[phab:T272075|T272075]]) (duration: 00m 58s)
* 23:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
* 20:21 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-fr-20.svg: {{Gerrit|66e6be391ecfde7ca0604146ab978987ce472b5c}}: Set anniversary logo for frwiki (1/3; [[phab:T272075|T272075]]) (duration: 01m 54s)
* 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2005
* 17:17 legoktm: legoktm@contint2001:~$ sudo systemctl reload apache2 # for [[phab:T272159|T272159]]
* 23:27 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2005
* 16:17 bstorm: canceled downtime for maintain-dbusers on labstore1004 [[phab:T272127|T272127]]
* 23:27 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004
* 15:30 elukey: restart archiva to apply hot-fix for [[phab:T272082|T272082]]
* 23:26 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004
* 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1002.wikimedia.org
* 23:24 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1002.wikimedia.org
* 23:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['graphite2004']
* 15:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1001.wikimedia.org
* 23:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
* 15:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1001.wikimedia.org
* 23:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['graphite2004']
* 15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 23:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
* 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 23:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2003.wikimedia.org
* 23:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS bullseye
* 14:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2003.wikimedia.org
* 23:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
* 14:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2004.wikimedia.org
* 23:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
* 14:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2004.wikimedia.org
* 23:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite2004.mgmt.codfw.wmnet with reboot policy FORCED
* 11:30 jynus: rolling restart of eqiad source backup dbs
* 22:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
* 11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
* 22:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
* 11:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
* 22:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bullseye
* 11:11 XioNoX: update cloud-in4 firewall rules
* 22:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2036.codfw.wmnet
* 22:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
* 22:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 10:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
* 22:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:56 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc2036.codfw.wmnet
* 22:29 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:55 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
* 22:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Large deletions affecting this replica
* 10:53 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
* 22:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Large deletions affecting this replica
* 10:53 vgutierrez: re-enable puppet on acme-chief clients
* 22:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
* 10:53 jynus: rolling restart of dbprov2* hosts
* 22:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
* 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
* 21:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:52 _joe_: rebuilding the docker images coredns,nutcracker,prometheus-statsd-exporter,service-checker,wmfdebug to use wikimedia-buster as a base
* 21:56 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
* 21:54 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.25 refs [[phab:T314186|T314186]]
* 10:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
* 21:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 10:46 vgutierrez: disable puppet on acme-chief clients
* 21:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 10:45 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
* 21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic1048.eqiad.wmnet
* 10:43 effie: reboot mc2036 - [[phab:T269596|T269596]]
* 21:53 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
* 21:47 bking@cumin1001: START - Cookbook sre.dns.netbox
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
* 21:45 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1075.eqiad.wmnet with OS bullseye
* 10:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
* 21:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
* 21:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
* 21:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:07 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
* 21:44 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 10:02 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 21:42 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic1048.eqiad.wmnet
* 09:58 reedy@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/CacheDecorator.php: [[phab:T272103|T272103]] (duration: 00m 57s)
* 21:41 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic1048.eqiad.wmnet
* 09:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:31 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic1048.eqiad.wmnet
* 09:36 vgutierrez: rolling restart acme-chief servers to catch up on kernel upgrades
* 21:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host graphite2004.mgmt.codfw.wmnet with reboot policy FORCED
* 09:24 jynus: rolling restart of dbprov1* hosts
* 21:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:19 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 21:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1075.eqiad.wmnet with reason: host reimage
* 09:07 moritzm: installing bast5002 [[phab:T257324|T257324]]
* 21:27 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host graphite2004
* 08:45 moritzm: installing bast4003 [[phab:T257324|T257324]]
* 21:26 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host graphite2004
* 08:39 marostegui: Restart clouddb1013-clouddb1020
* 21:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1075.eqiad.wmnet with reason: host reimage
* 08:28 ryankemper: WDQS puppet run successful
* 21:25 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 08:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:13 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1075.eqiad.wmnet with OS bullseye
* 08:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:11 cstone: civicrm upgraded from {{Gerrit|92467234}} to {{Gerrit|c228e3d7}}
* 08:01 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:05 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@33afb85]: initial scap deploy to an-airflow1004, take 3 - [[phab:T312858|T312858]] (duration: 00m 18s)
* 07:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:05 otto@deploy1002: Started deploy [airflow-dags/platform_eng@33afb85]: initial scap deploy to an-airflow1004, take 3 - [[phab:T312858|T312858]]
* 07:57 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:01 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.25 refs [[phab:T314186|T314186]] (duration: 08m 02s)
* 03:49 eileen: civicrm revision changed from {{Gerrit|f417a510a5}} to {{Gerrit|4220fc8177}}, config revision is {{Gerrit|f08249ecf9}}
* 20:54 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@da511ee]: initial scap deploy to an-airflow1004, take 2 - [[phab:T312858|T312858]] (duration: 01m 05s)
* 20:53 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.25 refs [[phab:T314186|T314186]]
* 20:53 otto@deploy1002: Started deploy [airflow-dags/platform_eng@da511ee]: initial scap deploy to an-airflow1004, take 2 - [[phab:T312858|T312858]]
* 20:47 cjming: end of UTC late backport window
* 20:45 cjming@deploy1002: Finished scap: Backport for [[gerrit:823268]] Update sticky header config for idwiki, viwiki A/B experiment (duration: 06m 44s)
* 20:42 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@eba3ff8]: initial scap deploy to an-airflow1004 - [[phab:T312858|T312858]] (duration: 02m 30s)
* 20:39 otto@deploy1002: Started deploy [airflow-dags/platform_eng@eba3ff8]: initial scap deploy to an-airflow1004 - [[phab:T312858|T312858]]
* 20:39 cjming@deploy1002: Started scap: Backport for [[gerrit:823268]] Update sticky header config for idwiki, viwiki A/B experiment
* 20:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1055.eqiad.wmnet with OS bullseye
* 20:28 cjming@deploy1002: Finished scap: Backport for [[gerrit:823658]] mediawikiwiki: set $wgCdnMatchParameterOrder to false (duration: 08m 54s)
* 20:26 mutante: mw1406 - sudo systemctl start php7.2-fpm_check_restart
* 20:19 cjming@deploy1002: Started scap: Backport for [[gerrit:823658]] mediawikiwiki: set $wgCdnMatchParameterOrder to false
* 20:18 ori: removed /var/lock/scap.operations_mediawiki-config.lock on deploy1002
* 20:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1055.eqiad.wmnet with reason: host reimage
* 20:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 20:14 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1055.eqiad.wmnet with reason: host reimage
* 20:13 cjming@deploy1002: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "demon"; reason is "all wikis to 1.39.0-wmf.23  refs [[phab:T314186|T314186]]" (duration: 00m 00s)
* 19:58 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1055.eqiad.wmnet with OS bullseye
* 19:53 dancy@deploy1002: backport aborted:  (duration: 00m 36s)
* 19:53 demon@deploy1002: stage-train aborted:  (duration: 07m 00s)
* 19:53 demon@deploy1002: deploy-promote aborted:  (duration: 05m 35s)
* 19:53 demon@deploy1002: sync-world aborted: testwikis wikis to 1.39.0-wmf.25 refs [[phab:T314186|T314186]] (duration: 03m 39s)
* 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32408 and previous config saved to /var/cache/conftool/dbconfig/20220816-195115-ladsgroup.json
* 19:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 19:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 19:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32407 and previous config saved to /var/cache/conftool/dbconfig/20220816-195043-ladsgroup.json
* 19:49 demon@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.25 refs [[phab:T314186|T314186]]
* 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32406 and previous config saved to /var/cache/conftool/dbconfig/20220816-193537-ladsgroup.json
* 19:25 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 19:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 19:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
* 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32405 and previous config saved to /var/cache/conftool/dbconfig/20220816-192031-ladsgroup.json
* 19:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 19:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 19:13 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e]: Full deploy after last week's interrupted deployment.  This syncs the latest refinery to all targets.  an-launcher1002 already has these files. (duration: 24m 46s)
* 19:07 demon@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.23 refs [[phab:T314186|T314186]]
* 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32404 and previous config saved to /var/cache/conftool/dbconfig/20220816-190525-ladsgroup.json
* 19:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 19:04 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 19:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1076.eqiad.wmnet with OS bullseye
* 18:58 demon@deploy1002: Pruned MediaWiki: 1.39.0-wmf.22 (duration: 02m 02s)
* 18:56 demon@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.24 refs [[phab:T314186|T314186]] (duration: 35m 39s)
* 18:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 18:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 18:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 18:48 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e]: Full deploy after last week's interrupted deployment.  This syncs the latest refinery to all targets.  an-launcher1002 already has these files.
* 18:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1076.eqiad.wmnet with reason: host reimage
* 18:40 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 18:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 18:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1076.eqiad.wmnet with reason: host reimage
* 18:37 jynus: restore x2 codfw replication [[phab:T315271|T315271]]
* 18:26 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1076.eqiad.wmnet with OS bullseye
* 18:20 demon@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.24 refs [[phab:T314186|T314186]]
* 18:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 18:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1078.eqiad.wmnet with OS bullseye
* 17:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 17:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1078.eqiad.wmnet with reason: host reimage
* 17:40 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1078.eqiad.wmnet with reason: host reimage
* 17:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 17:27 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1078.eqiad.wmnet with OS bullseye
* 17:00 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 16:57 ryankemper: [WDQS] `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph`
* 16:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1049.eqiad.wmnet with OS bullseye
* 16:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1049.eqiad.wmnet with reason: host reimage
* 16:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1049.eqiad.wmnet with reason: host reimage
* 16:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1049.eqiad.wmnet with OS bullseye
* 16:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:02 btullis@deploy1002: Finished deploy [airflow-dags/analytics@3c998da]: (no justification provided) (duration: 00m 12s)
* 16:02 btullis@deploy1002: Started deploy [airflow-dags/analytics@3c998da]: (no justification provided)
* 15:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
* 15:48 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
* 15:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1074.eqiad.wmnet with OS bullseye
* 15:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
* 15:29 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
* 15:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1074.eqiad.wmnet with reason: host reimage
* 15:23 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1074.eqiad.wmnet with reason: host reimage
* 15:12 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
* 15:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1074.eqiad.wmnet with OS bullseye
* 15:07 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
* 15:07 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
* 15:07 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
* 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
* 14:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1077.eqiad.wmnet with OS bullseye
* 14:26 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
* 14:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1077.eqiad.wmnet with reason: host reimage
* 14:10 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1077.eqiad.wmnet with reason: host reimage
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:57 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1077.eqiad.wmnet with OS bullseye
* 13:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1057.eqiad.wmnet with OS bullseye
* 13:55 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Config: [[gerrit:823148{{!}}jawiki: Restrict abusefilter log view to "abusefilter-modify" user (T315199)]] (duration: 03m 12s)
* 13:41 taavi: UTC afternoon deploys done
* 13:40 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:823148{{!}}jawiki: Restrict abusefilter log view to "abusefilter-modify" user (T315199)]] (duration: 03m 21s)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:38 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 13:38 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 13:38 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=1)
* 13:38 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1057.eqiad.wmnet with reason: host reimage
* 13:33 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1057.eqiad.wmnet with reason: host reimage
* 13:24 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
* 13:24 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 13:24 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 13:24 taavi@deploy1002: Synchronized wmf-config: Config: [[gerrit:822718{{!}}kowiki: Change logo for 600k articles (T315127)]] (duration: 03m 11s)
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:20 taavi@deploy1002: Synchronized static/images: Config: [[gerrit:822717{{!}}kowiki: Add logo (legacy vector and vector-2022) for 600k articles (T315127)]] (duration: 03m 29s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1057.eqiad.wmnet with OS bullseye
* 13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 13:04 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
* 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 11:24 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 11:24 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 11:08 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
* 11:03 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 11:02 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 10:53 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
* 10:50 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
* 10:49 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
* 10:49 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
* 10:40 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 10:38 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 10:34 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
* 10:30 jelto: reimaging gitlab2003 (insetup) to test partman recipe from gerrit:823115 - [[phab:T274463|T274463]]
* 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 08:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 08:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32402 and previous config saved to /var/cache/conftool/dbconfig/20220816-074259-ladsgroup.json
* 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32401 and previous config saved to /var/cache/conftool/dbconfig/20220816-074239-ladsgroup.json
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32400 and previous config saved to /var/cache/conftool/dbconfig/20220816-072733-ladsgroup.json
* 07:26 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2067.codfw.wmnet
* 07:26 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2067.codfw.wmnet
* 07:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 07:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 07:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
* 07:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32399 and previous config saved to /var/cache/conftool/dbconfig/20220816-071227-ladsgroup.json
* 06:58 hashar@deploy1002: Finished deploy [integration/docroot@c142ba7]: Drop archived wikibase-vuejs-components storybook - [[phab:T309872|T309872]] (duration: 00m 10s)
* 06:58 hashar@deploy1002: Started deploy [integration/docroot@c142ba7]: Drop archived wikibase-vuejs-components storybook - [[phab:T309872|T309872]]
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32398 and previous config saved to /var/cache/conftool/dbconfig/20220816-065721-ladsgroup.json
* 06:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
* 06:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
* 06:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P32397 and previous config saved to /var/cache/conftool/dbconfig/20220816-062955-ladsgroup.json
* 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maint work on old s1 master ([[phab:T312984|T312984]] [[phab:T312863|T312863]] [[phab:T310011|T310011]] [[phab:T309311|T309311]] [[phab:T60674|T60674]] [[phab:T298560|T298560]] [[phab:T298555|T298555]] [[phab:T310485|T310485]] [[phab:T301312|T301312]])
* 06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maint work on old s1 master ([[phab:T312984|T312984]] [[phab:T312863|T312863]] [[phab:T310011|T310011]] [[phab:T309311|T309311]] [[phab:T60674|T60674]] [[phab:T298560|T298560]] [[phab:T298555|T298555]] [[phab:T310485|T310485]] [[phab:T301312|T301312]])
* 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1163 [[phab:T314380|T314380]]', diff saved to https://phabricator.wikimedia.org/P32396 and previous config saved to /var/cache/conftool/dbconfig/20220816-061413-ladsgroup.json
* 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1118 to s1 primary and set section read-write [[phab:T314380|T314380]]', diff saved to https://phabricator.wikimedia.org/P32395 and previous config saved to /var/cache/conftool/dbconfig/20220816-060530-ladsgroup.json
* 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - [[phab:T314380|T314380]]', diff saved to https://phabricator.wikimedia.org/P32394 and previous config saved to /var/cache/conftool/dbconfig/20220816-060455-ladsgroup.json
* 06:04 Amir1: Starting s1 eqiad failover from db1163 to db1118 - [[phab:T314380|T314380]]
* 05:43 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(appservers{{!}}api)-ro
* 05:43 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(appservers{{!}}api)-ro
* 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1118 with weight 0 [[phab:T314380|T314380]]', diff saved to https://phabricator.wikimedia.org/P32393 and previous config saved to /var/cache/conftool/dbconfig/20220816-053534-ladsgroup.json
* 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s1 [[phab:T314380|T314380]]
* 05:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s1 [[phab:T314380|T314380]]
* 05:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db[2142-2143].codfw.wmnet with reason: After-canary
* 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db[2142-2143].codfw.wmnet with reason: After-canary
* 04:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1059.eqiad.wmnet with OS bullseye
* 04:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1059.eqiad.wmnet with reason: host reimage
* 04:11 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1059.eqiad.wmnet with reason: host reimage
* 03:57 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1059.eqiad.wmnet with OS bullseye
* 03:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 03:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 03:53 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 01:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
* 00:18 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: replaceableSettings g 820247 (duration: 03m 18s)
* 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:13 tstarling@deploy1002: Synchronized tests: config tests, for consistency g 820247 (duration: 03m 22s)
* 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-01-14 ==
== 2022-08-15 ==
* 23:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2236.codfw.wmnet
* 23:20 mutante: phab2002 - manually removing service IP addresses for git-ssh.codfw.wikimedia.org which were added by puppet even after gerrit:823220 (!) [[phab:T280597|T280597]]
* 23:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T272094|T272094]] Change enwiki logo to 20th Birthday Celebration one (duration: 00m 56s)
* 22:59 mutante: search-loader1001 - killed puppet process that had been running since May
* 23:11 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20-2x.png: [[phab:T272094|T272094]] Sync out logo before going live, 3/3 (duration: 00m 55s)
* 22:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
* 23:09 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20-1.5x.png: [[phab:T272094|T272094]] Sync out logo before going live, 2/3 (duration: 00m 55s)
* 22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
* 23:07 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20.png: [[phab:T272094|T272094]] Sync out logo before going live, 1/3 (duration: 01m 02s)
* 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 23:02 mutante: Happy 20th Birthday Wikipedia - https://20.wikipedia.org - https://gerrit.wikimedia.org/r/656268
* 22:33 mutante: rsyncing /srv/repos and /srv/dumps from phab1001 to phab2002 before applying prod puppet role ([[phab:T313360|T313360]])
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2236.codfw.wmnet
* 22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1083.eqiad.wmnet with OS bullseye
* 22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2270.codfw.wmnet
* 21:54 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:823229{{!}}Revert "Revert "Enable sticky header edit A/B test for idwiki + viwiki""]] (duration: 03m 37s)
* 22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2268.codfw.wmnet
* 21:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2269.codfw.wmnet
* 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2269.codfw.wmnet
* 21:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2270.codfw.wmnet
* 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2268.codfw.wmnet
* 21:45 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1083.eqiad.wmnet with reason: host reimage
* 22:04 thcipriani: restart apache on gerrit1001
* 21:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2236.codfw.wmnet with reason: REIMAGE
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2236.codfw.wmnet with reason: REIMAGE
* 21:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2270.codfw.wmnet with reason: REIMAGE
* 21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1083.eqiad.wmnet with reason: host reimage
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2269.codfw.wmnet with reason: REIMAGE
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2270.codfw.wmnet with reason: REIMAGE
* 21:42 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/skins/Vector/resources/skins.vector.es6: Backport: [[gerrit:823228{{!}}Sticky header AB test bucketing for 2 treatment buckets (T312573)]] (duration: 03m 05s)
* 21:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2268.codfw.wmnet with reason: REIMAGE
* 21:34 ejegg: payments-wiki upgraded from {{Gerrit|41709763}} to {{Gerrit|f9f91f1f}}
* 21:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2269.codfw.wmnet with reason: REIMAGE
* afk: payments-wiki rolled back to {{Gerrit|41709763}}
* 21:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2268.codfw.wmnet with reason: REIMAGE
* 21:29 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1083.eqiad.wmnet with OS bullseye
* 21:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2258.codfw.wmnet
* 21:22 ejegg: payments-wiki upgraded from {{Gerrit|41709763}} to {{Gerrit|f9f91f1f}}
* 21:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet
* 21:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1080.eqiad.wmnet with OS bullseye
* 21:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:18 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes - razzi@cumin1001
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2242.codfw.wmnet
* 20:55 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:823227{{!}}Revert "Enable sticky header edit A/B test for idwiki + viwiki"]] (duration: 03m 15s)
* 21:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2241.codfw.wmnet
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 20:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1080.eqiad.wmnet with reason: host reimage
* 21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 20:48 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1080.eqiad.wmnet with reason: host reimage
* 21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 20:35 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1080.eqiad.wmnet with OS bullseye
* 21:14 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 20:33 cjming: end of UTC late backport window
* 21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2258.codfw.wmnet
* 20:31 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments: Backport: [[gerrit:822485{{!}}WelcomeSurvey/VariantHooks: Change hook used for redirection (T313064)]] (duration: 04m 37s)
* 21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2242.codfw.wmnet
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2241.codfw.wmnet
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 mutante: ACKing all unhandled crit alerts about systemd on clouddb hosts - notifications are disabled but this cleans up Icinga web UI noise - [[phab:T267090|T267090]]
* 20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:821310{{!}}Enable sticky header edit A/B test for idwiki + viwiki (T312295)]] (duration: 03m 30s)
* 20:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:05 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes - razzi@cumin1001
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:31 urbanecm@deploy1001: Synchronized dblists/closed.dblist: {{Gerrit|d3e274e9b953f5edda07fa3a016b7291a451ceb2}}: Close lrcwiki ([[phab:T272041|T272041]]) (duration: 00m 58s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:03 mutante: mc1024 - attempting to power on via mgmt, went down and power down
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2258.codfw.wmnet with reason: REIMAGE
* 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32391 and previous config saved to /var/cache/conftool/dbconfig/20220815-193541-ladsgroup.json
* 18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2255.codfw.wmnet with reason: REIMAGE
* 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 18:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2242.codfw.wmnet with reason: REIMAGE
* 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 18:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2258.codfw.wmnet with reason: REIMAGE
* 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32390 and previous config saved to /var/cache/conftool/dbconfig/20220815-193520-ladsgroup.json
* 18:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2255.codfw.wmnet with reason: REIMAGE
* 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32389 and previous config saved to /var/cache/conftool/dbconfig/20220815-192014-ladsgroup.json
* 18:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2241.codfw.wmnet with reason: REIMAGE
* 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32388 and previous config saved to /var/cache/conftool/dbconfig/20220815-190508-ladsgroup.json
* 18:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2242.codfw.wmnet with reason: REIMAGE
* 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32387 and previous config saved to /var/cache/conftool/dbconfig/20220815-185002-ladsgroup.json
* 18:38 Amir1: started mass deletion of lrcwiki ([[phab:T272041|T272041]]) - https://w.wiki/uPV
* 18:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1081.eqiad.wmnet with OS bullseye
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2241.codfw.wmnet with reason: REIMAGE
* 18:40 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@230a820]: include additional debugging information in HivePartitionRangeSensor logs (duration: 02m 08s)
* 18:36 jynus: restarting backup1002, backup2002 [[phab:T271913|T271913]]
* 18:38 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@230a820]: include additional debugging information in HivePartitionRangeSensor logs
* 18:05 jynus: restarting backup1001, backup2001 [[phab:T271913|T271913]]
* 18:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1081.eqiad.wmnet with reason: host reimage
* 16:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
* 18:31 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ms-be2067.codfw.wmnet
* 16:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
* 18:29 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1081.eqiad.wmnet with reason: host reimage
* 16:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 93 hosts with reason: upgrading openstack
* 18:24 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2067.codfw.wmnet
* 16:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 93 hosts with reason: upgrading openstack
* 18:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1081.eqiad.wmnet with OS bullseye
* 16:32 moritzm: installing php-pear updates on stretch
* 18:07 herron: thanos compact process was hung, forced thanos-compact restart on thanos-fe2001
* 16:03 moritzm: installing tomcat8 security updates
* 17:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1052.eqiad.wmnet with OS bullseye
* 15:40 moritzm: installing sqlite3 security updates on Stretch
* 17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1052.eqiad.wmnet with reason: host reimage
* 15:30 papaul: power down ms-be2022 for maintenance
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
* 15:19 otto@deploy1001: Finished deploy [analytics/refinery@1117f45]: Explicitly set timeout in banner_activity-druid-monthly-coord - [[phab:T264358|T264358]] (duration: 02m 16s)
* 17:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
* 15:16 otto@deploy1001: Started deploy [analytics/refinery@1117f45]: Explicitly set timeout in banner_activity-druid-monthly-coord - [[phab:T264358|T264358]]
* 17:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1052.eqiad.wmnet with reason: host reimage
* 15:11 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 17:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet
* 15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 93 hosts with reason: upgrading openstack
* 17:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
* 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 93 hosts with reason: upgrading openstack
* 17:24 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@d4137b5]: increase subgraph query SLA and remove same from drop_old_data (duration: 02m 17s)
* 14:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
* 17:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@d4137b5]: increase subgraph query SLA and remove same from drop_old_data
* 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
* 17:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1052.eqiad.wmnet with OS bullseye
* 14:56 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1082.eqiad.wmnet with OS bullseye
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1082.eqiad.wmnet with reason: host reimage
* 14:28 arturo: running homer in asw-b-codfw* ([[phab:T271519|T271519]])
* 16:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1082.eqiad.wmnet with reason: host reimage
* 14:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:32 damilare: payments-wiki upgraded from {{Gerrit|0894d75a}} to {{Gerrit|41709763}}
* 14:24 arturo: running homer in asw-b-codfw* ([[phab:T271519|T271519]])
* 16:27 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 14:10 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.26
* 16:25 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 14:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:23 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1082.eqiad.wmnet with OS bullseye
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:17 dancy@deploy1002: Installation of scap version "4.13.0" completed for 553 hosts
* 14:06 hashar@deploy1001: Synchronized php-1.36.0-wmf.26/skins/CologneBlue/includes/CologneBlueHooks.php: Edit link may not be present, avoid undefined index notice [[phab:T271978|T271978]] (duration: 01m 07s)
* 16:17 dancy@deploy1002: Installing scap version "4.13.0" for 553 hosts
* 13:56 aborrero@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:14 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 marostegui: Restart mysql on db2094 for openssl upgrades test
* 16:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:42 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:23 moritzm: restarting mw canaries for openssl update
* 15:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:22 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts logstash2003.codfw.wmnet
* 13:22 aborrero@cumin2001: START - Cookbook sre.dns.netbox
* 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts logstash2003.codfw.wmnet
* 13:17 moritzm: installing openssl1.0 security updates on stretch
* 15:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
* 13:15 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:32 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
* 13:11 moritzm: installing xerces-c security updates on stretch
* 15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
* 12:50 volans: upgraded python3-pynetbox to 5.3.0-1 on all affected hosts - [[phab:T266487|T266487]]
* 15:31 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
* 12:49 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
* 12:47 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 15:31 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
* 12:34 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
* 12:34 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 15:31 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
* 12:33 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
* 15:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1068.eqiad.wmnet with OS bullseye
* 12:29 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 14:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1068.eqiad.wmnet with reason: host reimage
* 12:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1004.eqiad.wmnet with reason: REIMAGE
* 14:36 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1068.eqiad.wmnet with reason: host reimage
* 12:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1004.eqiad.wmnet with reason: REIMAGE
* 14:26 hnowlan@deploy1002: Finished deploy [restbase/deploy@a571f9a]: Add blwiki [[phab:T310874|T310874]] (duration: 15m 42s)
* 12:24 XioNoX: push pfw3 firewall rules - [[phab:T271935|T271935]]
* 14:23 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1068.eqiad.wmnet with OS bullseye
* 12:16 volans: upgraded python3-pynetbox to 5.3.0-1 on cumin2001
* 14:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
* 12:16 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
* 14:10 hnowlan@deploy1002: Started deploy [restbase/deploy@a571f9a]: Add blwiki [[phab:T310874|T310874]]
* 12:14 elukey@cumin1001: END (ERROR) - Cookbook sre.presto.reboot-workers (exit_code=97) for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
* 14:10 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
* 12:14 volans: built and uploaded python3-pynetbox 5.3.0-1 to apt.wikimedia.org - [[phab:T266487|T266487]]
* 14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1070.eqiad.wmnet with OS bullseye
* 12:10 awight: EU config window finished.
* 13:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1070.eqiad.wmnet with reason: host reimage
* 12:09 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:656116{{!}}Remove unused WMDE TeWü QuickSurveys (T253112, T272013)]] (duration: 01m 07s)
* 13:46 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1070.eqiad.wmnet with reason: host reimage
* 12:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:34 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1070.eqiad.wmnet with OS bullseye
* 12:02 moritzm: rebooting miscweb1002
* 13:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 12:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|de81bcb5874aee16b23ffea5a43466572250a6c2}}: testwikidatawiki: Add wikidata as import source ([[phab:T315211|T315211]]) (duration: 03m 26s)
* 11:43 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:34 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@4164318]: (no justification provided) (duration: 30m 34s)
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e2772238003b797b1a8b18b4df0aa56f54132727}}: Revert "Revert "Remove WikibaseTermboxInteraction $wgEventLoggingSchemas entry"" ([[phab:T290303|T290303]]) (duration: 03m 29s)
* 11:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:22 elukey@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
* 10:03 Emperor: pd 1I:1:1 modify disablepd forced on ms-be2028 [[phab:T315213|T315213]]
* 11:17 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:04 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4164318]: (no justification provided)
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:04 oblivian@deploy1001: deploy aborted: (no justification provided) (duration: 00m 14s)
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:03 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4164318]: (no justification provided)
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:17 urbanecm: UTC morning B&C window done
* 10:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 07:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a454d3bc56c344fa62625f7c292ea087bddfebe5}}: Pin wgCheckUserLogReasonMigrationStage to read and write old ([[phab:T233004|T233004]]) (duration: 03m 16s)
* 10:35 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:28 jbond42: failover apt.wikimedia.org back to apt1001
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:28 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
* 07:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43cd5ef1bc38bdc8f46f3093cf0baa74cccc9678}}: Add bnwiki in wgImportSources to bnwikibooks ([[phab:T314820|T314820]]) (duration: 03m 05s)
* 10:25 jbond42: reboot apt1001
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:16 jbond42: failover apt.wikimedia.org to apt2001
* 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32386 and previous config saved to /var/cache/conftool/dbconfig/20220815-070955-ladsgroup.json
* 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 10:12 jbond42: reboot apt2001
* 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 10:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 09:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 09:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 07:08 urbanecm: mwscript resetAuthenticationThrottle.php --wiki=cswiki --signup --ip='194.31.191.20' # [[phab:T315141|T315141]]
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13768 and previous config saved to /var/cache/conftool/dbconfig/20210114-093803-root.json
* 07:06 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|7c2a393ee}}: {{Gerrit|dc0d62a3}}: {{Gerrit|6f687bcfc}}: Update throttle rules ([[phab:T315182|T315182]], [[phab:T315141|T315141]]) (duration: 03m 21s)
* 09:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32385 and previous config saved to /var/cache/conftool/dbconfig/20220815-023538-ladsgroup.json
* 09:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32384 and previous config saved to /var/cache/conftool/dbconfig/20220815-022032-ladsgroup.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13767 and previous config saved to /var/cache/conftool/dbconfig/20210114-092300-root.json
* 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32383 and previous config saved to /var/cache/conftool/dbconfig/20220815-020526-ladsgroup.json
* 09:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32382 and previous config saved to /var/cache/conftool/dbconfig/20220815-015020-ladsgroup.json
* 09:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:11 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13766 and previous config saved to /var/cache/conftool/dbconfig/20210114-090756-root.json
* 09:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13765 and previous config saved to /var/cache/conftool/dbconfig/20210114-085252-root.json
* 08:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:51 vgutierrez: rolling restart of ncredir servers to catch up on kernel upgrades
* 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:43 XioNoX: standardize cloudsw interfaces to prepare for switches homerisation
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2140 [[phab:T271084|T271084]]', diff saved to https://phabricator.wikimedia.org/P13764 and previous config saved to /var/cache/conftool/dbconfig/20210114-084243-marostegui.json
* 08:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:10 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 08:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 00:22 ryankemper: [[phab:T266492|T266492]] Restart of `relforge` successful
* 00:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 00:15 chaomodus: completed rebooting Netbox hosts, failure was due to report errors that would not have recovered.
* 00:14 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 00:13 ryankemper: `sudo -i cookbook sre.elasticsearch.rolling-restart relforge "relforge cluster restart" --task-id [[phab:T266492|T266492]] --nodes-per-run 1 --without-lvs`
* 00:13 ryankemper: (Forgot to tell it `relforge` isn't lvs-managed)
* 00:13 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 00:10 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 00:10 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of `relforge`
* 00:09 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2240.codfw.wmnet
* 00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2239.codfw.wmnet
* 00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2238.codfw.wmnet
* 00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2237.codfw.wmnet
* 00:01 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 00:01 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 00:00 ryankemper: [[phab:T266492|T266492]] [[phab:T268779|T268779]] [[phab:T265699|T265699]] Rolling restart of `cloudelastic` was successful


== 2021-01-13 ==
== 2022-08-14 ==
* 23:53 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32380 and previous config saved to /var/cache/conftool/dbconfig/20220814-085443-ladsgroup.json
* 23:53 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 23:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 08:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 23:49 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:49 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 23:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:46 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 23:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:44 chaomodus: rebooting Netbox instances to apply updates
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2240.codfw.wmnet
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2239.codfw.wmnet
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2238.codfw.wmnet
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2237.codfw.wmnet
* 22:53 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 22:53 ryankemper: [[phab:T266492|T266492]] [[phab:T268779|T268779]] [[phab:T265699|T265699]] `sudo -i cookbook sre.elasticsearch.rolling-restart cloudelastic "cloudelastic cluster restart" --task-id [[phab:T266492|T266492]] --nodes-per-run 1`
* 22:53 ryankemper: [[phab:T266492|T266492]] [[phab:T268779|T268779]] [[phab:T265699|T265699]] Restarting cloudelastic to apply new readahead changes, this will also verify cloudelastic support works in our elasticsearch spicerack code. Only going one node at a time because cloudelastic elasticsearch indices only have 1 replica shard per index
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2239.codfw.wmnet with reason: new install on buster
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2239.codfw.wmnet with reason: new install on buster
* 21:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2240.codfw.wmnet with reason: REIMAGE
* 21:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2238.codfw.wmnet with reason: REIMAGE
* 21:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2240.codfw.wmnet with reason: REIMAGE
* 21:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2239.codfw.wmnet with reason: REIMAGE
* 21:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2237.codfw.wmnet with reason: REIMAGE
* 21:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2239.codfw.wmnet with reason: REIMAGE
* 21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2238.codfw.wmnet with reason: REIMAGE
* 21:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2237.codfw.wmnet with reason: REIMAGE
* 21:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2235.codfw.wmnet
* 21:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet
* 21:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet
* 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2232.codfw.wmnet
* 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2231.codfw.wmnet
* 21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2235.codfw.wmnet
* 21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2234.codfw.wmnet
* 21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2233.codfw.wmnet
* 21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2232.codfw.wmnet
* 20:40 mutante: DNS - new project language "alt" added. Altai (also Gorno-Altai) is a Turkic language, spoken officially in the Altai Republic, Russia.
* 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2235.codfw.wmnet with reason: REIMAGE
* 20:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 20:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2235.codfw.wmnet with reason: REIMAGE
* 20:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 20:02 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 20:02 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 19:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|726e972bc8cff1ff8ed90c8dd853aae4997329f5}}: Set import sources for mrwikibooks ([[phab:T270402|T270402]]) (duration: 01m 04s)
* 19:47 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/WikibaseMediaInfo/src/Search/MediaSearchProfiles.php: [[gerrit:655919{{!}}Guard against this file being included twice]] [[phab:T271933|T271933]] (for real -- forgot to submodule update) (duration: 01m 04s)
* 19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2234.codfw.wmnet with reason: REIMAGE
* 19:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2234.codfw.wmnet with reason: REIMAGE
* 19:42 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/WikibaseMediaInfo/src/Search/MediaSearchProfiles.php: [[gerrit:655919{{!}}Guard against this file being included twice]] [[phab:T271933|T271933]] (duration: 01m 04s)
* 19:39 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid test cluster: Reboot Druid nodes - razzi@cumin1001
* 19:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Undo - Migrate SpecialMuteSubmit to EventGate - [[phab:T268517|T268517]] (duration: 01m 06s)
* 19:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2231.codfw.wmnet
* 19:20 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes - razzi@cumin1001
* 19:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2230.codfw.wmnet
* 19:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2229.codfw.wmnet
* 19:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2228.codfw.wmnet
* 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2233.codfw.wmnet with reason: REIMAGE
* 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2233.codfw.wmnet with reason: REIMAGE
* 18:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2232.codfw.wmnet with reason: REIMAGE
* 18:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2231.codfw.wmnet with reason: REIMAGE
* 18:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2232.codfw.wmnet with reason: REIMAGE
* 18:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2231.codfw.wmnet with reason: REIMAGE
* 18:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2227.codfw.wmnet
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2228.codfw.wmnet
* 18:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
* 18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
* 18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
* 18:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2227.codfw.wmnet
* 17:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2228.codfw.wmnet with reason: REIMAGE
* 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2228.codfw.wmnet with reason: REIMAGE
* 17:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2230.codfw.wmnet with reason: REIMAGE
* 17:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2230.codfw.wmnet with reason: REIMAGE
* 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2229.codfw.wmnet with reason: REIMAGE
* 17:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2227.codfw.wmnet with reason: REIMAGE
* 17:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2229.codfw.wmnet with reason: REIMAGE
* 17:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2227.codfw.wmnet with reason: REIMAGE
* 17:11 herron: beginning cutover of https://logstash.wikimedia.org frontend to ELK7 [[phab:T234854|T234854]]
* 17:02 mutante: m2228 resetting DRAC/BMC - trying to solve remote IPMI issue - bmc-device --cold-reset; echo $?
* 17:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:39 sukhe: upload pdns-recursor_4.4.2-2wm1 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 16:18 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:18 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 16:18 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:17 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 16:06 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/ProofreadPage/includes/Special/SpecialProofreadPages.php: {{Gerrit|d73ba7c1aa92190903cd4b07fe3e8cf1bed13d70}}: GlobalVarConfig::get should not be provided with the wg prefix ([[phab:T271932|T271932]]) (duration: 01m 07s)
* 15:56 volans: upgraded spicerack to 0.0.47-1+deb10u1 on cumin1001 - [[phab:T257905|T257905]]
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 15:45 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:45 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:42 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:42 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:41 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:41 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:22 hashar: Stopping Jenkins CI on contint2001 to upgrade Jenkins # [[phab:T271507|T271507]]
* 15:11 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 15:06 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
* 15:06 volans@cumin2001: START - Cookbook sre.hosts.downtime for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
* 15:05 volans: upgraded spicerack to 0.0.47-1+deb10u1 on cumin2001 - [[phab:T257905|T257905]]
* 15:01 hashar: Upgraded Jenkins on releases1002 / releases2002 hosts # [[phab:T271507|T271507]]
* 14:57 moritzm: imported jenkins 2.263.2 (security release) to apt.wikimedia.org/buster-wikimedia
* 14:27 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.26/skins/Vector/includes/templates/legacy/Sidebar.mustache: {{Gerrit|5a117ded68b5e0fc7f9b4a8a4513780e57eceefe}}: Use {{link-mainpage}} in legacy sidebar same as new logo ([[phab:T271873|T271873]]) (duration: 01m 05s)
* 14:17 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:17 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:15 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:12 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.26 (duration: 01m 03s)
* 14:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.26
* 13:52 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:52 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:49 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:49 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:48 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 13:36 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:36 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:31 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:31 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:15 dcausse: European mid-day backport window done
* 12:09 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T239931|T239931]]: Revert "Disable sanity check cirrus jobs for Wikidata" (duration: 01m 16s)
* 11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 11:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1029.eqiad.wmnet with reason: REIMAGE
* 11:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 11:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1029.eqiad.wmnet with reason: REIMAGE
* 11:40 kart_: Updated cxserver to 2021-01-12-095820-production ([[phab:T234220|T234220]], [[phab:T270408|T270408]])
* 11:37 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:33 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:23 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13756 and previous config saved to /var/cache/conftool/dbconfig/20210113-111312-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight on es4 the master', diff saved to https://phabricator.wikimedia.org/P13755 and previous config saved to /var/cache/conftool/dbconfig/20210113-110419-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13754 and previous config saved to /var/cache/conftool/dbconfig/20210113-105809-root.json
* 10:57 volans: uploaded spicerack_0.0.47 to apt.wikimedia.org buster-wikimedia
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13753 and previous config saved to /var/cache/conftool/dbconfig/20210113-104305-root.json
* 10:35 jbond42: puppet re-enabled on all cp-text hosts
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13751 and previous config saved to /var/cache/conftool/dbconfig/20210113-102802-root.json
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce weight on es1021', diff saved to https://phabricator.wikimedia.org/P13750 and previous config saved to /var/cache/conftool/dbconfig/20210113-102245-marostegui.json
* 10:18 jbond42: disable puppet on the cp::text to deploy block list changes 651174 + 651171
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020', diff saved to https://phabricator.wikimedia.org/P13749 and previous config saved to /var/cache/conftool/dbconfig/20210113-101606-marostegui.json
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13748 and previous config saved to /var/cache/conftool/dbconfig/20210113-100253-root.json
* 09:59 marostegui: Enable report_host on es1020 [[phab:T271106|T271106]]
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020', diff saved to https://phabricator.wikimedia.org/P13747 and previous config saved to /var/cache/conftool/dbconfig/20210113-095834-marostegui.json
* 09:49 marostegui: Enable report_host on all codfw sby masters - [[phab:T271106|T271106]]
* 09:42 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 09:05 ayounsi@deploy1001: Finished deploy [homer/deploy@723ebfe]: Netbox 2.9 changes (duration: 03m 11s)
* 09:03 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 09:02 ayounsi@deploy1001: Started deploy [homer/deploy@723ebfe]: Netbox 2.9 changes
* 09:02 moritzm: installing efivar bugfix update
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:47 moritzm: draining ganeti4003 for eventual reboot
* 08:46 ema: cp5008: re-enable puppet to undo JIT tslua experiment [[phab:T265625|T265625]]
* 08:35 moritzm: failover ganeti master in ulsfo to ganeti4002
* 08:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:19 moritzm: draining ganeti4002 for eventual reboot
* 08:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:04 ryankemper: [WDQS Deploy] Deploy is complete, and the WDQS service is healthy
* 07:59 moritzm: draining ganeti4001 for eventual reboot
* 07:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 07:29 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 07:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts simultaneously: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 07:28 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@fdd2c2f]: 0.3.59 (duration: 14m 23s)
* 07:15 ryankemper: [WDQS Deploy] All tests passing on canary instance `wdqs1003` following canary deploy. Proceeding to rest of fleet...
* 07:13 ryankemper@deploy1001: Started deploy [wdqs/wdqs@fdd2c2f]: 0.3.59
* 07:13 ryankemper: [WDQS Deploy] All tests passing on canary instance `wdqs1003` prior to start of deploy. Proceeding with canary deploy of version `0.3.59`...
* 07:04 ryankemper: [[phab:T266492|T266492]] [[phab:T268779|T268779]] [[phab:T265699|T265699]] Restarting cloudelastic to apply new readahead changes, this will also verify cloudelastic support works in our elasticsearch spicerack code. Only going one node at a time because cloudelastic elasticsearch indices only have 1 replica shard per index.
* 07:03 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13745 and previous config saved to /var/cache/conftool/dbconfig/20210113-065535-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13744 and previous config saved to /var/cache/conftool/dbconfig/20210113-064031-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13743 and previous config saved to /var/cache/conftool/dbconfig/20210113-062528-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13742 and previous config saved to /var/cache/conftool/dbconfig/20210113-061024-root.json


== 2021-01-12 ==
== 2022-08-13 ==
* 22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2225.codfw.wmnet
* 13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2224.codfw.wmnet
* 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@b17db99]: Rerun production deploy of Netbox 2.9 just in case [[phab:T266487|T266487]] (duration: 00m 05s)
* 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32379 and previous config saved to /var/cache/conftool/dbconfig/20220813-133713-ladsgroup.json
* 22:46 crusnov@deploy1001: Started deploy [netbox/deploy@b17db99]: Rerun production deploy of Netbox 2.9 just in case [[phab:T266487|T266487]]
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32378 and previous config saved to /var/cache/conftool/dbconfig/20220813-132207-ladsgroup.json
* 22:37 chaomodus: Upgrade of Netbox to 2.9 complete, checking support software. [[phab:T266487|T266487]]
* 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32377 and previous config saved to /var/cache/conftool/dbconfig/20220813-130701-ladsgroup.json
* 22:33 crusnov@deploy1001: Finished deploy [netbox/deploy@b17db99]: Deploy Netbox 2.9.10 to production [[phab:T266487|T266487]] (duration: 02m 33s)
* 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32376 and previous config saved to /var/cache/conftool/dbconfig/20220813-125156-ladsgroup.json
* 22:30 crusnov@deploy1001: Started deploy [netbox/deploy@b17db99]: Deploy Netbox 2.9.10 to production [[phab:T266487|T266487]]
* 22:12 chaomodus: Merged Netbox 2.9 related changes in puppet and -extras; testing on -next [[phab:T266487|T266487]]
* 22:07 bblack: reboot authdns1001 - [[phab:T266746|T266746]]#6741647
* 22:04 chaomodus: proceeding with Netbox 2.9 upgrade [[phab:T266487|T266487]]
* 22:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2225.codfw.wmnet with reason: REIMAGE
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2225.codfw.wmnet with reason: REIMAGE
* 21:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2224.codfw.wmnet with reason: REIMAGE
* 21:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2224.codfw.wmnet with reason: REIMAGE
* 21:50 jforrester@deploy1001: Synchronized php-1.36.0-wmf.25/extensions/AbuseFilter/modules/mode-abusefilter.js: [[phab:T271487|T271487]] Don't pass protocol-relative URLs to the Ace worker (duration: 01m 06s)
* 21:41 ottomata: rolling restart of eventgate-analytics-external pods
* 20:40 tgr_: running 'mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=ukwiki' on terbium
* 19:57 tgr_: backports done
* 19:52 bblack: dns1001,authdns1001 - upgrade gdnsd to 3.5.0
* 19:49 tgr_: synced Config: [[gerrit:654520{{!}}Disable DiscussionTools' upcoming newtopictool (T270119)]]
* 19:49 tgr_: synced Config: [[gerrit:655723{{!}}Migrate HomepageVisit and ServerSideAccountCreation to Event Platform on testwiki (T267333)]]
* 19:48 tgr_: synced Config: [[gerrit:655706{{!}}Migrate SuggestedTagsAction to Event Platform on testwiki (T267351)]]
* 19:48 tgr_: synced Config: [[gerrit:655301{{!}}Alphabetize ORES settings (T256887)]]
* 19:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:655302{{!}}Enable ORES filters on ukwiki (T256887)]] (duration: 01m 05s)
* 19:32 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bunch of no-op/testwiki changes: [[gerrit:654520]], [[gerrit:655301]], [[gerrit:655706]], [[gerrit:655723]] (duration: 01m 05s)
* 19:27 bblack: dns3001,dns4001 - upgrade gdnsd to 3.5.0
* 19:25 ottomata: rolling restart of eventgate-analytics-external pods to clear schema caches - [[phab:T267333|T267333]]
* 19:01 ariel@deploy1001: Synchronized php-1.36.0-wmf.26/includes/api/ApiQueryInfo.php: Backport: (gerrit 655671) Fix undefined index error in ApiQueryInfo ([[phab:T271815|T271815]]) (duration: 01m 06s)
* 18:06 bblack: dns2001,dns5001 - upgrade gdnsd to 3.5.0
* 17:40 bblack: dnsX002 - upgrade gdnsd to 3.5.0
* 17:20 herron: roll restarting eqiad/codfw low-traffic pybals for kibana-next -> kibana7 rename
* 17:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:09 jynus: shutting down db2132, db2078:m1 for m1 codfw replica reprovisioning [[phab:T270877|T270877]]
* 17:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:09 moritzm: rebooting people1002 (people.wikimedia.org)
* 16:56 moritzm: reinstalling bast3005 with correct DHCP settings
* 16:39 herron@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: cluster=kibana7,service=kibana7
* 16:37 ema: cp5008: ats-backend-restart to apply jit.off(true, true) to all lua scripts [[phab:T265625|T265625]]
* 16:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:18 herron@puppetmaster1001: conftool action : set/weight=10; selector: name=logstash2031.codfw.wmnet
* 15:56 ema: cp5008: ats-backend-restart to apply jit.off(true, true) in default.lua [[phab:T265625|T265625]]
* 15:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2055.codfw.wmnet with reason: reboot
* 15:52 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be2055.codfw.wmnet with reason: reboot
* 15:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2031.codfw.wmnet with reason: test unattended reboot
* 15:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be2031.codfw.wmnet with reason: test unattended reboot
* 14:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.26
* 14:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:55 moritzm: draining ganeti3003 for eventual reboot
* 13:53 moritzm: failover ganeti master in esams to ganeti3002
* 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:33 moritzm: draining ganeti3002 for eventual reboot
* 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:08 moritzm: draining ganeti3001 for eventual reboot
* 11:22 moritzm: installing edk2 security updates
* 10:51 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 10:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: [[phab:T271058|T271058]]
* 10:28 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: [[phab:T271058|T271058]]
* 10:26 moritzm: installing systemd bugfix update from Buster 10.7 point release
* 10:15 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.26 (duration: 67m 18s)
* 10:13 marostegui: Restart mysql on db1138 to pick up new config [[phab:T271427|T271427]] [[phab:T271106|T271106]]
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P13736 and previous config saved to /var/cache/conftool/dbconfig/20210112-101211-marostegui.json
* 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.26
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13732 and previous config saved to /var/cache/conftool/dbconfig/20210112-090533-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13731 and previous config saved to /var/cache/conftool/dbconfig/20210112-085030-root.json
* 08:49 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[phab:T271755|T271755]] (duration: 00m 57s)
* 08:47 liw: 1.36.0-wmf.26 was branched at {{Gerrit|e6ad9ab7713ee33c30cd7c17762737870dc8fd08}} for [[phab:T267419|T267419]]
* 08:40 marostegui: Sanitize bclwiktionary diqwiktionary niawiki niawiktionary diqwiktionary on db1124  db2094 {{Gerrit|db11154}} [[phab:T270280|T270280]] [[phab:T270276|T270276]] [[phab:T270414|T270414]] [[phab:T270410|T270410]] [[phab:T271261|T271261]]
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13730 and previous config saved to /var/cache/conftool/dbconfig/20210112-083526-root.json
* 08:30 moritzm: installing remaining curl security updates on stretch
* 08:21 marostegui: Deploy schema change on s3 eqiad master - [[phab:T270187|T270187]]
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13729 and previous config saved to /var/cache/conftool/dbconfig/20210112-082023-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P13728 and previous config saved to /var/cache/conftool/dbconfig/20210112-080419-marostegui.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13727 and previous config saved to /var/cache/conftool/dbconfig/20210112-070051-root.json
* 06:53 XioNoX: push CR655445, only configure vlans relevant to a switch
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13726 and previous config saved to /var/cache/conftool/dbconfig/20210112-064548-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13725 and previous config saved to /var/cache/conftool/dbconfig/20210112-063044-root.json
* 06:30 jhuneidi@deploy1001: Pruned MediaWiki: 1.36.0-wmf.21 (duration: 03m 21s)
* 06:16 marostegui: Stop mysql on db1079 to clone db1155:3317 [[phab:T268742|T268742]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13724 and previous config saved to /var/cache/conftool/dbconfig/20210112-061541-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P13723 and previous config saved to /var/cache/conftool/dbconfig/20210112-060557-marostegui.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P13722 and previous config saved to /var/cache/conftool/dbconfig/20210112-055953-marostegui.json


== 2021-01-11 ==
== 2022-08-12 ==
* 22:16 eileen: process-control config revision is {{Gerrit|f08249ecf9}} eoy jobs disabled
* 23:41 mutante: wikistats-bullseye:~$ /usr/lib/wikistats/update.php wp prefix blk ; /usr/lib/wikistats/update.php wp prefix kcg [[phab:T315121|T315121]]
* 22:12 eileen: civicrm revision changed from {{Gerrit|2df572bdcd}} to {{Gerrit|f417a510a5}}, config revision is {{Gerrit|f08249ecf9}}
* 23:38 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.timer [[phab:T315121|T315121]]
* 21:58 Amir1: deleting watchlist entries of Fawikibot in fawiki (1.1M rows)
* 22:14 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 21:20 mutante: docker images - [deneb:/srv/images/production-images] $ sudo -i build-production-images
* 21:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1071.eqiad.wmnet with OS bullseye
* 21:02 bblack: dns4002 - upgrade gdnsd to 3.5.0 package
* 21:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 20:47 bblack: authdns2001 - upgrade gdnsd to 3.5.0 package
* 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
* 19:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate UniversalLanguageSelector to Event Platform - [[phab:T268517|T268517]] (duration: 00m 57s)
* 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
* 19:43 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T270417|T270417]] [[phab:T270413|T270413]] [[phab:T270279|T270279]])
* 21:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1071.eqiad.wmnet with OS bullseye
* 19:14 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T270417|T270417]] [[phab:T270413|T270413]] [[phab:T270279|T270279]])
* 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
* 18:48 dpifke@deploy1001: Finished deploy [performance/arc-lamp@6bbac6d]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/650277 (duration: 00m 04s)
* 21:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
* 18:48 dpifke@deploy1001: Started deploy [performance/arc-lamp@6bbac6d]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/650277
* 21:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1053.eqiad.wmnet with OS bullseye
* 18:01 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: [[phab:T181217|T181217]] (duration: 00m 56s)
* 20:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 18:00 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[phab:T181217|T181217]] (duration: 00m 57s)
* 20:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
* 17:57 reedy@deploy1001: Synchronized wmf-config/extension-list: [[phab:T181217|T181217]] (duration: 00m 56s)
* 20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
* 17:48 Amir1: manually removing watchlist rows for Dexbot in Wikidata
* 20:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1053.eqiad.wmnet with OS bullseye
* 17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on deploy2002.codfw.wmnet with reason: new install on buster
* 20:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1048.eqiad.wmnet with OS bullseye
* 17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on deploy2002.codfw.wmnet with reason: new install on buster
* 19:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
* 17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on deploy1002.eqiad.wmnet with reason: new install on buster
* 19:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
* 17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on deploy1002.eqiad.wmnet with reason: new install on buster
* 19:42 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1048.eqiad.wmnet with OS bullseye
* 17:40 mutante: deploy2002 - scap pull
* 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32375 and previous config saved to /var/cache/conftool/dbconfig/20220812-193822-ladsgroup.json
* 17:39 mutante: deploy1002 - scap pull
* 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:15 andrew@deploy1001: Finished deploy [striker/deploy@b6441b8]: Striker deploy for [[phab:T271621|T271621]] (duration: 01m 59s)
* 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:13 andrew@deploy1001: Started deploy [striker/deploy@b6441b8]: Striker deploy for [[phab:T271621|T271621]]
* 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32374 and previous config saved to /var/cache/conftool/dbconfig/20220812-193801-ladsgroup.json
* 17:12 andrew@deploy1001: Finished deploy [striker/deploy@b6441b8]: Striker deploy for [[phab:T271621|T271621]] (duration: 02m 05s)
* 19:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1054.eqiad.wmnet with OS bullseye
* 17:10 andrew@deploy1001: Started deploy [striker/deploy@b6441b8]: Striker deploy for [[phab:T271621|T271621]]
* 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32373 and previous config saved to /var/cache/conftool/dbconfig/20220812-192255-ladsgroup.json
* 16:48 Urbanecm: Create new wiki window is completed
* 19:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
* 16:43 andrew@deploy1001: Finished deploy [striker/deploy@3180f72]: Striker deploy for [[phab:T271621|T271621]] (duration: 01m 01s)
* 19:09 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
* 16:42 andrew@deploy1001: Started deploy [striker/deploy@3180f72]: Striker deploy for [[phab:T271621|T271621]]
* 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32372 and previous config saved to /var/cache/conftool/dbconfig/20220812-190749-ladsgroup.json
* 16:37 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 18s)
* 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
* 16:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 56s)
* 18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
* 16:33 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 56s)
* 18:54 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1054.eqiad.wmnet with OS bullseye
* 16:32 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating bclwiktionary ([[phab:T270274|T270274]])
* 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32371 and previous config saved to /var/cache/conftool/dbconfig/20220812-185243-ladsgroup.json
* 16:30 urbanecm@deploy1001: Synchronized dblists: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 55s)
* 18:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1066.eqiad.wmnet with OS bullseye
* 16:29 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 55s)
* 18:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
* 16:26 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 54s)
* 18:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
* 16:25 moritzm: installing openldap security updates on stretch (client tools/libs only; all slapd installations are on Buster and already fixed)
* 18:08 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1066.eqiad.wmnet with OS bullseye
* 16:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 00m 56s)
* 18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1064.eqiad.wmnet with OS bullseye
* 16:20 andrew@deploy1001: Finished deploy [striker/deploy@ba6c0ae]: Striker deploy for [[phab:T271621|T271621]] (duration: 02m 02s)
* 17:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
* 16:18 andrew@deploy1001: Started deploy [striker/deploy@ba6c0ae]: Striker deploy for [[phab:T271621|T271621]]
* 17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
* 16:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 01m 34s)
* 17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1064.eqiad.wmnet with OS bullseye
* 16:17 moritzm: installing remaining p11-kit security updates on stretch
* 17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts netmon2002.wikimedia.org
* 16:15 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating diqwiktionary ([[phab:T270275|T270275]])
* 17:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon2002.wikimedia.org
* 16:14 urbanecm@deploy1001: Synchronized dblists: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 00m 57s)
* 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
* 16:13 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 00m 57s)
* 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
* 16:12 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 00m 55s)
* 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
* 16:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 55s)
* 16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
* 16:05 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 56s)
* 16:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1067.eqiad.wmnet with OS bullseye
* 16:04 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating niawiktionary ([[phab:T270409|T270409]])
* 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2003-dev.wikimedia.org
* 16:03 urbanecm@deploy1001: Synchronized dblists: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 55s)
* 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:02 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 56s)
* 16:16 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 16:01 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 56s)
* 16:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2003-dev.wikimedia.org
* 15:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 15:57 andrew@deploy1001: Finished deploy [striker/deploy@b2804f2]: Striker deploy for [[phab:T271621|T271621]] (duration: 02m 05s)
* 16:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
* 15:56 urbanecm@deploy1001: Synchronized langlist: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 53s)
* 15:58 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
* 15:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 55s)
* 15:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1067.eqiad.wmnet with OS bullseye
* 15:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1028.eqiad.wmnet with reason: REIMAGE
* 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 15:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 15:31 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 15:55 andrew@deploy1001: Started deploy [striker/deploy@b2804f2]: Striker deploy for [[phab:T271621|T271621]]
* 15:31 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 15:54 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 56s)
* 15:07 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts netmon1002.wikimedia.org
* 15:54 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating niawiki ([[phab:T270408|T270408]])
* 15:07 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon1002.wikimedia.org
* 15:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1028.eqiad.wmnet with reason: REIMAGE
* 15:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1061.eqiad.wmnet with OS bullseye
* 15:52 urbanecm@deploy1001: Synchronized dblists: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 57s)
* 14:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
* 15:51 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 57s)
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=varnish-fe
* 15:50 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 56s)
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
* 15:48 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]] (duration: 00m 43s)
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-tls
* 15:47 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]]
* 14:43 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
* 15:47 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]] (duration: 01m 45s)
* 14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
* 15:45 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]]
* 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1061.eqiad.wmnet with OS bullseye
* 15:42 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]] (duration: 01m 04s)
* 14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
* 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13720 and previous config saved to /var/cache/conftool/dbconfig/20210111-154123-root.json
* 14:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1063.eqiad.wmnet with OS bullseye
* 15:41 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]]
* 14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
* 15:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SpecialMuteSubmit to EventGate - [[phab:T268517|T268517]] (duration: 00m 58s)
* 14:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
* 15:32 effie: upgrading python-thumbor-wikimedia to 2.9 on thumbor1001
* 13:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1063.eqiad.wmnet with OS bullseye
* 15:31 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 13:41 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13719 and previous config saved to /var/cache/conftool/dbconfig/20210111-152619-root.json
* 06:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic10[8-9][0-9].*
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13718 and previous config saved to /var/cache/conftool/dbconfig/20210111-151116-root.json
* 05:54 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic110.*
* 15:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32369 and previous config saved to /var/cache/conftool/dbconfig/20220812-010312-ladsgroup.json
* 15:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13717 and previous config saved to /var/cache/conftool/dbconfig/20210111-145612-root.json
* 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P13716 and previous config saved to /var/cache/conftool/dbconfig/20210111-145239-marostegui.json
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 14:32 XioNoX: add Routinator 0.8.2 to APT repo - [[phab:T269738|T269738]]
* 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32368 and previous config saved to /var/cache/conftool/dbconfig/20220812-010233-ladsgroup.json
* 14:22 moritzm: restarting FPM/Apache on app server canaries for curl update
* 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32367 and previous config saved to /var/cache/conftool/dbconfig/20220812-004727-ladsgroup.json
* 14:13 marostegui: Deploy schema change on s3 codfw master - [[phab:T270187|T270187]]
* 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32366 and previous config saved to /var/cache/conftool/dbconfig/20220812-003221-ladsgroup.json
* 13:52 moritzm: installing curl security updates on stretch
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32365 and previous config saved to /var/cache/conftool/dbconfig/20220812-001715-ladsgroup.json
* 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13713 and previous config saved to /var/cache/conftool/dbconfig/20210111-134213-root.json
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13712 and previous config saved to /var/cache/conftool/dbconfig/20210111-132709-root.json
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13711 and previous config saved to /var/cache/conftool/dbconfig/20210111-131206-root.json
* 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:655418{{!}} Bumping portals to master (T128546)]] (duration: 01m 03s)
* 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:655418{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 11:10 XioNoX: upgrade Routinator to 0.8.2 on rpki2001 - [[phab:T269738|T269738]]
* 11:10 jbond42: push change to ratelimit vscode-phabricator - [[phab:T271528|T271528]]
* 10:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ab9e80dad5c44ff72a6fa7568a5ba59798df3d4e}}: Enable anniversary logo for cs.wikipedia ([[phab:T271662|T271662]]; 2/2) (duration: 00m 56s)
* 10:35 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|ab9e80dad5c44ff72a6fa7568a5ba59798df3d4e}}: Enable anniversary logo for cs.wikipedia ([[phab:T271662|T271662]]; 1/2) (duration: 01m 00s)
* 10:06 ema: cp3050: restart ats-be to lower lua states from 256 to 64 [[phab:T265625|T265625]]
* 09:31 marostegui: Sanitize db1155:3314 - [[phab:T268742|T268742]]
* 09:31 marostegui: Deploy schema change on s1 codfw master - [[phab:T270187|T270187]]
* 09:02 elukey: force puppet on logstash1007 after ES OOM
* 08:55 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 08:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1030.eqiad.wmnet with reason: REIMAGE
* 08:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2030.codfw.wmnet with reason: REIMAGE
* 08:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1030.eqiad.wmnet with reason: REIMAGE
* 08:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2030.codfw.wmnet with reason: REIMAGE
* 07:49 dcausse: depooling & restarting blazegraph on wdqs2007 ([[phab:T242453|T242453]])
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13709 and previous config saved to /var/cache/conftool/dbconfig/20210111-074853-root.json
* 07:43 dcausse: repool wdqs1007 (wrong machine) ([[phab:T242453|T242453]])
* 07:41 dcausse: depooling & restarting blazegraph on wdqs1007 ([[phab:T242453|T242453]])
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13708 and previous config saved to /var/cache/conftool/dbconfig/20210111-073349-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13707 and previous config saved to /var/cache/conftool/dbconfig/20210111-071846-root.json
* 07:12 marostegui: Deploy schema change on s8 codfw master - [[phab:T270187|T270187]]
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13706 and previous config saved to /var/cache/conftool/dbconfig/20210111-070342-root.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P13704 and previous config saved to /var/cache/conftool/dbconfig/20210111-065640-marostegui.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13703 and previous config saved to /var/cache/conftool/dbconfig/20210111-065550-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13702 and previous config saved to /var/cache/conftool/dbconfig/20210111-064046-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P13701 and previous config saved to /var/cache/conftool/dbconfig/20210111-063226-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094', diff saved to https://phabricator.wikimedia.org/P13700 and previous config saved to /var/cache/conftool/dbconfig/20210111-063155-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P13699 and previous config saved to /var/cache/conftool/dbconfig/20210111-063124-marostegui.json
* 06:04 marostegui: Depool db1121 to clone db1155:3314
* 06:04 marostegui: Deploy schema change on s7 codfw master - [[phab:T270187|T270187]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13698 and previous config saved to /var/cache/conftool/dbconfig/20210111-060342-marostegui.json


== 2021-01-09 ==
== 2022-08-11 ==
* 00:11 mutante: puppetmaster2003 - restarted apache after spewing 500s
* 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:04 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: revert [[gerrit:806944{{!}}Define default value for "wmgSiteLogoVariants" (T305692 T308620)]] (duration: 03m 15s)
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:47 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806944{{!}}Define default value for "wmgSiteLogoVariants" (T305692 T308620)]] (duration: 03m 07s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:29 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/modules/ve-mw/preinit/ve.init.mw.DesktopArticleTarget.init.js: Backport: [[gerrit:822396{{!}}Do not show incompatible skin warning when page is not editable (T314952)]] (duration: 03m 16s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:23 mutante: merging change on prod phabricator host to allow scap deployment, part 1
* 19:42 damilare: payments-wiki upgraded from {{Gerrit|cf5e1848}} to {{Gerrit|0894d75a}}
* 19:41 mutante: disabling puppet on C:profile::phabricator::main
* 19:20 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:58 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:822428{{!}}Fix labtestwiki database name servers (T310795)]] (duration: 03m 39s)
* 17:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:52 sukhe: testing ATS 9.1.3-1wm1 on cp3064: [[phab:T309651|T309651]]
* 17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
* 17:46 sukhe: testing ATS 9.1.3-1wm1 on cp3064: [[phab:T309651|T309651]]
* 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
* 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:38 sukhe: testing ATS 9.1.3-1wm1 on cp1090: [[phab:T309651|T309651]]
* 17:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host netmon2002
* 17:34 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host netmon2002
* 17:33 sukhe: testing ATS 9.1.3-1wm1 on cp3065: [[phab:T309651|T309651]]
* 17:28 sukhe: testing ATS 9.1.3-1wm1 on cp1089: [[phab:T309651|T309651]]
* 17:19 bking@cumin1001: conftool action : set/weight=10:pooled=no; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
* 17:18 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
* 17:15 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
* 16:35 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 16:30 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 16:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 16:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 16:26 inflatador: bking@elastic1054 attempting to ban elastic1100-1102 from cluster due to firewall issues
* 16:13 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
* 16:12 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic1100
* 15:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P32364 and previous config saved to /var/cache/conftool/dbconfig/20220811-145823-ladsgroup.json
* 14:55 inflatador: bking@cumin1001 running puppet agent across eqiad elastic hosts
* 14:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P32362 and previous config saved to /var/cache/conftool/dbconfig/20220811-144318-ladsgroup.json
* 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P32361 and previous config saved to /var/cache/conftool/dbconfig/20220811-142813-ladsgroup.json
* 14:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1003.wikimedia.org
* 14:28 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:24 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 14:19 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1003.wikimedia.org
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1004.wikimedia.org
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822375{{!}}Stop writing to the old templatelinks fields in s2 (T312865)]] (duration: 03m 25s)
* 14:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:15 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P32360 and previous config saved to /var/cache/conftool/dbconfig/20220811-141309-ladsgroup.json
* 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:11 awight: EU backport window complete
* 14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:10 awight@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: [[gerrit:822149{{!}}CommentFormatter: Set 'data-mw-comment' even when reply tool disabled (T314707)]] (duration: 03m 31s)
* 14:09 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1004.wikimedia.org
* 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:52 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 13:50 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:820666{{!}}Revert "Revert "testwiki: Add mediawiki.web_ui.interactions stream""]] (duration: 03m 10s)
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1060.eqiad.wmnet with OS bullseye
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822130{{!}}trwikiquote: Install WikiLove extension (T314895)]] (duration: 03m 30s)
* 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host logstash2003.codfw.wmnet
* 13:25 awight@deploy1002: Synchronized static/images: Config: [[gerrit:821330{{!}}Revert "trwiki: Change old and new vector logos for 500k articles"]] (part 3) (duration: 03m 09s)
* 13:21 awight@deploy1002: Synchronized logos/: Config: [[gerrit:821330{{!}}Revert "trwiki: Change old and new vector logos for 500k articles"]] (part 2) (duration: 03m 09s)
* 13:19 topranks: merging CR821781 to expose additional network info in puppet facts
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:18 awight@deploy1002: Synchronized wmf-config/: Config: [[gerrit:821330{{!}}Revert "trwiki: Change old and new vector logos for 500k articles"]] (part 1) (duration: 03m 13s)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1060.eqiad.wmnet with reason: host reimage
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:11 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1060.eqiad.wmnet with reason: host reimage
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822073{{!}}Enable editor line numbering on all namespaces, for twwiki (T302852)]] (duration: 03m 42s)
* 12:56 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1060.eqiad.wmnet with OS bullseye
* 12:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 12:49 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:46 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2018.codfw.wmnet
* 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase202[367].codfw.wmnet
* 12:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:17 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 12:16 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2003.codfw.wmnet
* 12:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:10 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:32 godog: arm keyholder on netmon2001
* 09:09 jbond: update gnutls28 on bullseye systems
* 09:00 jbond: update unzip
* 08:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:12 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr3-ulsfo:xe-0/1/1
* 08:06 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr3-ulsfo:xe-0/1/1
* 07:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr3-ulsfo:xe-0/1/1
* 07:57 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr3-ulsfo:xe-0/1/1
* 07:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
* 07:51 vgutierrez: rolling restart of pybal in eqsin and ulsfo
* 07:24 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 07:24 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-timeline
* 07:23 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=inference
* 07:19 _joe_: pooling all services in codfw
* 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32357 and previous config saved to /var/cache/conftool/dbconfig/20220811-070312-ladsgroup.json
* 07:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 07:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32356 and previous config saved to /var/cache/conftool/dbconfig/20220811-070252-ladsgroup.json
* 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32355 and previous config saved to /var/cache/conftool/dbconfig/20220811-064746-ladsgroup.json
* 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32354 and previous config saved to /var/cache/conftool/dbconfig/20220811-063240-ladsgroup.json
* 06:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 06:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32353 and previous config saved to /var/cache/conftool/dbconfig/20220811-061734-ladsgroup.json
* 06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maint
* 06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maint
* 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1162 ([[phab:T314368|T314368]] [[phab:T298555|T298555]] [[phab:T312863|T312863]] [[phab:T310011|T310011]] [[phab:T309311|T309311]] [[phab:T60674|T60674]] [[phab:T298560|T298560]] [[phab:T303603|T303603]] [[phab:T310485|T310485]])', diff saved to https://phabricator.wikimedia.org/P32352 and previous config saved to /var/cache/conftool/dbconfig/20220811-060625-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 primary and set section read-write [[phab:T314368|T314368]]', diff saved to https://phabricator.wikimedia.org/P32351 and previous config saved to /var/cache/conftool/dbconfig/20220811-060113-ladsgroup.json
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - [[phab:T314368|T314368]]', diff saved to https://phabricator.wikimedia.org/P32350 and previous config saved to /var/cache/conftool/dbconfig/20220811-060042-ladsgroup.json
* 06:00 Amir1: Starting s2 eqiad failover from db1162 to db1122 - [[phab:T314368|T314368]]
* 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 [[phab:T314368|T314368]]', diff saved to https://phabricator.wikimedia.org/P32349 and previous config saved to /var/cache/conftool/dbconfig/20220811-051913-ladsgroup.json
* 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T314368|T314368]]
* 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T314368|T314368]]
* m: chown -R librenms /srv/librenms/rrd/ on netmon1003 [[phab:T314972|T314972]]
* 03:51 cwhite: chown librenms /srv/librenms/rrd/* on netmon1003 [[phab:T314972|T314972]]
* 02:55 ejegg: civicrm upgraded from {{Gerrit|1f91ac2d}} to {{Gerrit|92467234}}
* 02:46 ejegg: updated process-control yaml files with @wmff alias
* 02:08 ejegg: civicrm rolled back from {{Gerrit|92467234}} to {{Gerrit|1f91ac2d}}
* 02:05 ejegg: civicrm upgraded from {{Gerrit|1f91ac2d}} to {{Gerrit|92467234}}
* 01:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:38 tstarling@deploy1002: Synchronized wmf-config/logging.php: (no justification provided) (duration: 03m 25s)
* 01:19 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 01:19 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=varnish-fe
* 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
* 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-tls
* 00:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2042.codfw.wmnet with reason: host down; depooled and will debug tomorrow
* 00:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2042.codfw.wmnet with reason: host down; depooled and will debug tomorrow


== 2021-01-08 ==
== 2022-08-10 ==
* 19:48 andrew@deploy1001: Finished deploy [striker/deploy@e4db843]: Striker deploy for [[phab:T269004|T269004]] (duration: 02m 11s)
* 21:25 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1016.eqiad.wmnet
* 19:45 andrew@deploy1001: Started deploy [striker/deploy@e4db843]: Striker deploy for [[phab:T269004|T269004]]
* 21:23 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
* 19:28 andrew@deploy1001: Finished deploy [horizon/deploy@7466703]: Horizon with a bunch of Buster patches (duration: 02m 35s)
* 21:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: [[phab:T309810|T309810]]
* 19:26 andrew@deploy1001: Started deploy [horizon/deploy@7466703]: Horizon with a bunch of Buster patches
* 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: [[phab:T309810|T309810]]
* 18:02 joal@deploy1001: Finished deploy [analytics/refinery@db9da3c] (thin): Hotfix analytics deployment - THIN [analytics/refinery@db9da3c] (duration: 00m 07s)
* 21:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 18:02 joal@deploy1001: Started deploy [analytics/refinery@db9da3c] (thin): Hotfix analytics deployment - THIN [analytics/refinery@db9da3c]
* 21:09 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 18:01 joal@deploy1001: Finished deploy [analytics/refinery@db9da3c]: Hotfix analytics deployment [analytics/refinery@db9da3c] (duration: 11m 27s)
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:50 joal@deploy1001: Started deploy [analytics/refinery@db9da3c]: Hotfix analytics deployment [analytics/refinery@db9da3c]
* 21:00 cjming: end of UTC late backport window
* 17:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on maps2007.codfw.wmnet with reason: Downtiming while not pooled
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on maps2007.codfw.wmnet with reason: Downtiming while not pooled
* 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:15 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2 days, 12:00:00 on maps2007.codfw.wmnet with reason: Downtiming while not pooled
* 20:59 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820533{{!}}Remove unused $wgEnableMWSuggest]] (duration: 03m 04s)
* 17:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on maps2007.codfw.wmnet with reason: Downtiming while not pooled
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on maps1009.eqiad.wmnet with reason: Downtiming while not pooled
* 20:56 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820568{{!}}Enable new topic tool on dewiki (T313699)]] (duration: 03m 01s)
* 17:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on maps1009.eqiad.wmnet with reason: Downtiming while not pooled
* 20:34 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822093{{!}}testwiki: set $wgCdnMatchParameterOrder to false (T314868)]] (duration: 03m 20s)
* 17:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1001.wikimedia.org with reason: REIMAGE
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1001.wikimedia.org with reason: REIMAGE
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:50 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:43 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:42 andrewbogott: shutting down labweb1001 so I can really believe that all traffic is being served by 1002
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:35 andrew@deploy1001: Finished deploy [horizon/deploy@7466703]: selective disable of problematic compression block (duration: 01m 42s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:33 andrew@deploy1001: Started deploy [horizon/deploy@7466703]: selective disable of problematic compression block
* 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:32 andrew@deploy1001: Finished deploy [horizon/deploy@7466703]: selective disable of problematic compression block (duration: 01m 52s)
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:30 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:30 andrew@deploy1001: Started deploy [horizon/deploy@7466703]: selective disable of problematic compression block
* 20:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 16:24 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:58 andrew@deploy1001: Finished deploy [horizon/deploy@ecaad83]: minor django package upgrades -> labweb1002 (duration: 04m 25s)
* 20:08 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820646{{!}}Start writing to cuc_actor everywhere except s4 and s8 (T233004)]] (duration: 03m 15s)
* 15:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:54 andrew@deploy1001: Started deploy [horizon/deploy@ecaad83]: minor django package upgrades -> labweb1002
* 19:51 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2053-2054].codfw.wmnet
* 15:51 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 19:51 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2053-2054].codfw.wmnet
* 15:43 andrew@deploy1001: Finished deploy [horizon/deploy@ecaad83]: minor django package upgrades -> codfw1dev (duration: 00m 29s)
* 19:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2019-2020].codfw.wmnet
* 15:43 andrew@deploy1001: Started deploy [horizon/deploy@ecaad83]: minor django package upgrades -> codfw1dev
* 19:35 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2019-2020].codfw.wmnet
* 15:39 reedy@deploy1001: Synchronized php-1.36.0-wmf.25/extensions/AbuseFilter/: [[phab:T271430|T271430]] [[phab:T271431|T271431]] [[phab:T271432|T271432]] [[phab:T271433|T271433]] (duration: 01m 00s)
* 19:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2016-2018].codfw.wmnet
* 15:39 andrew@deploy1001: Finished deploy [horizon/deploy@ecaad83]: minor django package upgrades -> codfw1dev (duration: 01m 39s)
* 19:35 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2016-2018].codfw.wmnet
* 15:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 19:34 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2036.codfw.wmnet
* 15:38 andrew@deploy1001: Started deploy [horizon/deploy@ecaad83]: minor django package upgrades -> codfw1dev
* 19:34 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2036.codfw.wmnet
* 15:24 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev (duration: 01m 06s)
* 19:28 sukhe: testing ATS 9.1.3-1wm1 on cp4026: [[phab:T309651|T309651]]
* 15:23 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev
* 19:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1087.eqiad.wmnet with OS bullseye
* 15:18 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades + compression (duration: 01m 47s)
* 19:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1086.eqiad.wmnet with OS bullseye
* 15:17 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades + compression
* 18:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1087.eqiad.wmnet with reason: host reimage
* 15:14 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev (duration: 01m 00s)
* 18:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1086.eqiad.wmnet with reason: host reimage
* 15:13 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev
* 18:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1087.eqiad.wmnet with reason: host reimage
* 15:12 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev (duration: 00m 05s)
* 18:49 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1086.eqiad.wmnet with reason: host reimage
* 15:12 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev
* 18:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:11 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev (duration: 01m 30s)
* 18:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1087.eqiad.wmnet with OS bullseye
* 15:09 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev
* 18:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1086.eqiad.wmnet with OS bullseye
* 15:08 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades (duration: 01m 49s)
* 18:22 urandom: truncating Cassandra hints (eqiad datacenter)  -- [[phab:T314941|T314941]]
* 15:06 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades
* 18:13 urandom: truncating codfw Cassandra hints (eqiad datacenter)  -- [[phab:T314941|T314941]]
* 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13697 and previous config saved to /var/cache/conftool/dbconfig/20210108-150617-root.json
* 18:07 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2005.codfw.wmnet
* 15:03 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades (duration: 01m 35s)
* 18:07 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kafka-main2005.codfw.wmnet
* 15:02 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades
* 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool D8 DBs after PDU maint ([[phab:T310146|T310146]])', diff saved to https://phabricator.wikimedia.org/P32346 and previous config saved to /var/cache/conftool/dbconfig/20220810-180529-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13696 and previous config saved to /var/cache/conftool/dbconfig/20210108-145113-root.json
* 17:42 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e]: Add missing changes to the deletion script - [[phab:T270433|T270433]] -  [analytics/refinery@6e47e0e] (duration: 05m 28s)
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13695 and previous config saved to /var/cache/conftool/dbconfig/20210108-143610-root.json
* 17:39 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labweb1002.wikimedia.org
* 13:42 klausman@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: REIMAGE
* 17:39 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:41 klausman@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 17:36 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e]: Add missing changes to the deletion script - [[phab:T270433|T270433]] -  [analytics/refinery@6e47e0e]
* 13:39 klausman@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE
* 17:35 fnegri@cumin1001: START - Cookbook sre.dns.netbox
* 13:37 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: REIMAGE
* 17:34 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e] (hadoop-test): Add missing changes to the deletion script - [[phab:T270433|T270433]] - TEST [analytics/refinery@6e47e0e] (duration: 04m 19s)
* 13:37 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 17:30 fnegri@cumin1001: START - Cookbook sre.hosts.decommission for hosts labweb1002.wikimedia.org
* 13:37 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE
* 17:30 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e] (hadoop-test): Add missing changes to the deletion script - [[phab:T270433|T270433]] - TEST [analytics/refinery@6e47e0e]
* 12:52 klausman@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE
* 17:09 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 12:49 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE
* 17:08 otto@deploy1002: Started deploy [analytics/refinery@d4dd7e4] (hadoop-test): Add safety limits to refinery-drop-older-than - [[phab:T270433|T270433]] - TEST [analytics/refinery@d4dd7e4]
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13694 and previous config saved to /var/cache/conftool/dbconfig/20210108-120415-root.json
* 17:06 sukhe: testing ATS 9.1.3-1wm1 on cp4032: [[phab:T309651|T309651]]
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13693 and previous config saved to /var/cache/conftool/dbconfig/20210108-114912-root.json
* 17:06 urandom: flushing RESTBase Cassandra tables -row B- to (temporarily) free instance-data space -- [[phab:T314941|T314941]]
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13692 and previous config saved to /var/cache/conftool/dbconfig/20210108-113408-root.json
* 17:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on krb2001.codfw.wmnet with reason: btullis codfw maintenance
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13691 and previous config saved to /var/cache/conftool/dbconfig/20210108-111905-root.json
* 17:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on krb2001.codfw.wmnet with reason: btullis codfw maintenance
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P13690 and previous config saved to /var/cache/conftool/dbconfig/20210108-111733-marostegui.json
* 17:04 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gerrit2001.wikimedia.org
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13689 and previous config saved to /var/cache/conftool/dbconfig/20210108-111345-root.json
* 17:02 sukhe: testing ATS 9.1.3-1wm1 on cp6008: [[phab:T309651|T309651]]
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13688 and previous config saved to /var/cache/conftool/dbconfig/20210108-105842-root.json
* 16:56 sukhe: testing ATS 9.1.3-1wm1 on cp6016: [[phab:T309651|T309651]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13676 and previous config saved to /var/cache/conftool/dbconfig/20210108-104338-root.json
* 16:55 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labweb1001.wikimedia.org
* 10:38 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 10s)
* 16:55 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13675 and previous config saved to /var/cache/conftool/dbconfig/20210108-102835-root.json
* 16:32 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gerrit2001.wikimedia.org
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P13674 and previous config saved to /var/cache/conftool/dbconfig/20210108-102606-marostegui.json
* 16:32 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:01 elukey: restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka seems not recovering very well
* 16:32 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2013-2014].codfw.wmnet
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13673 and previous config saved to /var/cache/conftool/dbconfig/20210108-100040-root.json
* 16:31 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes[2013-2014].codfw.wmnet
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13672 and previous config saved to /var/cache/conftool/dbconfig/20210108-094535-root.json
* 16:31 jelto: kubectl uncordon kubernetes2014.codfw.wmnet
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13671 and previous config saved to /var/cache/conftool/dbconfig/20210108-093032-root.json
* 16:31 fnegri@cumin1001: START - Cookbook sre.dns.netbox
* 09:30 marostegui: Restart mysql on db1115 (tendril/dbtree)
* 16:30 jelto: kubectl uncordon kubernetes2013.codfw.wmnet
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13670 and previous config saved to /var/cache/conftool/dbconfig/20210108-091528-root.json
* 16:29 urandom: restarting Cassandra (RESTBase) -row A- to apply r822110 -- [[phab:T314941|T314941]]
* 09:08 moritzm: installing libxstream-java security updates on Buster
* 16:27 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 09:01 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 16:25 fnegri@cumin1001: START - Cookbook sre.hosts.decommission for hosts labweb1001.wikimedia.org
* 08:12 marostegui: Deploy schema change on s4 codfw master - [[phab:T270187|T270187]]
* 16:23 mutante: shutting down gerrit2001
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P13669 and previous config saved to /var/cache/conftool/dbconfig/20210108-075714-marostegui.json
* 16:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2034-2035].codfw.wmnet
* 07:23 marostegui: Deploy schema change on s5 codfw master - [[phab:T270187|T270187]]
* 16:23 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2034-2035].codfw.wmnet
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 to clone db1155:3316 [[phab:T268742|T268742]] ', diff saved to https://phabricator.wikimedia.org/P13666 and previous config saved to /var/cache/conftool/dbconfig/20210108-063301-marostegui.json
* 16:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2016-2018].codfw.wmnet
* 06:18 marostegui: Deploy schema change on s2 codfw master - [[phab:T270187|T270187]]
* 16:22 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2016-2018].codfw.wmnet
* 04:59 mutante: mw1266 - restart-php7.2-fpm
* 16:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 03:04 ryankemper: [wdqs deploy] Deploy complete, service is healthy. This is done.
* 16:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=sessionstore2003.codfw.wmnet
* 02:35 ryankemper: [wdqs deploy] Restarting `wdqs-categories` across load-balanced instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 16:13 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm1_amd64.changes: [[phab:T309651|T309651]]
* 02:35 ryankemper: [wdqs deploy] Restarted `wdqs-categories` across test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 16:13 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gerrit2001.wikimedia.org
* 02:34 ryankemper: [wdqs deploy] Restarted `wdqs-updater` across all instances: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 16:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2039,2050,2056,2059].codfw.wmnet,thanos-be2004.codfw.wmnet with reason: PDU work
* 02:27 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b15fc5c]: 0.3.58 (duration: 18m 04s)
* 16:10 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2039,2050,2056,2059].codfw.wmnet,thanos-be2004.codfw.wmnet with reason: PDU work
* 02:15 ryankemper: [wdqs deploy] Nevermind - the UI failure I mentioned above is transient. Restarting my ssh tunnel seemed to make the problem go away. Proceeding with deploy
* 16:09 urandom: flushing tables in row D (RESTBase Cassandra cluster)  -- [[phab:T314941|T314941]]
* 02:12 ryankemper: [wdqs deploy] While queries run fine, it looks like there might be a UI glitch in this version. Digging in to see if it's transient, but I'll likely be aborting this deploy
* 15:54 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gitlab-runner2004.codfw.wmnet
* 02:09 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b15fc5c]: 0.3.58
* 15:54 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for gitlab-runner2004.codfw.wmnet
* 02:09 ryankemper: [wdqs deploy] Tests passing on canary before beginning wdqs deploy, proceeding
* 15:53 sukhe: poweroff cp2041, 42 for PDU upgrade: rack D7
* 01:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
* 15:51 urandom: flushing tables in row B (RESTBase Cassandra cluster) -- [[phab:T314941|T314941]]
* 01:28 mutante: mw1276, mw1277 - first API appservers on buster, now serving traffic, free to depool if any issues
* 15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: PDU maintenance
* 01:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1277.eqiad.wmnet
* 15:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: PDU maintenance
* 01:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet
* 15:46 urandom: flushing tables in row A (RESTBase Cassandra cluster)  -- [[phab:T314941|T314941]]
* 01:24 mutante: mw1266 - another buster appserver now serving traffic
* 15:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2012.codfw.wmnet with reason: btullis codfw maintenance
* 01:24 mutante: mw1265 - raised weight to 25 like regular appservers (buster)
* 15:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2041-2042].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
* 01:23 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1265.eqiad.wmnet
* 15:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2012.codfw.wmnet with reason: btullis codfw maintenance
* 01:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1266.eqiad.wmnet
* 15:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2041-2042].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
* 01:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1277.eqiad.wmnet
* 15:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2011.codfw.wmnet with reason: btullis codfw maintenance
* 01:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1276.eqiad.wmnet
* 15:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2011.codfw.wmnet with reason: btullis codfw maintenance
* 01:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 15:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2010.codfw.wmnet with reason: btullis codfw maintenance
* 01:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1266.eqiad.wmnet
* 15:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2010.codfw.wmnet with reason: btullis codfw maintenance
* 00:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE
* 15:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2009.codfw.wmnet with reason: btullis codfw maintenance
* 00:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE
* 15:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2009.codfw.wmnet with reason: btullis codfw maintenance
* 00:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE
* 15:37 urandom: (ephemerally) increasing hinted hand-off delivery rate limit to 16KB, RESTBase eqiad nodes  -- [[phab:T314941|T314941]]
* 00:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE
* 15:34 jbond: remove puppetmaster[12]002 from production
* 00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE
* 15:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2004.codfw.wmnet
* 00:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE
* 15:30 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for kafka-main2004.codfw.wmnet
* 00:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE
* 15:20 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2051-2052].codfw.wmnet
* 00:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE
* 15:20 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2051-2052].codfw.wmnet
* 00:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Undeploy graphoid on enwiki [[phab:T271495|T271495]] (duration: 00m 57s)
* 15:17 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc-gp2003.codfw.wmnet
* 15:17 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc-gp2003.codfw.wmnet
* 15:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2033.codfw.wmnet
* 15:16 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2033.codfw.wmnet
* 15:14 _joe_: power off krb2002
* 15:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: PDU maintenance
* 15:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: PDU maintenance
* 15:13 _joe_: shutting down rdb2010,puppetmaster2002 for d5 maintenance
* 15:02 jelto: power off mc2035
* 15:01 jelto: power off mc2034
* 15:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc2035.codfw.wmnet with reason: PDU swap
* 15:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc2035.codfw.wmnet with reason: PDU swap
* 15:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc2034.codfw.wmnet with reason: PDU swap
* 15:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc2034.codfw.wmnet with reason: PDU swap
* 14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: PDU Maint ([[phab:T310146|T310146]])
* 14:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: PDU Maint ([[phab:T310146|T310146]])
* 14:38 urandom: disabling reserved space on eqiad nodes (RESTBase), /dev/md2 (aka /srv/cassandra/instance-data) -- [[phab:T314941|T314941]]
* 14:28 jelto: power off kafka-main2004 gracefully
* 14:28 hnowlan: shutting down sessionstore2003
* 14:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=sessionstore2003.codfw.wmnet
* 14:27 sukhe: power off cp2039, cp2040 for PDU upgrade: rack D
* 14:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: PDU maintenance
* 14:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: PDU maintenance
* 14:25 jelto: power off mc-gp2003
* 14:25 jelto: power off mc2033
* 14:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on kafka-main2004.codfw.wmnet with reason: PDU swap
* 14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on kafka-main2004.codfw.wmnet with reason: PDU swap
* 14:23 sukhe: depool codfw for PDU upgrade: rack D
* 14:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on mc-gp2003.codfw.wmnet with reason: PDU swap
* 14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on mc-gp2003.codfw.wmnet with reason: PDU swap
* 14:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on mc2033.codfw.wmnet with reason: PDU swap
* 14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on mc2033.codfw.wmnet with reason: PDU swap
* 14:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp20[39{{!}}40]\.codfw\.wmnet,service=ats-tls
* 14:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2039-2040].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
* 14:13 urandom: flushing Cassandra tables, restbase1030
* 14:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2039-2040].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
* 14:13 urandom: flushing Cassandra tables, restbase1019
* 14:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on dns2002.wikimedia.org with reason: shutdown for PDU upgrade: rack D4
* 14:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on dns2002.wikimedia.org with reason: shutdown for PDU upgrade: rack D4
* 14:11 urandom: flushing Cassandra tables, restbase1017 1018 1021 1024 1025 1026 1028 1029
* 14:05 urandom: flushing tables, restbase1016
* 13:52 hnowlan: powered up restbase2018
* 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on netmon1003.wikimedia.org with reason: pdu
* 13:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netmon1003.wikimedia.org with reason: pdu
* 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on logstash2003.codfw.wmnet with reason: pdu
* 13:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on logstash2003.codfw.wmnet with reason: pdu
* 13:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on logstash2029.codfw.wmnet with reason: pdu
* 13:31 root@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on logstash2029.codfw.wmnet with reason: pdu
* 13:30 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: [[phab:T310146|T310146]]
* 13:30 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: [[phab:T310146|T310146]]
* 13:17 elukey: powering on restbase2027
* 13:12 elukey: powering on restbase2026
* 13:12 _joe_: powering on restbase2023
* 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32343 and previous config saved to /var/cache/conftool/dbconfig/20220810-130108-ladsgroup.json
* 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 12:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[2072,2084-2085].codfw.wmnet with reason: [[phab:T310146|T310146]]
* 12:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[2072,2084-2085].codfw.wmnet with reason: [[phab:T310146|T310146]]
* 12:27 jbond: remove confd from servers that shouldn't have it
* 12:05 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/Echo/maintenance/removeOrphanedEvents.php: Backport: [[gerrit:821735{{!}}Run clean ups with removeOrphanedEvents in major batches (T310428)]] (duration: 03m 32s)
* 11:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 10:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 10:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 10:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2181-2182].codfw.wmnet with reason: D6 PDU maint ([[phab:T310146|T310146]])
* 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2181-2182].codfw.wmnet with reason: D6 PDU maint ([[phab:T310146|T310146]])
* 10:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on maps2010.codfw.wmnet with reason: PDU maintenance
* 10:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on maps2010.codfw.wmnet with reason: PDU maintenance
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 10:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 10:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase2018.codfw.wmnet with reason: PDU maintenance
* 10:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2018.codfw.wmnet
* 10:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase2018.codfw.wmnet with reason: PDU maintenance
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ml-serve2008.codfw.wmnet with reason: PDU maintenance
* 10:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ml-serve2008.codfw.wmnet with reason: PDU maintenance
* 10:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ores2009.codfw.wmnet with reason: PDU maintenance
* 10:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ores2009.codfw.wmnet with reason: PDU maintenance
* 10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ores2008.codfw.wmnet with reason: PDU maintenance
* 10:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ores2008.codfw.wmnet with reason: PDU maintenance
* 10:03 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase202[367].codfw.wmnet
* 10:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase[2023,2026-2027].codfw.wmnet with reason: PDU maintenance
* 10:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase[2023,2026-2027].codfw.wmnet with reason: PDU maintenance
* 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: D8 PDU Maint ([[phab:T310146|T310146]])
* 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: D8 PDU Maint ([[phab:T310146|T310146]])
* 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D8 DBs for PDU maint ([[phab:T310146|T310146]])', diff saved to https://phabricator.wikimedia.org/P32341 and previous config saved to /var/cache/conftool/dbconfig/20220810-095059-ladsgroup.json
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2101,2130,2140].codfw.wmnet,dbproxy2004.codfw.wmnet with reason: D6 PDU maint ([[phab:T310146|T310146]])
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2101,2130,2140].codfw.wmnet,dbproxy2004.codfw.wmnet with reason: D6 PDU maint ([[phab:T310146|T310146]])
* 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D6 dbs ([[phab:T310146|T310146]])', diff saved to https://phabricator.wikimedia.org/P32340 and previous config saved to /var/cache/conftool/dbconfig/20220810-093433-ladsgroup.json
* 09:31 jelto: depool services in codfw for upcoming PDU replacement - [[phab:T309956|T309956]]
* 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 09:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 09:28 jynus: shutdown backup2007 before pdu upgrade [[phab:T310146|T310146]]
* 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:15 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/maintenance/namespaceDupes.php: Backport: [[gerrit:821734{{!}}maintenance: Add support for links migration to namespaceDupes.php (T314711)]] (duration: 03m 18s)
* 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2093,2120,2129,2172].codfw.wmnet with reason: D5 PDU maint ([[phab:T310146|T310146]])
* 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2093,2120,2129,2172].codfw.wmnet with reason: D5 PDU maint ([[phab:T310146|T310146]])
* 09:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D5 dbs ([[phab:T310146|T310146]])', diff saved to https://phabricator.wikimedia.org/P32339 and previous config saved to /var/cache/conftool/dbconfig/20220810-091038-ladsgroup.json
* 08:49 jynus: shutdown dbprov2003 before pdu upgrade [[phab:T310146|T310146]]
* 08:49 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
* 08:48 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
* 08:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2028.codfw.wmnet
* 08:48 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2028.codfw.wmnet
* 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P32337 and previous config saved to /var/cache/conftool/dbconfig/20220810-084222-ladsgroup.json
* 08:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 08:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:35 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822037{{!}}Stop writing to the old templatelinks fields in s5 (T312865)]] (duration: 03m 29s)
* 08:32 jelto: power off gitlab-runner2004
* 08:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:30:00 on gitlab-runner2004.codfw.wmnet with reason: PDU swap
* 08:31 root@cumin1001: START - Cookbook sre.hosts.downtime for 8:30:00 on gitlab-runner2004.codfw.wmnet with reason: PDU swap
* 08:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2028.codfw.wmnet with reason: Trying to fix full /
* 08:28 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2028.codfw.wmnet with reason: Trying to fix full /
* 08:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr1-drmrs:xe-0/1/2
* 08:27 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
* 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P32336 and previous config saved to /var/cache/conftool/dbconfig/20220810-082718-ladsgroup.json
* 08:25 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
* 08:25 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
* 08:24 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
* 08:24 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
* 08:23 kart_: Run: mwscript namespaceDupes.php arywiki --fix ([[phab:T291737|T291737]])
* 08:13 jynus: restart replication on db1117:m1 [[phab:T309074|T309074]]
* 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P32335 and previous config saved to /var/cache/conftool/dbconfig/20220810-081213-ladsgroup.json
* 08:09 kartik@deploy1002: Finished scap: Backport: [[gerrit:821732{{!}}arywiki: change namespace translations, add unchanged namespaces and add old translations as aliases (T291737)]] (duration: 10m 37s)
* 07:59 kartik@deploy1002: Started scap: Backport: [[gerrit:821732{{!}}arywiki: change namespace translations, add unchanged namespaces and add old translations as aliases (T291737)]]
* 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P32334 and previous config saved to /var/cache/conftool/dbconfig/20220810-075708-ladsgroup.json
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P32333 and previous config saved to /var/cache/conftool/dbconfig/20220810-075636-ladsgroup.json
* 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 07:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 07:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:51 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:46 dcaro@cumin1001: START - Cookbook sre.dns.netbox
* 07:39 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:34 dcaro@cumin1001: START - Cookbook sre.dns.netbox
* 07:33 godog: depool thanos-fe2001 for debugging
* 07:11 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:821170{{!}}Enable SectionTranslation on testwiki with new MT support from Google (T313296)]] (duration: 05m 44s)
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:24 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on kubernetes[2013-2014].codfw.wmnet with reason: PDU maintenance
* 05:24 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on kubernetes[2013-2014].codfw.wmnet with reason: PDU maintenance
* 05:19 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on parse[2016-2020].codfw.wmnet with reason: PDU maintenance
* 05:19 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on parse[2016-2020].codfw.wmnet with reason: PDU maintenance
* 05:12 _joe_: starting to shut down servers in codfw for the PDU maintenance
* 05:09 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 10 hosts with reason: PDU maintenance
* 05:09 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 10 hosts with reason: PDU maintenance
* 05:09 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on mc-gp2003.codfw.wmnet with reason: PDU maintenance
* 05:09 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on mc-gp2003.codfw.wmnet with reason: PDU maintenance
* 05:06 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on mc2033.codfw.wmnet with reason: PDU maintenance
* 05:06 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on mc2033.codfw.wmnet with reason: PDU maintenance
* 05:05 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: PDU maintenance
* 05:05 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: PDU maintenance
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-01-07 ==
== 2022-08-09 ==
* 23:55 mutante: reimaging mw1267,mw1276,mw1277
* 23:17 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1011.eqiad.wmnet
* 23:28 mutante: reimaging mw1266
* 23:07 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:14 andrew@deploy1001: Finished deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 00s)
* 23:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 23:12 andrew@deploy1001: Started deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host
* 22:51 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:54 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 04s)
* 22:51 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:54 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 22:49 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:52 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 07m 44s)
* 22:49 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:44 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 22:46 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1015.eqiad.wmnet
* 22:41 andrew@deploy1001: Finished deploy [striker/deploy@e4db843]: striker -> labweb1002 (duration: 00m 04s)
* 22:31 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:41 andrew@deploy1001: Started deploy [striker/deploy@e4db843]: striker -> labweb1002
* 22:31 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:39 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 06s)
* 22:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:39 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 22:02 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:02 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:24 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 22:19 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=labweb1002.wikimedia.org
* 21:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 22:12 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.25 refs [[phab:T267418|T267418]]
* 21:53 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 21:43 jforrester@deploy1001: Synchronized php-1.36.0-wmf.25/extensions/CodeMirror/resources/ext.CodeMirror.js: [[phab:T271457|T271457]] Guard against WikiEditor being removed by the time the hook runs (duration: 01m 05s)
* 21:52 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 21:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
* 21:50 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 21:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
* 21:49 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 21:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[2] wikis to 1.36.0-wmf.22"
* 21:43 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
* 20:54 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:43 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
* 20:44 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 21:43 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
* 20:43 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 21:43 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
* 20:24 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 21:43 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 20:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.25 refs [[phab:T267418|T267418]]
* 21:43 bking@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 20:01 bstorm: restarting haproxy on dbproxy1018 to pick up new config file
* 21:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:56 mutante: removing mongodb PHP extension, config, package from mwdebug* hosts - [[phab:T180761|T180761]]
* 21:00 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
* 19:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.25/includes/DefaultSettings.php: {{Gerrit|59866730ea7534db9e47ea308ba2a3c1807d5f11}}: Revert "Provide native support to dismiss sitenotice in core." ([[phab:T271365|T271365]]; [[phab:T259903|T259903]]; 3/3) (duration: 01m 03s)
* 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:55 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.25/resources/: {{Gerrit|59866730ea7534db9e47ea308ba2a3c1807d5f11}}: Revert "Provide native support to dismiss sitenotice in core." ([[phab:T271365|T271365]]; [[phab:T259903|T259903]]; 2/3) (duration: 01m 05s)
* 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.25/includes/skins/: {{Gerrit|59866730ea7534db9e47ea308ba2a3c1807d5f11}}: Revert "Provide native support to dismiss sitenotice in core." ([[phab:T271365|T271365]]; [[phab:T259903|T259903]]; 1/3) (duration: 01m 04s)
* 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32332 and previous config saved to /var/cache/conftool/dbconfig/20220809-205548-ladsgroup.json
* 19:10 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: {{Gerrit|8a849d90277b1e13154e87d812d64efc3a99c00a}}: throttle: Cleanup outdated rules (duration: 01m 06s)
* 20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1014.eqiad.wmnet
* 19:05 urbanecm@deploy1001: Synchronized wmf-config/Wikibase.php: {{Gerrit|90f98c6a049c69b70ab9cb78eb986f1ecf4ffc9b}}: Use DisabledSpecialPage to disable ItemDisambiguation ([[phab:T271389|T271389]]) (duration: 01m 08s)
* 20:51 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1014.eqiad.wmnet
* 18:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 20:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1027.eqiad.wmnet with reason: REIMAGE
* 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32331 and previous config saved to /var/cache/conftool/dbconfig/20220809-204042-ladsgroup.json
* 18:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32330 and previous config saved to /var/cache/conftool/dbconfig/20220809-202536-ladsgroup.json
* 18:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1027.eqiad.wmnet with reason: REIMAGE
* 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32329 and previous config saved to /var/cache/conftool/dbconfig/20220809-201030-ladsgroup.json
* 18:34 volans@deploy1001: Finished deploy [homer/deploy@fe7acbc]: Release v0.2.6 (duration: 04m 25s)
* 19:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 18:30 volans@deploy1001: Started deploy [homer/deploy@fe7acbc]: Release v0.2.6
* 19:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 16:50 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 50s)
* 19:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 16:47 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 19:56 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 16:46 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 05s)
* 19:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 16:44 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 19:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 16:44 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 08s)
* 19:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 16:44 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 19:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 16:16 moritzm: installing xerces-c security updates on Buster
* 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 15:53 moritzm: installing xorg-server security updates on stretch
* 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE
* 19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:46 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
* 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
* 17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:14 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:11 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 17:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1072.eqiad.wmnet with OS bullseye
* 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:29 vgutierrez: test trafficserver 9.1.2-1wm2 in cp6016 - [[phab:T309651|T309651]]
* 15:09 moritzm: installing libmaxminddb security updates on stretch
* 17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
* 15:06 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 03m 34s)
* 17:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
* 15:03 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1072.eqiad.wmnet with OS bullseye
* 15:01 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 25s)
* 16:54 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 14:59 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 16:54 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 14:58 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 03s)
* 16:53 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 14:58 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 16:53 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 14:58 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 04s)
* 16:26 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:57 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:26 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:56 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
* 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1069.eqiad.wmnet with OS bullseye
* 14:54 kormat@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
* 14:54 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:42 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
* 14:54 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels (duration: 02m 05s)
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:52 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:51 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels (duration: 00m 04s)
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:51 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels
* 15:30 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1069.eqiad.wmnet with OS bullseye
* 14:51 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh cloudweb2001-dev (duration: 01m 53s)
* 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 kormat@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1058.eqiad.wmnet with OS bullseye
* 14:49 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: refresh cloudweb2001-dev
* 15:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
* 14:46 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels (duration: 03m 39s)
* 15:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
* 14:42 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels (duration: 00m 04s)
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:42 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels
* 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:40 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* m: finished running 'homer "status:active" commit "netmon: Add the netmon1003 host as a syslog destination"' in the cumin1001 host. Homer reported no errors.
* 14:33 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:32 jayme: imported calico 3.17.0-2 to component/calico-future stretch-wikimedia
* 14:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1058.eqiad.wmnet with OS bullseye
* 14:32 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:28 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=codfw
* 14:08 moritzm: installing sqlite3 security updates on buster
* 13:57 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:42 _joe_: running puppet on logstash1007, elasticsearch oomkilled
* 13:57 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:24 marostegui: Deploy schema change on s2 primary master with replication [[phab:T270053|T270053]]
* m: Add the new netmon1003 host as a syslog destination in homer templates/common/system.conf https://gerrit.wikimedia.org/r/c/operations/homer/public/+/819124
* 12:21 kart_: EU-Mid day backport window done.
* m: Successfully ran '# run-puppet-merge' in the netmon1002 and netmon1003 hosts.
* 12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:654734{{!}}Enable ContentTranslation in Tsonga Wikipedia as a default tool (T271204)]] (duration: 01m 09s)
* m: Running '# run-puppet-agent' in the netmon1003 host
* 12:02 XioNoX: push "Allow specific flows from 172.16/12 to prod + default permit" - [[phab:T209082|T209082]]
* m: Running '# run-puppet-agent' in the netmon1002 host
* 10:51 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 13:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 10:06 godog: bounce apache on prometheus codfw