
Server Admin Log: Difference between revisions

imported>Stashbot
(andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 01m 36s))
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298555)', diff saved to https://phabricator.wikimedia.org/P28208 and previous config saved to /var/cache/conftool/dbconfig/20220521-010640-ladsgroup.json)
 
(370 intermediate revisions by 4 users not shown)
== 2021-04-04 ==
== 2022-05-21 ==
* 14:47 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 01m 36s)
* 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28208 and previous config saved to /var/cache/conftool/dbconfig/20220521-010640-ladsgroup.json
* 14:45 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch
* 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28207 and previous config saved to /var/cache/conftool/dbconfig/20220521-010626-ladsgroup.json
* 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28206 and previous config saved to /var/cache/conftool/dbconfig/20220521-001014-ladsgroup.json
* 00:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 00:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance


== 2021-04-03 ==
== 2022-05-20 ==
* 19:20 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 11s)
* 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28205 and previous config saved to /var/cache/conftool/dbconfig/20220520-224558-ladsgroup.json
* 19:18 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28204 and previous config saved to /var/cache/conftool/dbconfig/20220520-223054-ladsgroup.json
* 17:30 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 35s)
* 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 17:26 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
* 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:44 elukey: power reset for ms-be2028 - not reachable via ssh, no tty available via mgmt console, NMI unrecoverable errors logged in iLo's system logs
* 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28203 and previous config saved to /var/cache/conftool/dbconfig/20220520-221550-ladsgroup.json
* 15:35 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 18s)
* 22:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
* 15:33 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
* 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28202 and previous config saved to /var/cache/conftool/dbconfig/20220520-220046-ladsgroup.json
* 15:12 andrew@deploy1002: Finished deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch (duration: 11m 51s)
* 21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:00 andrew@deploy1002: Started deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch
* 21:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 05:38 andrew@deploy1002: Finished deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 05s)
* 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28201 and previous config saved to /var/cache/conftool/dbconfig/20220520-215514-ladsgroup.json
* 05:35 andrew@deploy1002: Started deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch
* 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
* 21:50 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
* 21:38 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye
* 21:37 mutante: correction: mistake was to use FQDN [[phab:T307142|T307142]]
* 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError [[phab:T307142|T307142]]
* 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError
* 21:34 mutante: reimaging gitlab1004 (insetup) to test partman recipe from gerrit:793534 - [[phab:T307142|T307142]]
* 21:34 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
* 21:33 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
* 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28198 and previous config saved to /var/cache/conftool/dbconfig/20220520-190633-ladsgroup.json
* 19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 18:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:55 mutante: [mwmaint1002:~] $ sudo mwscript initSiteStats.php --wiki=kcgwiki --update  (to update statistics for latest wikipedia kcg) [[phab:T305281|T305281]]
* 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5003.eqsin.wmnet with OS bullseye
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
* 16:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 16:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:37 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti5003.eqsin.wmnet with OS bullseye
* 16:33 robh: troubleshooting ganeti5003 ipmi failure via [[phab:T308211|T308211]]
* 16:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 16:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
* 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:09 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
* 16:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
* 16:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2069.codfw.wmnet with OS bullseye
* 15:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
* 15:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2068.codfw.wmnet with OS bullseye
* 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
* 15:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
* 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2069.codfw.wmnet with OS bullseye
* 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
* 15:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2068.codfw.wmnet with OS bullseye
* 15:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1118 T', diff saved to https://phabricator.wikimedia.org/P28196 and previous config saved to /var/cache/conftool/dbconfig/20220520-151407-ladsgroup.json
* 15:11 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
* 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28195 and previous config saved to /var/cache/conftool/dbconfig/20220520-150838-root.json
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28194 and previous config saved to /var/cache/conftool/dbconfig/20220520-145334-root.json
* 14:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2066.codfw.wmnet with OS bullseye
* 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
* 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28193 and previous config saved to /var/cache/conftool/dbconfig/20220520-144212-ladsgroup.json
* 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P28192 and previous config saved to /var/cache/conftool/dbconfig/20220520-144111-ladsgroup.json
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28191 and previous config saved to /var/cache/conftool/dbconfig/20220520-143830-root.json
* 14:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
* 14:28 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28190 and previous config saved to /var/cache/conftool/dbconfig/20220520-142327-root.json
* 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28189 and previous config saved to /var/cache/conftool/dbconfig/20220520-142032-ladsgroup.json
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28188 and previous config saved to /var/cache/conftool/dbconfig/20220520-141316-ladsgroup.json
* 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28187 and previous config saved to /var/cache/conftool/dbconfig/20220520-141308-ladsgroup.json
* 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS bullseye
* 14:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28186 and previous config saved to /var/cache/conftool/dbconfig/20220520-140823-root.json
* 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28185 and previous config saved to /var/cache/conftool/dbconfig/20220520-135350-ladsgroup.json
* 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28184 and previous config saved to /var/cache/conftool/dbconfig/20220520-135319-root.json
* 13:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P28183 and previous config saved to /var/cache/conftool/dbconfig/20220520-134515-ladsgroup.json
* 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 13:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
* 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 1%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28182 and previous config saved to /var/cache/conftool/dbconfig/20220520-133815-root.json
* 13:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: [[phab:T308459|T308459]]
* 13:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: [[phab:T308459|T308459]]
* 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-tls
* 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=varnish-fe
* 13:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-be
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28181 and previous config saved to /var/cache/conftool/dbconfig/20220520-132307-ladsgroup.json
* 13:15 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye
* 12:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye
* 12:42 mforns@deploy1002: Finished deploy [airflow-dags/analytics@51a203f]: (no justification provided) (duration: 00m 07s)
* 12:42 mforns@deploy1002: Started deploy [airflow-dags/analytics@51a203f]: (no justification provided)
* 12:37 moritzm: copy prometheus-mcrouter-exporter from buster-wikimedia to bullseye-wikimedia (needed for [[phab:T308214|T308214]])
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28180 and previous config saved to /var/cache/conftool/dbconfig/20220520-123045-ladsgroup.json
* 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28179 and previous config saved to /var/cache/conftool/dbconfig/20220520-123037-ladsgroup.json
* 12:23 Amir1: killed refreshlinks suggestion in 10160
* 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
* 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28178 and previous config saved to /var/cache/conftool/dbconfig/20220520-121116-ladsgroup.json
* 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 12:10 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
* 11:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28177 and previous config saved to /var/cache/conftool/dbconfig/20220520-114234-ladsgroup.json
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28176 and previous config saved to /var/cache/conftool/dbconfig/20220520-114202-ladsgroup.json
* 11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28175 and previous config saved to /var/cache/conftool/dbconfig/20220520-113207-ladsgroup.json
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28174 and previous config saved to /var/cache/conftool/dbconfig/20220520-112449-ladsgroup.json
* 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28173 and previous config saved to /var/cache/conftool/dbconfig/20220520-111239-ladsgroup.json
* 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 8:00:00 on 8 hosts with reason: Maintenance
* 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 8:00:00 on 8 hosts with reason: Maintenance
* 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:09 jynus: drop backupcheck users from m1>dbbackups
* 10:54 moritzm: uploaded cas 6.4.6.3-wmf11u1 to apt.wikimedia.org/bullseye
* 10:52 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
* 10:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
* 10:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:793737{{!}}Revert read new on frwiki for templatelinks migration]] (duration: 00m 51s)
* 10:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2063.codfw.wmnet with OS bullseye
* 09:39 volans@cumin1001: dbctl commit (dc=all): 'emergency depool', diff saved to https://phabricator.wikimedia.org/P28172 and previous config saved to /var/cache/conftool/dbconfig/20220520-093928-volans.json
* 09:34 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage
* 09:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage
* 09:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2063.codfw.wmnet with OS bullseye
* 08:54 vgutierrez: re-enabling puppet  and repooling cp3060 - [[phab:T308797|T308797]] [[phab:T243167|T243167]]
* 08:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2062.codfw.wmnet with OS bullseye
* 08:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage
* 08:09 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P28171 and previous config saved to /var/cache/conftool/dbconfig/20220520-080719-root.json
* 07:53 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2062.codfw.wmnet with OS bullseye
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P28170 and previous config saved to /var/cache/conftool/dbconfig/20220520-075215-root.json
* 07:52 jayme: imported kubeconform 0.4.13-1 to buster-,bullseye-wikimedia - [[phab:T306165|T306165]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P28169 and previous config saved to /var/cache/conftool/dbconfig/20220520-073712-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P28168 and previous config saved to /var/cache/conftool/dbconfig/20220520-072208-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P28167 and previous config saved to /var/cache/conftool/dbconfig/20220520-070704-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P28166 and previous config saved to /var/cache/conftool/dbconfig/20220520-065200-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P28164 and previous config saved to /var/cache/conftool/dbconfig/20220520-063656-root.json
* 06:03 moritzm: racadm racreset on ganeti5003
* 05:09 marostegui: dbmaint s1@eqiad [[phab:T298554|T298554]]
* 01:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28162 and previous config saved to /var/cache/conftool/dbconfig/20220520-010743-ladsgroup.json
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28161 and previous config saved to /var/cache/conftool/dbconfig/20220520-005237-ladsgroup.json
* 00:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon1003.wikimedia.org with OS bullseye
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28160 and previous config saved to /var/cache/conftool/dbconfig/20220520-003732-ladsgroup.json
* 00:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
* 00:29 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
* 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bullseye
* 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28159 and previous config saved to /var/cache/conftool/dbconfig/20220520-002227-ladsgroup.json


== 2021-04-02 ==
== 2022-05-19 ==
* 22:31 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 23:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host netmon1003.wikimedia.org with OS bullseye
* 22:31 bstorm@cumin1001: Added views for new wiki: trvwiki [[phab:T276246|T276246]]
* 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bullseye
* 22:08 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 22:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:08 mutante: pooled mw2395,mw2396 as API appservers running on new hardware
* 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[5-6].codfw.wmnet
* 22:07 robh: cp3060 idrac interface frozen, rebooted via power outlet control on [[phab:T243167|T243167]]
* 21:58 legoktm: legoktm@lists1002:~$ time sudo mailman-web rebuild_index
* 20:49 thcipriani: UTC late deploys done
* 21:56 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[5-6].codfw.wmnet
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw239[5-6].codfw.wmnet
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:48 mutante: mw2395, mw2396 - reboot - becoming API servers
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[0-4].codfw.wmnet
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:42 mutante: pooled 12 brand-new codfw appservers running on new hardware generation
* 20:40 bking@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:793128{{!}}zhwikiversity: Optimize logo per commons files (T308620)]] (duration: 00m 51s)
* 21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw238[5-9].codfw.wmnet
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2384.codfw.wmnet
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2383.codfw.wmnet
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
* 20:34 bking@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:792985{{!}}zhwikiversity: Declare commons files for logo and its variant (T308620)]] (duration: 00m 50s)
* 21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
* 20:33 bking@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:792985{{!}}zhwikiversity: Declare commons files for logo and its variant (T308620)]] (duration: 00m 53s)
* 21:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
* 20:24 bking@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791734{{!}}bnwikivoyage: Set $wgRelatedArticlesUseCirrusSearch to true on bnwikivoyage (T307904)]] (duration: 00m 50s)
* 21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
* 20:21 robh: ganeti5003 updating firmware via [[phab:T308211|T308211]]
* 21:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[0-4].codfw.wmnet
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:34 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw238[3-9].codfw.wmnet
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
* 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:28 legoktm: imported python-xapian-haystack 2.1.0-6~wmf1 on apt1001 ([[phab:T278717|T278717]])
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2394.codfw.wmnet
* 19:59 damilare: payments-wiki from {{Gerrit|464e3b0e}} to {{Gerrit|592c6d34}}
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2393.codfw.wmnet
* 19:58 inflatador: bking@relforge1004: banned relforge1003 from main and alpha clusters in preparation for reimage [[phab:T308770|T308770]]
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2392.codfw.wmnet
* 19:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2391.codfw.wmnet
* 19:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2390.codfw.wmnet
* 19:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2389.codfw.wmnet
* 19:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2388.codfw.wmnet
* 19:05 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2387.codfw.wmnet
* 19:01 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2386.codfw.wmnet
* 18:49 ryankemper: [WDQS Deploy] `Unknown` status resolved following deploy of https://gerrit.wikimedia.org/r/793530 ; wdqs categories monitoring is healthy again. We're done here
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2385.codfw.wmnet
* 18:45 ryankemper: [WDQS Deploy] Deployed https://gerrit.wikimedia.org/r/793530; ran puppet agent across wdqs* and just kicked off a re-check of the NRPE alerts. We'll see if that clears the Unknown state up
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2384.codfw.wmnet
* 18:29 ryankemper: [WDQS Deploy] Okay, so a recent refactor changed where the `check_categories.py` lives. Previously it was `/usr/lib/nagios/plugins/check_categories.py` and now it's `/usr/local/lib/nagios/plugins/check_categories.py`. So https://gerrit.wikimedia.org/r/793530 should fix things now
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2383.codfw.wmnet
* 18:18 ryankemper: [WDQS Deploy] Traced the failure back to https://gerrit.wikimedia.org/r/c/operations/puppet/+/792700 presumably; trying to see what we can do to fix up the patch without having to revert it since it touches stuff besides query service
* 21:19 mutante: generating mcrouter certs for mw2395 through mw2404 ([[phab:T278396|T278396]])
* 17:55 ryankemper: [WDQS Deploy] Slight amendment to the above; we're seeing status `Unknown` for `Categories endpoint` and `Categories update lag`. They've been warning for ~24h so it didn't surface following the deploy, but looking into that now
* 21:07 mutante: mw2383 through mw2394 - 'uptime && scap pull' via ssh -C (not cumin because it needs to run as non-root)
* 17:51 ryankemper: [[phab:T306899|T306899]] Rolled `wdqs` and `wcqs` deploys to adjust logging settings. Hoping this gives us more visibility on the 500 errors WCQS users have been experiencing.
* 20:58 mutante: mw238* - scap pull via cumin not possible because it doesn't work as root
* 17:50 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 20:50 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: tweak to affinity group options (duration: 03m 39s)
* 17:30 ryankemper: [WCQS Deploy] Successful test query placed on commons-query.wikimedia.org, there's no relevant criticals in Icinga, and Grafana looks good. WCQS deploy complete
* 20:46 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: tweak to affinity group options
* 17:30 ryankemper: [WCQS Deploy] Restarted `wcqs-updater` across all hosts: `sudo -E cumin 'A:wcqs-public' 'systemctl restart wcqs-updater'`
* 20:44 mutante: mw2385 through mw2394 - serial rebooting
* 17:29 ryankemper: [WCQS Deploy] Tests looked good following deploy of `0.3.111` to canary `wcqs1002.eqiad.wmnet`; proceeded to rest of fleet
* 20:43 mutante: mw2384 reboot
* 17:29 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@a493d7f] (wcqs): Deploy 0.3.111 to WCQS (duration: 03m 03s)
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
* 17:26 ryankemper@deploy1002: Started deploy [wdqs/wdqs@a493d7f] (wcqs): Deploy 0.3.111 to WCQS
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
* 17:26 ryankemper: [WCQS Deploy] Gearing up for deploy of wcqs `0.3.111`
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: new_install
* 17:24 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: new_install
* 17:24 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev (duration: 01m 47s)
* 17:23 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 20:39 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev
* 17:22 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@a493d7f]: 0.3.111 (duration: 08m 11s)
* 20:09 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 17:16 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.111` on canary `wdqs1003`; proceeding to rest of fleet
* 20:09 bstorm@cumin1001: Added views for new wiki: taywiki [[phab:T275836|T275836]]
* 17:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@a493d7f]: 0.3.111
* 19:47 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 17:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.111`. Pre-deploy tests passing on canary `wdqs1003`
* 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
* 17:03 otto@deploy1002: Finished deploy [airflow-dags/analytics@95c1f50]: (no justification provided) (duration: 00m 21s)
* 19:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
* 17:03 otto@deploy1002: Started deploy [airflow-dags/analytics@95c1f50]: (no justification provided)
* 19:07 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 16:56 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@95c1f50]: (no justification provided) (duration: 00m 12s)
* 19:07 bstorm@cumin1001: Added views for new wiki: mnwwiktionary [[phab:T276126|T276126]]
* 16:55 otto@deploy1002: Started deploy [airflow-dags/analytics_test@95c1f50]: (no justification provided)
* 18:44 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 16:37 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
* 18:44 mutante: [puppetmaster1001:~] $ sudo puppet node deactivate mw2247.codfw.wmnet
* 16:35 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 18:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
* 16:31 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 18:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
* 16:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
* 17:57 legoktm: upgraded mailman3 python3-django-postorius on lists1002
* 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28155 and previous config saved to /var/cache/conftool/dbconfig/20220519-161022-ladsgroup.json
* 15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28154 and previous config saved to /var/cache/conftool/dbconfig/20220519-161014-ladsgroup.json
* 15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 15:41 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 15:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 14:35 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw133[7-8].eqiad.wmnet
* 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
* 14:34 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=videoscaler,name=mw133[5-6].eqiad.wmnet
* 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P28153 and previous config saved to /var/cache/conftool/dbconfig/20220519-155509-ladsgroup.json
* 14:32 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw133[5-6].eqiad.wmnet
* 15:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gerrit2002.wikimedia.org with OS bullseye
* 14:31 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw133[7-8].eqiad.wmnet
* 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28152 and previous config saved to /var/cache/conftool/dbconfig/20220519-154124-ladsgroup.json
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
* 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P28151 and previous config saved to /var/cache/conftool/dbconfig/20220519-154003-ladsgroup.json
* 14:29 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1111.eqiad.wmnet
* 15:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
* 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:20 Urbanecm: Start server-side upload for 3 video files ([[phab:T279060|T279060]], [[phab:T279061|T279061]], [[phab:T279062|T279062]])
* 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P28150 and previous config saved to /var/cache/conftool/dbconfig/20220519-152618-ladsgroup.json
* 14:09 Urbanecm: Start server-side upload for 3 video files ([[phab:T279138|T279138]], [[phab:T279137|T279137]], [[phab:T279136|T279136]])
* 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28149 and previous config saved to /var/cache/conftool/dbconfig/20220519-152457-ladsgroup.json
* 13:42 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.37
* 15:24 ariel@deploy1002: Finished deploy [dumps/dumps@cd30939]: use dbgroupdefault for most jobs (duration: 00m 04s)
* 13:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
* 15:24 ariel@deploy1002: Started deploy [dumps/dumps@cd30939]: use dbgroupdefault for most jobs
* 13:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
* 15:23 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:11 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/load.php: [[phab:T278579|T278579]] (duration: 00m 58s)
* 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5003.eqsin.wmnet with reason: Remove from cluster for firmware update and eventual reimage
* 13:10 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/OutputHandler.php: [[phab:T278579|T278579]] (duration: 00m 57s)
* 15:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5003.eqsin.wmnet with reason: Remove from cluster for firmware update and eventual reimage
* 13:08 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/MediaWiki.php: [[phab:T278579|T278579]] (duration: 00m 58s)
* 15:19 oblivian@deploy1002: Synchronized README: null sync-file to verify the switch to the deployment group (duration: 00m 50s)
* 11:46 Urbanecm: correction: Start server-side upload for 3 video files ([[phab:T279079|T279079]], [[phab:T279080|T279080]], [[phab:T279104|T279104]])
* 15:14 _joe_: deploy1002:/srv/mediawiki-staging $ find . -group wikidev -print0 {{!}} sudo xargs -0 -n 100 chgrp -h deployment --
* 11:45 Urbanecm: Start server-side upload for 3 images ([[phab:T279079|T279079]], [[phab:T279080|T279080]], [[phab:T279104|T279104]])
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P28148 and previous config saved to /var/cache/conftool/dbconfig/20220519-151113-ladsgroup.json
* 10:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
* 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
* 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
* 10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:00 _joe_: oblivian@deploy2002:/srv/mediawiki-staging $ sudo find . -group wikidev -exec chgrp wikidev "<nowiki>{</nowiki><nowiki>}</nowiki>" \;
* 10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:00 papaul: powerdown gerrit2002 for relocation
* 10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
* 10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28147 and previous config saved to /var/cache/conftool/dbconfig/20220519-145608-ladsgroup.json
* 10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
* 10:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback group0 wikis to 1.36.0-wmf.36 - [[phab:T278343|T278343]]
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28145 and previous config saved to /var/cache/conftool/dbconfig/20220519-144021-ladsgroup.json
* 09:45 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 and group2 wikis to 1.36.0-wmf.36 - [[phab:T278343|T278343]]
* 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 09:44 hashar@deploy1002: sync-wikiversions aborted: Revert group1 and group2 wikis to 1.36.0-wmf.36 (duration: 00m 01s)
* 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 09:06 dcausse: remove dumps from wdqs1009 to free disk space
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28144 and previous config saved to /var/cache/conftool/dbconfig/20220519-144013-ladsgroup.json
* 07:33 effie: powercycle an-worker1080
* 14:36 tgr: EU mid-day deploys done
* 07:28 elukey: manual fix for an-worker1080's interface in netbox (xe-4/0/11), moved by mistake to public-1b
* 14:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:793395{{!}}GrowthExperiments: Enable Add Link frontend on tier 3 wikis (T304542)]] (duration: 00m 50s)
* 03:54 dwisehaupt: replication user on fundraising db set to require ssl for connections at the mysql user level. db updated on frdb1004 and verified on a set of hosts
* 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:16 dwisehaupt: replication user on payments db set to require ssl for connections at the mysql user level. db updated on payments1001 and verified on a set of hosts
* 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
* 14:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P28143 and previous config saved to /var/cache/conftool/dbconfig/20220519-142507-ladsgroup.json
* 14:23 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:22 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
* 14:20 tgr@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:793119{{!}}zhwikiquote: Optimize logo per commons files (T308620)]] (duration: 00m 50s)
* 14:18 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28142 and previous config saved to /var/cache/conftool/dbconfig/20220519-141453-marostegui.json
* 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P28141 and previous config saved to /var/cache/conftool/dbconfig/20220519-141001-ladsgroup.json
* 14:09 jayme: systemctl restart rsyslog on kubernetes1011,kubestage1003
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
* 13:58 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791797{{!}}votewiki: Change wgLanguageCode to zh for May 2022 zhwiki admin election (T308397)]] (duration: 00m 52s)
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28140 and previous config saved to /var/cache/conftool/dbconfig/20220519-135632-marostegui.json
* 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28139 and previous config saved to /var/cache/conftool/dbconfig/20220519-135624-marostegui.json
* 13:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
* 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28138 and previous config saved to /var/cache/conftool/dbconfig/20220519-135456-ladsgroup.json
* 13:52 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.12 refs [[phab:T305218|T305218]]
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P28137 and previous config saved to /var/cache/conftool/dbconfig/20220519-134119-marostegui.json
* 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
* 13:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P28136 and previous config saved to /var/cache/conftool/dbconfig/20220519-132614-marostegui.json
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:21 jnuche@deploy1002: Synchronized php-1.39.0-wmf.12/extensions/FileImporter/src/Services/WikiRevisionFactory.php: Backport: [[gerrit:793157{{!}}Revert "Fix bogus user object creation in WikiRevisionFactory" (T308691)]] (duration: 00m 53s)
* 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28135 and previous config saved to /var/cache/conftool/dbconfig/20220519-131108-marostegui.json
* 13:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
* 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28134 and previous config saved to /var/cache/conftool/dbconfig/20220519-125442-ladsgroup.json
* 12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28133 and previous config saved to /var/cache/conftool/dbconfig/20220519-125434-ladsgroup.json
* 12:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28131 and previous config saved to /var/cache/conftool/dbconfig/20220519-124456-marostegui.json
* 12:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 12:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 12:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
* 12:40 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5002.eqsin.wmnet to ganeti01.svc.eqsin.wmnet
* 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P28130 and previous config saved to /var/cache/conftool/dbconfig/20220519-123927-ladsgroup.json
* 12:39 root@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5002.eqsin.wmnet to ganeti01.svc.eqsin.wmnet
* 12:37 marostegui: dbmaint s1@eqiad [[phab:T300775|T300775]]
* 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28129 and previous config saved to /var/cache/conftool/dbconfig/20220519-123227-ladsgroup.json
* 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28128 and previous config saved to /var/cache/conftool/dbconfig/20220519-123219-ladsgroup.json
* 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P28127 and previous config saved to /var/cache/conftool/dbconfig/20220519-122422-ladsgroup.json
* 12:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 12:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P28126 and previous config saved to /var/cache/conftool/dbconfig/20220519-121714-ladsgroup.json
* 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
* 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28125 and previous config saved to /var/cache/conftool/dbconfig/20220519-120917-ladsgroup.json
* 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
* 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
* 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
* 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28124 and previous config saved to /var/cache/conftool/dbconfig/20220519-120521-marostegui.json
* 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P28123 and previous config saved to /var/cache/conftool/dbconfig/20220519-120209-ladsgroup.json
* 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 11:59 marostegui: Failover m5 master [[phab:T307673|T307673]]
* 11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28122 and previous config saved to /var/cache/conftool/dbconfig/20220519-115303-ladsgroup.json
* 11:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 11:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28121 and previous config saved to /var/cache/conftool/dbconfig/20220519-115255-ladsgroup.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P28120 and previous config saved to /var/cache/conftool/dbconfig/20220519-115016-marostegui.json
* 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28119 and previous config saved to /var/cache/conftool/dbconfig/20220519-114703-ladsgroup.json
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P28118 and previous config saved to /var/cache/conftool/dbconfig/20220519-113750-ladsgroup.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P28117 and previous config saved to /var/cache/conftool/dbconfig/20220519-113511-marostegui.json
* 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
* 11:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P28116 and previous config saved to /var/cache/conftool/dbconfig/20220519-112245-ladsgroup.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28115 and previous config saved to /var/cache/conftool/dbconfig/20220519-112006-marostegui.json
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28114 and previous config saved to /var/cache/conftool/dbconfig/20220519-110740-ladsgroup.json
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28113 and previous config saved to /var/cache/conftool/dbconfig/20220519-105637-marostegui.json
* 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28112 and previous config saved to /var/cache/conftool/dbconfig/20220519-105624-marostegui.json
* 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P28110 and previous config saved to /var/cache/conftool/dbconfig/20220519-104119-marostegui.json
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P28109 and previous config saved to /var/cache/conftool/dbconfig/20220519-102613-marostegui.json
* 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
* 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28108 and previous config saved to /var/cache/conftool/dbconfig/20220519-101841-ladsgroup.json
* 10:18 marostegui: Failover m3 master [[phab:T307673|T307673]]
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28107 and previous config saved to /var/cache/conftool/dbconfig/20220519-101108-marostegui.json
* 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28106 and previous config saved to /var/cache/conftool/dbconfig/20220519-100725-ladsgroup.json
* 10:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P28105 and previous config saved to /var/cache/conftool/dbconfig/20220519-100336-ladsgroup.json
* 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5002.eqsin.wmnet with OS bullseye
* 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
* 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
* 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28104 and previous config saved to /var/cache/conftool/dbconfig/20220519-095311-ladsgroup.json
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P28103 and previous config saved to /var/cache/conftool/dbconfig/20220519-094831-ladsgroup.json
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28102 and previous config saved to /var/cache/conftool/dbconfig/20220519-094607-marostegui.json
* 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28101 and previous config saved to /var/cache/conftool/dbconfig/20220519-094559-marostegui.json
* 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5002.eqsin.wmnet with reason: host reimage
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P28100 and previous config saved to /var/cache/conftool/dbconfig/20220519-093806-ladsgroup.json
* 09:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5002.eqsin.wmnet with reason: host reimage
* 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28099 and previous config saved to /var/cache/conftool/dbconfig/20220519-093326-ladsgroup.json
* 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P28098 and previous config saved to /var/cache/conftool/dbconfig/20220519-093054-marostegui.json
* 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P28097 and previous config saved to /var/cache/conftool/dbconfig/20220519-092301-ladsgroup.json
* 09:20 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet
* 09:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P28096 and previous config saved to /var/cache/conftool/dbconfig/20220519-091549-marostegui.json
* 09:15 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet
* 09:11 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
* 09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2061.codfw.wmnet with OS bullseye
* 09:08 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet
* 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28095 and previous config saved to /var/cache/conftool/dbconfig/20220519-090756-ladsgroup.json
* 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
* 09:03 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet
* 09:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5002.eqsin.wmnet with OS bullseye
* 09:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
* 09:01 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28094 and previous config saved to /var/cache/conftool/dbconfig/20220519-090044-marostegui.json
* 08:55 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet
* 08:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2061.codfw.wmnet with reason: host reimage
* 08:53 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
* 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28093 and previous config saved to /var/cache/conftool/dbconfig/20220519-084956-ladsgroup.json
* 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28092 and previous config saved to /var/cache/conftool/dbconfig/20220519-084942-ladsgroup.json
* 08:49 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2061.codfw.wmnet with reason: host reimage
* 08:48 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
* 08:48 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2061.codfw.wmnet with OS bullseye
* 08:46 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet
* 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
* 08:42 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
* 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1001.eqiad.wmnet
* 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:39 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1009.eqiad.wmnet
* 08:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2061.codfw.wmnet with OS bullseye
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28091 and previous config saved to /var/cache/conftool/dbconfig/20220519-083609-marostegui.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 08:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28090 and previous config saved to /var/cache/conftool/dbconfig/20220519-083601-marostegui.json
* 08:34 marostegui: Failover m2 master [[phab:T307673|T307673]]
* 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P28089 and previous config saved to /var/cache/conftool/dbconfig/20220519-083437-ladsgroup.json
* 08:34 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1009.eqiad.wmnet
* 08:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28088 and previous config saved to /var/cache/conftool/dbconfig/20220519-083311-ladsgroup.json
* 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28087 and previous config saved to /var/cache/conftool/dbconfig/20220519-083303-ladsgroup.json
* 08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts webperf1001.eqiad.wmnet
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2001.codfw.wmnet
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P28086 and previous config saved to /var/cache/conftool/dbconfig/20220519-082056-marostegui.json
* 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P28085 and previous config saved to /var/cache/conftool/dbconfig/20220519-081932-ladsgroup.json
* 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P28084 and previous config saved to /var/cache/conftool/dbconfig/20220519-081758-ladsgroup.json
* 08:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts webperf2001.codfw.wmnet
* 08:06 marostegui: Failover m1 master [[phab:T307673|T307673]]
* 08:06 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2061.codfw.wmnet with OS bullseye
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P28083 and previous config saved to /var/cache/conftool/dbconfig/20220519-080551-marostegui.json
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28082 and previous config saved to /var/cache/conftool/dbconfig/20220519-080427-ladsgroup.json
* 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P28081 and previous config saved to /var/cache/conftool/dbconfig/20220519-080253-ladsgroup.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28080 and previous config saved to /var/cache/conftool/dbconfig/20220519-075046-marostegui.json
* 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
* 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28079 and previous config saved to /var/cache/conftool/dbconfig/20220519-074748-ladsgroup.json
* 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28078 and previous config saved to /var/cache/conftool/dbconfig/20220519-074538-ladsgroup.json
* 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 07:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
* 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@8615678]: Fix links to non-existent Grafana graphs - [[phab:T307405|T307405]] (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@8615678]: Fix links to non-existent Grafana graphs - [[phab:T307405|T307405]]
* 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28077 and previous config saved to /var/cache/conftool/dbconfig/20220519-072007-ladsgroup.json
* 07:18 marostegui: dbmaint s1@eqiad [[phab:T300381|T300381]]
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:07 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792559{{!}}Enable Section Translation in as, gu, kn, mk, and mr Wikipedias (T304828)]] (duration: 00m 53s)
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P28076 and previous config saved to /var/cache/conftool/dbconfig/20220519-070533-marostegui.json
* 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P28075 and previous config saved to /var/cache/conftool/dbconfig/20220519-070502-ladsgroup.json
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P28074 and previous config saved to /var/cache/conftool/dbconfig/20220519-064957-ladsgroup.json
* 06:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 06:42 marostegui: dbmaint s1@eqiad [[phab:T298557|T298557]]
* 06:41 marostegui: dbmaint s6@eqiad [[phab:T298557|T298557]]
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28073 and previous config saved to /var/cache/conftool/dbconfig/20220519-064108-ladsgroup.json
* 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28072 and previous config saved to /var/cache/conftool/dbconfig/20220519-064100-ladsgroup.json
* 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28071 and previous config saved to /var/cache/conftool/dbconfig/20220519-063452-ladsgroup.json
* 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P28070 and previous config saved to /var/cache/conftool/dbconfig/20220519-062555-ladsgroup.json
* 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28069 and previous config saved to /var/cache/conftool/dbconfig/20220519-061907-ladsgroup.json
* 06:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28068 and previous config saved to /var/cache/conftool/dbconfig/20220519-061859-ladsgroup.json
* 06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1118.eqiad.wmnet with reason: Maint
* 06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1118.eqiad.wmnet with reason: Maint
* 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P28067 and previous config saved to /var/cache/conftool/dbconfig/20220519-061050-ladsgroup.json
* 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1118 [[phab:T301312|T301312]]', diff saved to https://phabricator.wikimedia.org/P28066 and previous config saved to /var/cache/conftool/dbconfig/20220519-060542-ladsgroup.json
* 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P28065 and previous config saved to /var/cache/conftool/dbconfig/20220519-060354-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write [[phab:T301312|T301312]]', diff saved to https://phabricator.wikimedia.org/P28064 and previous config saved to /var/cache/conftool/dbconfig/20220519-060119-ladsgroup.json
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - [[phab:T301312|T301312]]', diff saved to https://phabricator.wikimedia.org/P28063 and previous config saved to /var/cache/conftool/dbconfig/20220519-060023-ladsgroup.json
* 06:00 Amir1: Starting s1 eqiad failover from db1118 to db1163 - [[phab:T301312|T301312]]
* 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28062 and previous config saved to /var/cache/conftool/dbconfig/20220519-055545-ladsgroup.json
* 05:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P28061 and previous config saved to /var/cache/conftool/dbconfig/20220519-054849-ladsgroup.json
* 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28060 and previous config saved to /var/cache/conftool/dbconfig/20220519-053344-ladsgroup.json
* 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 [[phab:T301312|T301312]]', diff saved to https://phabricator.wikimedia.org/P28059 and previous config saved to /var/cache/conftool/dbconfig/20220519-052517-ladsgroup.json
* 05:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s1 [[phab:T301312|T301312]]
* 05:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s1 [[phab:T301312|T301312]]
* 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28058 and previous config saved to /var/cache/conftool/dbconfig/20220519-052303-ladsgroup.json
* 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P28057 and previous config saved to /var/cache/conftool/dbconfig/20220519-052218-ladsgroup.json
* 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28056 and previous config saved to /var/cache/conftool/dbconfig/20220519-052047-ladsgroup.json
* 05:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 05:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28055 and previous config saved to /var/cache/conftool/dbconfig/20220519-052039-ladsgroup.json
* 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28054 and previous config saved to /var/cache/conftool/dbconfig/20220519-051702-ladsgroup.json
* 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28053 and previous config saved to /var/cache/conftool/dbconfig/20220519-051654-ladsgroup.json
* 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28052 and previous config saved to /var/cache/conftool/dbconfig/20220519-050746-ladsgroup.json
* 05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28051 and previous config saved to /var/cache/conftool/dbconfig/20220519-050738-ladsgroup.json
* 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28050 and previous config saved to /var/cache/conftool/dbconfig/20220519-050412-ladsgroup.json
* 05:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 05:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28049 and previous config saved to /var/cache/conftool/dbconfig/20220519-050404-ladsgroup.json
* 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P28048 and previous config saved to /var/cache/conftool/dbconfig/20220519-050149-ladsgroup.json
* 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28047 and previous config saved to /var/cache/conftool/dbconfig/20220519-045412-ladsgroup.json
* 04:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 04:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28046 and previous config saved to /var/cache/conftool/dbconfig/20220519-044813-ladsgroup.json
* 04:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 04:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28045 and previous config saved to /var/cache/conftool/dbconfig/20220519-044805-ladsgroup.json
* 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P28044 and previous config saved to /var/cache/conftool/dbconfig/20220519-044644-ladsgroup.json
* 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28043 and previous config saved to /var/cache/conftool/dbconfig/20220519-043858-ladsgroup.json
* 04:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 04:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 04:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 04:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 04:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 04:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28042 and previous config saved to /var/cache/conftool/dbconfig/20220519-043139-ladsgroup.json
* 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28041 and previous config saved to /var/cache/conftool/dbconfig/20220519-043110-ladsgroup.json
* 04:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 04:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 04:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 04:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28040 and previous config saved to /var/cache/conftool/dbconfig/20220519-043057-ladsgroup.json
* 04:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 04:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 04:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28039 and previous config saved to /var/cache/conftool/dbconfig/20220519-041427-ladsgroup.json
* 04:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 04:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 04:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28038 and previous config saved to /var/cache/conftool/dbconfig/20220519-041418-ladsgroup.json
* 04:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 04:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 04:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28037 and previous config saved to /var/cache/conftool/dbconfig/20220519-041410-ladsgroup.json
* 04:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 04:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 04:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 04:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P28036 and previous config saved to /var/cache/conftool/dbconfig/20220519-035905-ladsgroup.json
* 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P28035 and previous config saved to /var/cache/conftool/dbconfig/20220519-035820-root.json
* 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28034 and previous config saved to /var/cache/conftool/dbconfig/20220519-035754-ladsgroup.json
* 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Maint done', diff saved to https://phabricator.wikimedia.org/P28033 and previous config saved to /var/cache/conftool/dbconfig/20220519-035730-root.json
* 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P28032 and previous config saved to /var/cache/conftool/dbconfig/20220519-035726-root.json
* 03:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 03:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P28031 and previous config saved to /var/cache/conftool/dbconfig/20220519-034400-ladsgroup.json
* 03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P28030 and previous config saved to /var/cache/conftool/dbconfig/20220519-034222-root.json
* 03:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 03:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 03:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 03:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28029 and previous config saved to /var/cache/conftool/dbconfig/20220519-032855-ladsgroup.json
* 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P28028 and previous config saved to /var/cache/conftool/dbconfig/20220519-032718-root.json
* 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28027 and previous config saved to /var/cache/conftool/dbconfig/20220519-031303-ladsgroup.json
* 03:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 03:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P28026 and previous config saved to /var/cache/conftool/dbconfig/20220519-031214-root.json
* 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28025 and previous config saved to /var/cache/conftool/dbconfig/20220519-030335-ladsgroup.json
* 03:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 03:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 03:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 03:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Maint done', diff saved to https://phabricator.wikimedia.org/P28024 and previous config saved to /var/cache/conftool/dbconfig/20220519-025710-root.json
* 02:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
* 02:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
* 02:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 02:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28023 and previous config saved to /var/cache/conftool/dbconfig/20220519-020532-ladsgroup.json
* 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P28022 and previous config saved to /var/cache/conftool/dbconfig/20220519-015026-ladsgroup.json
* 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P28021 and previous config saved to /var/cache/conftool/dbconfig/20220519-013521-ladsgroup.json
* 01:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 01:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 01:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28020 and previous config saved to /var/cache/conftool/dbconfig/20220519-012051-ladsgroup.json
* 01:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28019 and previous config saved to /var/cache/conftool/dbconfig/20220519-012015-ladsgroup.json
* 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28018 and previous config saved to /var/cache/conftool/dbconfig/20220519-011143-ladsgroup.json
* 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 01:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P28017 and previous config saved to /var/cache/conftool/dbconfig/20220519-010546-ladsgroup.json
* 01:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 01:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 01:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 01:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 00:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 00:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 00:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28016 and previous config saved to /var/cache/conftool/dbconfig/20220519-005834-ladsgroup.json
* 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P28015 and previous config saved to /var/cache/conftool/dbconfig/20220519-005041-ladsgroup.json
* 00:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P28014 and previous config saved to /var/cache/conftool/dbconfig/20220519-004329-ladsgroup.json
* 00:37 ejegg: updated payments-wiki from {{Gerrit|d9d63a3d2c6}} to {{Gerrit|464e3b0e3310}}
* 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28013 and previous config saved to /var/cache/conftool/dbconfig/20220519-003536-ladsgroup.json
* 00:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P28012 and previous config saved to /var/cache/conftool/dbconfig/20220519-002824-ladsgroup.json
* 00:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28011 and previous config saved to /var/cache/conftool/dbconfig/20220519-001319-ladsgroup.json
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28010 and previous config saved to /var/cache/conftool/dbconfig/20220519-000423-ladsgroup.json
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance


== 2021-04-01 ==
== 2022-05-18 ==
* 23:32 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: [[gerrit:676350{{!}}Revert "Turn on glent m1 AB test"]] [[phab:T262612|T262612]] (duration: 00m 58s)
* 23:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:28 thcipriani: reset /srv/mediawiki-staging/php-1.36.0-wmf.37/extensions/TimedMediaHandler to {{Gerrit|1be781d}} (HEAD of wmf/1.36.0-wmf.37 -- from HEAD of master 49f417)
* 23:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:12 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Backport: Part III [[gerrit:676451
* 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28009 and previous config saved to /var/cache/conftool/dbconfig/20220518-235759-ladsgroup.json
* 23:53 mutante: webperf1001 - systemctl reset-failed
* 23:53 mutante: webperf1001/webperf2001 - re-enabling notifications in icinga that were disabled without comment (please don't do this, they keep being forgotten on a regular basis)
* 23:49 mutante: seaborgium - broken systemd state in Icinga since 23d - systemctl reset-failed
* 23:48 mutante: ms-be1063 - broken systemd state in Icinga since 19d - systemctl reset-failed
* 23:47 mutante: ms-be1054 - broken systemd state in Icinga since 19d - systemctl reset-failed
* 23:47 mutante: ms-be1036 - broken systemd state in Icinga since 15d - systemctl reset-failed
* 23:45 mutante: dumpsdata1002 - broken systemd state in Icinga since 23d - systemctl reset-failed
* 23:44 mutante: deploy2002 - broken systemd state in Icinga since 42d - systemctl reset-failed
* 23:43 mutante: an-db1002 - broken systemd state in Icinga since 48d - systemctl reset-failed
* 23:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after


== 2021-03-31 ==
== 2022-05-17 ==
* 23:34 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/Wikibase/client/includes/DataAccess/Scribunto/: {{Gerrit|bfc8f55196f57e43c0abc8a16d81cb3b390ac94a}}: Eliminate another php.getSetting() from Lua code (duration: 01m 09s)
* 23:36 ejegg: updated payments-wiki from {{Gerrit|590fac28}} to {{Gerrit|d9d63a3d}}
* 23:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/Wikibase/client/includes/DataAccess/Scribunto/: {{Gerrit|ad564a098f9174d76ff5c95adec20064ddde7bc9}}: Eliminate another php.getSetting() from Lua code (duration: 01m 10s)
* 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:12 jhuneidi@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:674698{{!}}Include private folder in restricted image (T276145)]] (duration: 01m 08s)
* 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27896 and previous config saved to /var/cache/conftool/dbconfig/20220517-222904-ladsgroup.json
* 23:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:668241{{!}}Use the new mediawiki logos]], part II ([[phab:T268230|T268230]]) (duration: 01m 11s)
* 22:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:03 ladsgroup@deploy1002: Synchronized static: [[gerrit:668241{{!}}Use the new mediawiki logos]], part I ([[phab:T268230|T268230]]) (duration: 01m 09s)
* 22:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:58 Urbanecm: Start server side upload for 3 files
* 22:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:01 Urbanecm: Server side upload of three video files ([[phab:T279011|T279011]], [[phab:T278956|T278956]], [[phab:T278955|T278955]])
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:01 eileen: civicrm revision changed from {{Gerrit|2fcea570bd}} to {{Gerrit|740e49d868}}, config revision is {{Gerrit|6779e3829a}}
* 22:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:00 dwisehaupt: shifted payments2003 to use gtid for mysql replication.
* 22:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:55 robh@cumin1001: START - Cookbook sre.dns.netbox
* 22:16 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: {{Gerrit|c2151b3}}: Update interwiki cache (duration: 00m 52s)
* 19:21 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]] (duration: 01m 08s)
* 22:15 urbanecm@deploy1002: Synchronized langlist: {{Gerrit|cd704d4f}}: langlist: add kcg language ([[phab:T305279|T305279]]) (duration: 00m 53s)
* 19:20 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P27895 and previous config saved to /var/cache/conftool/dbconfig/20220517-221359-ladsgroup.json
* 19:18 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P27894 and previous config saved to /var/cache/conftool/dbconfig/20220517-215854-ladsgroup.json
* 19:13 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 21:52 mutante: alert1001 - systemctl start certspotter (after an alert that the unit had failed; this happens sometimes)
* 19:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27893 and previous config saved to /var/cache/conftool/dbconfig/20220517-214349-ladsgroup.json
* 19:03 twentyafterfour@deploy1002: Synchronized php-1.36.0-wmf.37/includes/Revision/RevisionRecord.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/675875 to unblock train refs  [[phab:T278376|T278376]] [[phab:T278343|T278343]] (duration: 00m 58s)
* 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27892 and previous config saved to /var/cache/conftool/dbconfig/20220517-212530-ladsgroup.json
* 17:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36  refs [[phab:T278343|T278343]]
* 21:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 17:49 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 21:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 17:41 twentyafterfour: The train is now unblocked, promoting to group0 refs [[phab:T278343|T278343]]
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P27891 and previous config saved to /var/cache/conftool/dbconfig/20220517-212316-ladsgroup.json
* 17:01 Urbanecm: Server side upload of three video files ([[phab:T278959|T278959]], [[phab:T278958|T278958]], [[phab:T278957|T278957]])
* 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P27890 and previous config saved to /var/cache/conftool/dbconfig/20220517-212040-ladsgroup.json
* 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P27889 and previous config saved to /var/cache/conftool/dbconfig/20220517-210535-ladsgroup.json
* 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P27888 and previous config saved to /var/cache/conftool/dbconfig/20220517-205030-ladsgroup.json
* 14:57 papaul: disconnecting ps1-d8-codfw for replacement
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1007.eqiad.wmnet
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:02 Urbanecm: Server side upload of two video files ([[phab:T278961|T278961]], [[phab:T278960|T278960]])
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:48 jynus: retrying s3 snapshot on codfw
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:39 akosiaris: revert mw1412, mw1413, wtp1032, mw2305 to the previous state for [[phab:T278220|T278220]]
* 20:25 cjming: end of UTC late backport & config window
* 13:34 akosiaris: disabling puppet on role::mediawiki::appserver, role::mediawiki::appserver::api, role::mediawiki::maintenance, role::mediawiki::jobrunner, role::parsoid, role::parsoid::testing [[phab:T278220|T278220]]
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:00 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters. The video transcoding backlog has been served, so we can return to "normal"
* 20:22 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 53s)
* 12:59 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters
* 20:21 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 52s)
* 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 20:20 cjming@deploy1002: Synchronized static/images/project-logos/betawikiversity-2x.png: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 53s)
* 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 20:19 cjming@deploy1002: Synchronized static/images/project-logos/betawikiversity-1.5x.png: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 56s)
* 11:38 awight: EU deployment complete
* 20:18 cjming@deploy1002: Synchronized static/images/project-logos/betawikiversity.png: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 54s)
* 11:38 awight@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo: Backport: [[gerrit:675882{{!}}Style change to mediasearch logged-in notice close (T274927)]] [[gerrit:675883{{!}}Suppress user notice on mobile (T274927)]] [[gerrit:675881{{!}}Reset namespace filter on cancel (T276261)]] (duration: 01m 08s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:26 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:675509{{!}}vector: Disable WVUI search widget treatment A/B test (T276917)]] (duration: 01m 08s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:48 effie: enable puppet on all mw* servers
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:10 effie: disable puppet on all mw* hosts
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792272{{!}}Deploy TOC A/B test to pilot wikis except frwiki, ptwiki (T306607)]] (duration: 00m 53s)
* 09:03 hashar: contint2001: enable puppet again
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:38 hashar: contint2001: stopping Puppet for an Apache config live hack
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:35 eileen: civicrm revision changed from {{Gerrit|7040b68c11}} to {{Gerrit|2fcea570bd}}, config revision is {{Gerrit|6779e3829a}}
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 19:44 bd808: Updated Toolhub to 42072d, applied db migrations, and rebuilt search indexes
* 02:22 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 02:17 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 19:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 02:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
* 19:29 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 02:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:28 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 02:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
* 19:26 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 02:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 19:25 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
* 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Maint
* 01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
* 18:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1156.eqiad.wmnet with reason: Maint
* 01:15 urbanecm@deploy1002: Synchronized wmf-config/config/gawiki.yaml: {{Gerrit|3283ae59f25f02966a81ed2f0b51b964f733cf65}}: Enable local uploads on Irish Wikipedia ([[phab:T277723|T277723]]) (duration: 01m 08s)
* 18:26 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-tool1011.eqiad.wmnet
* 01:13 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: {{Gerrit|3283ae59f25f02966a81ed2f0b51b964f733cf65}}: Enable local uploads on Irish Wikipedia ([[phab:T277723|T277723]]) (duration: 01m 08s)
* 18:16 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
* 17:58 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 01:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
* 17:58 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-tool1011.eqiad.wmnet
* 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27884 and previous config saved to /var/cache/conftool/dbconfig/20220517-172632-ladsgroup.json
* 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27883 and previous config saved to /var/cache/conftool/dbconfig/20220517-172521-ladsgroup.json
* 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27882 and previous config saved to /var/cache/conftool/dbconfig/20220517-172001-ladsgroup.json
* 17:16 robh: ganeti4003 rebooting for firmware updates via [[phab:T307997|T307997]]
* 17:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti4003.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 17:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti4003.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27881 and previous config saved to /var/cache/conftool/dbconfig/20220517-170456-ladsgroup.json
* 16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27880 and previous config saved to /var/cache/conftool/dbconfig/20220517-164951-ladsgroup.json
* 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27878 and previous config saved to /var/cache/conftool/dbconfig/20220517-163446-ladsgroup.json
* 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27877 and previous config saved to /var/cache/conftool/dbconfig/20220517-163024-ladsgroup.json
* 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Manual repool', diff saved to https://phabricator.wikimedia.org/P27876 and previous config saved to /var/cache/conftool/dbconfig/20220517-162835-ladsgroup.json
* 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27875 and previous config saved to /var/cache/conftool/dbconfig/20220517-162738-ladsgroup.json
* 16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27874 and previous config saved to /var/cache/conftool/dbconfig/20220517-154502-ladsgroup.json
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27873 and previous config saved to /var/cache/conftool/dbconfig/20220517-154310-ladsgroup.json
* 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27872 and previous config saved to /var/cache/conftool/dbconfig/20220517-153921-ladsgroup.json
* 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27871 and previous config saved to /var/cache/conftool/dbconfig/20220517-152416-ladsgroup.json
* 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27870 and previous config saved to /var/cache/conftool/dbconfig/20220517-150911-ladsgroup.json
* 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27869 and previous config saved to /var/cache/conftool/dbconfig/20220517-145406-ladsgroup.json
* 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27868 and previous config saved to /var/cache/conftool/dbconfig/20220517-144959-ladsgroup.json
* 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27867 and previous config saved to /var/cache/conftool/dbconfig/20220517-144946-ladsgroup.json
* 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27865 and previous config saved to /var/cache/conftool/dbconfig/20220517-143916-ladsgroup.json
* 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 14:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27864 and previous config saved to /var/cache/conftool/dbconfig/20220517-143441-ladsgroup.json
* 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27863 and previous config saved to /var/cache/conftool/dbconfig/20220517-142411-ladsgroup.json
* 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27862 and previous config saved to /var/cache/conftool/dbconfig/20220517-141936-ladsgroup.json
* 14:19 hnowlan@deploy1002: Finished deploy [restbase/deploy@6e39559]: Add kcgwiki - [[phab:T305281|T305281]] (duration: 119m 34s)
* 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
* 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27861 and previous config saved to /var/cache/conftool/dbconfig/20220517-140906-ladsgroup.json
* 14:08 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
* 14:08 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:07 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
* 14:06 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 14:05 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27860 and previous config saved to /var/cache/conftool/dbconfig/20220517-140431-ladsgroup.json
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27859 and previous config saved to /var/cache/conftool/dbconfig/20220517-140016-ladsgroup.json
* 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27858 and previous config saved to /var/cache/conftool/dbconfig/20220517-140008-ladsgroup.json
* 13:55 tgr@deploy1002: Finished scap: Backport with i18n changes: [[gerrit:792478{{!}}Account creation: add Thank you banner texts]] (duration: 14m 57s)
* 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27857 and previous config saved to /var/cache/conftool/dbconfig/20220517-135401-ladsgroup.json
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27856 and previous config saved to /var/cache/conftool/dbconfig/20220517-135006-ladsgroup.json
* 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 13:50 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27855 and previous config saved to /var/cache/conftool/dbconfig/20220517-134838-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27854 and previous config saved to /var/cache/conftool/dbconfig/20220517-134503-ladsgroup.json
* 13:40 tgr@deploy1002: Started scap: Backport with i18n changes: [[gerrit:792478{{!}}Account creation: add Thank you banner texts]]
* 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P27853 and previous config saved to /var/cache/conftool/dbconfig/20220517-133333-ladsgroup.json
* 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27852 and previous config saved to /var/cache/conftool/dbconfig/20220517-132958-ladsgroup.json
* 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P27851 and previous config saved to /var/cache/conftool/dbconfig/20220517-131827-ladsgroup.json
* 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27850 and previous config saved to /var/cache/conftool/dbconfig/20220517-131453-ladsgroup.json
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27849 and previous config saved to /var/cache/conftool/dbconfig/20220517-131040-ladsgroup.json
* 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27848 and previous config saved to /var/cache/conftool/dbconfig/20220517-131032-ladsgroup.json
* 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27846 and previous config saved to /var/cache/conftool/dbconfig/20220517-130322-ladsgroup.json
* 13:02 Amir1: killed cawiki's refreshLinkRecommendations.php ([[phab:T299021|T299021]])
* 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27845 and previous config saved to /var/cache/conftool/dbconfig/20220517-125713-ladsgroup.json
* 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27844 and previous config saved to /var/cache/conftool/dbconfig/20220517-125527-ladsgroup.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P27843 and previous config saved to /var/cache/conftool/dbconfig/20220517-124227-ladsgroup.json
* 12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27842 and previous config saved to /var/cache/conftool/dbconfig/20220517-124022-ladsgroup.json
* 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 12:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27841 and previous config saved to /var/cache/conftool/dbconfig/20220517-122517-ladsgroup.json
* 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27840 and previous config saved to /var/cache/conftool/dbconfig/20220517-122201-ladsgroup.json
* 12:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 12:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 12:20 hnowlan@deploy1002: Started deploy [restbase/deploy@6e39559]: Add kcgwiki - [[phab:T305281|T305281]]
* 12:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 12:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 12:04 moritzm: draining ganeti4003 [[phab:T307997|T307997]]
* 11:53 moritzm: failover Ganeti master in ulsfo to ganeti4001 [[phab:T307997|T307997]]
* 10:32 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4002.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 10:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4002.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
* 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After depooling', diff saved to https://phabricator.wikimedia.org/P27838 and previous config saved to /var/cache/conftool/dbconfig/20220517-100223-root.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After depooling', diff saved to https://phabricator.wikimedia.org/P27837 and previous config saved to /var/cache/conftool/dbconfig/20220517-094719-root.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After depooling', diff saved to https://phabricator.wikimedia.org/P27836 and previous config saved to /var/cache/conftool/dbconfig/20220517-093216-root.json
* 09:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4002.ulsfo.wmnet with OS bullseye
* 09:20 XioNoX: all switches, split configuration per interfaces (use new get_junos_interfaces function)
* 09:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After depooling', diff saved to https://phabricator.wikimedia.org/P27835 and previous config saved to /var/cache/conftool/dbconfig/20220517-091712-root.json
* 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:16 btullis@deploy1002: Finished deploy [analytics/turnilo/deploy@bf60521]: (no justification provided) (duration: 00m 03s)
* 09:16 btullis@deploy1002: Started deploy [analytics/turnilo/deploy@bf60521]: (no justification provided)
* 09:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:09 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4002.ulsfo.wmnet with reason: host reimage
* 09:05 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4002.ulsfo.wmnet with reason: host reimage
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: After depooling', diff saved to https://phabricator.wikimedia.org/P27834 and previous config saved to /var/cache/conftool/dbconfig/20220517-090208-root.json
* 08:59 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/specials/pagers/ContribsPager.php: Backport: [[gerrit:792474{{!}}ContribsPager: Update index hint to use revision table in READ NEW (T307295)]] (duration: 00m 53s)
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:54 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.12/includes/specials/pagers/ContribsPager.php: Backport: [[gerrit:792475{{!}}ContribsPager: Update index hint to use revision table in READ NEW (T307295)]] (duration: 00m 56s)
* 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4002.ulsfo.wmnet with OS bullseye
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 5%: After depooling', diff saved to https://phabricator.wikimedia.org/P27833 and previous config saved to /var/cache/conftool/dbconfig/20220517-084704-root.json
* 08:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:40 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792565{{!}}Turn on read new for templatelinks on frwiki (T306673)]] (duration: 02m 25s)
* 08:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:21 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b569ee8]: Update DAG spark conf [airflow-dags/analytics@b569ee8] (duration: 00m 07s)
* 08:21 aqu@deploy1002: Started deploy [airflow-dags/analytics@b569ee8]: Update DAG spark conf [airflow-dags/analytics@b569ee8]
* 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:08 moritzm: installing ffmpeg security updates on stretch
* 08:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:06 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.12 refs [[phab:T305218|T305218]]
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:53 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.12 refs [[phab:T305218|T305218]] (duration: 14m 35s)
* 07:39 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.12 refs [[phab:T305218|T305218]]
* 07:36 kart_: UTC morning backport window - Done.
* 07:36 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791481{{!}}Enable Section Translation in bcl, is, ne, pa, ts and ur Wikipedias (T304828)]] (duration: 00m 53s)
* 07:35 jnuche@deploy1002: stage-train aborted: (duration: 25m 33s)
* 07:35 jnuche@deploy1002: deploy-promote aborted: (duration: 14m 44s)
* 07:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:22 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.12 refs [[phab:T305218|T305218]]
* 07:20 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791315{{!}}Deploy template search improvements to enwiki (T303802)]] (duration: 02m 11s)
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:17 XioNoX: core routers, split configuration per interfaces (use new get_junos_interfaces function)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:07 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791314{{!}}Deploy VE template dialog improvements to enwiki (T306967)]] (duration: 00m 50s)
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 XioNoX: management routers, split configuration per interfaces (use new get_junos_interfaces function)
* 06:37 XioNoX: management switches, split configuration per interfaces (use new get_junos_interfaces function)
* 05:44 _joe_: restarted rsyslog on kubernetes2022
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-03-30 ==
== 2022-05-16 ==
* 23:59 Trey314159: reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 22:14 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: exim debugging
* 23:56 legoktm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default ([[phab:T278867|T278867]]) (duration: 01m 08s)
* 22:14 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: exim debugging
* 23:53 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default ([[phab:T278867|T278867]]) (duration: 01m 08s)
* 21:47 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:29 Amir1: sudo django-admin hyperkitty_import -l discovery-alerts@lists-next.wikimedia.org discovery-alerts.mbox/discovery-alerts.mbox --pythonpath /usr/share/mailman3-web --settings settings ([[phab:T278609|T278609]])
* 21:47 robh: ganeti4002 rebooting for firmware update via [[phab:T307997|T307997]]
* 23:27 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 23:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:31 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef306a35464f295f43b874301cf0170edcfa4d8c}}: Growth features: bnwiki: Enable impact module ([[phab:T274793|T274793]]) (duration: 01m 07s)
* 21:26 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 22:52 cstone: civicrm revision changed from {{Gerrit|ad430721f6}} to {{Gerrit|7040b68c11}}
* 21:14 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:11 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: rollback (duration: 00m 12s)
* 21:08 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 21:11 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: rollback
* 21:07 cstone: civicrm revision changed from {{Gerrit|6d85f1cc}} to {{Gerrit|d45afdfc}}
* 21:05 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: trying again with newly built zip (duration: 00m 12s)
* 21:05 mutante: gerrit2002 (in setup) - rebooting
* 21:05 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: trying again with newly built zip
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:02 legoktm: scap pulling on mw1298
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:59 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 15s)
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:58 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:58 legoktm: killed remaining ffmpeg on mw1298
* 20:41 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792141{{!}}Revert "cirrus: Turn on AB test of wbsearchentities profiles" (T306644)]] (duration: 00m 51s)
* 20:56 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 12s)
* 20:36 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792197{{!}}yiwiktionary: Add localized mobile wordmark (T308411)]] and [[gerrit:792196{{!}}hewiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 50s)
* 20:56 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:34 catrope@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-yi.svg: Config: [[gerrit:792197{{!}}yiwiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 49s)
* 20:53 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:33 catrope@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-he.svg: Config: [[gerrit:792196{{!}}hewiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 50s)
* 20:52 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:31 catrope@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:792192{{!}}yiwiktionary: Update desktop logo (T308411)]] (duration: 00m 51s)
* 20:41 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 20s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:41 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:29 catrope@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:792192{{!}}yiwiktionary: Update desktop logo (T308411)]] (duration: 00m 51s)
* 20:41 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:40 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:37 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:20 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791725{{!}}thwikibooks: Enable import (T308374)]] (duration: 00m 51s)
* 20:37 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 31s)
* 20:14 catrope@deploy1002: Synchronized wmf-config: Config: [[gerrit:792149{{!}}GrowthExperiments: Update campaigns benefit list config (T305659)]] (duration: 00m 51s)
* 20:36 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 05s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:35 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:34 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 twentyafterfour@deploy1002: Started restart [releng/phatality@715d809]: (no justification provided)
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:33 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]] (duration: 80m 32s)
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:29 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 49s)
* 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:29 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1307.eqiad.wmnet
* 18:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryBacklinksprop.php: Backport: [[gerrit:792140{{!}}ApiQueryBacklinksprop: Make sure the index setting exists (T306673)]] (duration: 00m 50s)
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1306.eqiad.wmnet
* 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1305.eqiad.wmnet
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1304.eqiad.wmnet
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1303.eqiad.wmnet
* 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1307.eqiad.wmnet
* 17:25 mutante: ACKing again all unhandled CRIT alerts on hosts with "dev" in their name - (imho dev hosts should not have prod CRIT alerts?)
* 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1306.eqiad.wmnet
* 15:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox-dev2001.wikimedia.org
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1305.eqiad.wmnet
* 15:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1304.eqiad.wmnet
* 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1303.eqiad.wmnet
* 15:50 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 20:26 twentyafterfour: preparing to deploy phatality upgrade to kibana cluster
* 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1296.eqiad.wmnet
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1298.eqiad.wmnet
* 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1299.eqiad.wmnet
* 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox-dev2001.wikimedia.org
* 20:21 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a] (duration: 04m 29s)
* 15:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:792229{{!}} Bumping portals to master (T128546)]] (duration: 00m 51s)
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1299.eqiad.wmnet
* 15:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:792229{{!}} Bumping portals to master (T128546)]] (duration: 00m 50s)
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1298.eqiad.wmnet
* 15:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netbox2001-dev.wikimedia.org
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1296.eqiad.wmnet
* 15:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a]
* 15:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 20:16 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a] (duration: 00m 07s)
* 15:39 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox2001-dev.wikimedia.org
* 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a]
* 15:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a] (duration: 17m 11s)
* 15:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update homer wmf-netbox plugin - ayounsi@cumin1001
* 20:02 twentyafterfour: when syncing 1.36.0-wmf.37 promote to testwikis, one server failed: server mw1298.eqiad.wmnet and two more appear to be hung because scap is stuck at 2 left 99% without making any progress for a long time now. refs [[phab:T278343|T278343]]
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:58 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:58 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a]
* 15:22 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update homer wmf-netbox plugin - ayounsi@cumin1001
* 19:58 bblack: repool cp1087 - [[phab:T278729|T278729]]
* 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 15:18 papaul: rebooting pfw3[a-b]-eqiad for Junos upgrade
* 18:15 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:50 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryBacklinksprop.php: Backport: Revert: [[gerrit:792136{{!}}ApiQueryBacklinksprop: Force the correct templatelinks index on read new (T306673)]] (duration: 00m 50s)
* 18:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:47 ladsgroup@deploy1002: scap failed: average error rate on 3/8 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 17:22 legoktm: moved mw[1293-1295] to jobrunners and mw[1300-1302] to videoscalers
* 14:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1302.eqiad.wmnet
* 14:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1301.eqiad.wmnet
* 14:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1300.eqiad.wmnet
* 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1302.eqiad.wmnet
* 14:42 XioNoX: fix MTUs on asw-c-codfw
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1301.eqiad.wmnet
* 14:14 godog: bump disk space in prometheus codfw k8s-ml-serve (+30G)
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1300.eqiad.wmnet
* 14:14 Lucas_WMDE: UTC afternoon backport+config window done (just for the record; actual last backport was half an hour ago)
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1295.eqiad.wmnet
* 13:54 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1294.eqiad.wmnet
* 13:52 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1293.eqiad.wmnet
* 13:50 XioNoX: fix MTUs on asw-b-codfw
* 17:19 legoktm: killed all ffmpeg on mw1294
* 13:47 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1295.eqiad.wmnet
* 13:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1293.eqiad.wmnet
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1294.eqiad.wmnet
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:13 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:12 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:41 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 17:05 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:38 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791724{{!}}thwikibooks: set wgRestrictDisplayTitle to false (T308375)]] (duration: 00m 50s)
* 17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:40 effie: enable puppet on mw* hosts
* 13:29 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript updateArticleCount.php thwikibooks --update # [[phab:T308376|T308376]] [basically instantaneous, 1558 articles]
* 16:10 mutante: mw1296 - started ferm
* 13:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791722{{!}}thwikibooks: Add NS 104 and 106 to wgContentNamespaces (T308376)]] (duration: 00m 53s)
* 16:10 mutante: mw1308 - started ferm
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:07 akosiaris: split jobrunners/videoscalers clusters in conftool. mw12* become videoscalers, mw13* become jobrunners, killing ffmpeg on mw13*
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:07 mutante: mw1309 - systemctl start ferm
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:07 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=jobrunner,name=mw12.*
* 13:24 godog: free up space on thanos-be2001 on /var/log/spool/rsyslog
* 16:06 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw13.*
* 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791717{{!}}thwikibooks: Enable babel categorize (T308378)]] (duration: 00m 52s)
* 16:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:59 akosiaris: depool a number of hosts from videoscalers
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet,service=jobrunner
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet,service=jobrunner
* 12:43 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
* 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
* 12:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:29 hnowlan: moving all test tables out of cassandra directories on aqs hosts
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:59 effie: disable puppet on mediawiki servers to deploy 663565
* 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:58 Urbanecm: Move Help talk:Help talk:Getting started --> Help talk:Getting started via moveBatch.php on enwiki ([[phab:T278350|T278350]])
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:32 arturo: manually start update-openstack-mirror.service on sodium ([[phab:T278505|T278505]])
* 12:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:02 jbond42: rollout lxml update [[phab:T278822|T278822]]
* 12:21 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 49s)
* 12:55 jbond42: update spamassassin on lists, otrs and mx [[phab:T278820|T278820]]
* 12:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 48s)
* 12:39 Amir1: ssh -p 29418 gerrit.wikimedia.org replication start wikidata/query-builder --wait ([[phab:T277060|T277060]])
* 12:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 12:38 jbond42: update python(3)-pygments
* 12:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 12:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
* 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:14 Urbanecm: mwmaint1002: Downloading multiple big files (total filesize estimated 150 GB, downloaded and processed in batches) for server-side uploads
* 12:13 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating kcgwiki ([[phab:T305279|T305279]])
* 11:21 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:675751{{!}}Disable legacy javascript global variables in group1]], Some increase in client errors is expected ([[phab:T72470|T72470]]) (duration: 01m 11s)
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
* 12:11 urbanecm@deploy1002: Synchronized dblists: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 50s)
* 09:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:41 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 12:10 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1081.eqiad.wmnet with reason: [[phab:T308267|T308267]]
* 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1081.eqiad.wmnet with reason: [[phab:T308267|T308267]]
* 09:05 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
* 09:04 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
* 08:36 jynus: mariadb upgrade of all buster source backup hosts to 10.4.18 [[phab:T250666|T250666]]
* 11:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
* 08:05 dcausse: refreshing wdqs entities ([[phab:T278693|T278693]])
* 11:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
* 07:37 elukey: restart-php7.2-fpm on mw1304, jobrunner completely overwhelmed by ffmpeg/transcode jobs (not publishing metrics, erroring out for memcached timeouts) - [[phab:T278734|T278734]]
* 11:26 XioNoX: asw2-ulsfo fix MTU on 2 interfaces
* 07:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36 - [[phab:T274940|T274940]]
* 11:09 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes: Backport: [[gerrit:792126{{!}}RestrictionStore: Add support for templatelinks migration (T308207)]] (duration: 00m 54s)
* 06:06 elukey: powercycle cp1087 (no ssh, no mgmt console tty)
* 11:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:04 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
* 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:57 vgutierrez: test HAProxy 2.4.17 on cp4026 and cp4032
* 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:58 urbanecm: UTC morning B&C window done
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e9a00e8}}: GrowthExperiments: Update campaigns configuration ([[phab:T305443|T305443]], [[phab:T305659|T305659]], [[phab:T307521|T307521]]) (duration: 00m 50s)
* 07:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dc82dfa8}}: ptwikinews: Enable extension MediaSearch ([[phab:T299872|T299872]]) (duration: 00m 48s)
* 07:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:44 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|57d4a9c}}: thwikibooks: Enable quiz extension ([[phab:T308377|T308377]]) (duration: 00m 48s)
* 07:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3e04f86}}: thwikibooks: Add more namespaces to wgNamespacesToBeSearchedDefault ([[phab:T308373|T308373]]) (duration: 00m 48s)
* 07:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:36 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|67ce6ce}}: zhwikisource: Add NS100 to wgNamespacesToBeSearchedDefault ([[phab:T308393|T308393]]) (duration: 00m 50s)
* 07:18 dcausse: restarting blazegraph on wdqs1007 (BlazegraphFreeAllocatorsDecreasingRapidly)


== 2021-03-29 ==
== 2022-05-15 ==
* 19:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
* 21:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 07s)
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:46 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 21:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 07s)
* 16:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
* 21:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 16:11 hnowlan: depooled aqs1004 for transfer of large tables to aqs1010
* 21:39 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:39 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 15:47 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 21:30 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 15:45 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:30 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 15:39 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 21:14 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 13:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
* 21:14 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 13:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
* 13:03 ema: cp4027: rollback luajit experiment https://github.com/apache/trafficserver/issues/7423#issuecomment-809354214
* 12:36 ema: cp4027: re-enable JIT compilation in all ats-be lua scripts -- https://github.com/apache/trafficserver/issues/7423
* 11:57 ema: cp4027: re-enable JIT compilation in normalize-path.lua -- https://github.com/apache/trafficserver/issues/7423
* 11:32 ema: cp4027: install libluajit 2.1.0~beta3+dfsg-6wm1 with P15083 applied -- https://github.com/apache/trafficserver/issues/7423
* 09:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 09:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 09:16 ryankemper: [[phab:T267927|T267927]] `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id [[phab:T267927|T267927]] --reload-data wikidata --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool`
* 09:15 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 08:47 filippo@deploy1002: Finished deploy [librenms/librenms@df69efe]: deploy {{Gerrit|I156f32925f693}} (duration: 00m 08s)
* 08:47 filippo@deploy1002: Started deploy [librenms/librenms@df69efe]: deploy {{Gerrit|I156f32925f693}}
* 07:59 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 06s)
* 07:58 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 07:54 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: Wrap most of functionalities depending on protect mode in a condition - [[phab:T278478|T278478]] (duration: 01m 08s)
* 07:49 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: [[gerrit:675161{{!}}Wrap most of functionalities depending on protect mode in a condition]] ([[phab:T278478|T278478]]) (duration: 01m 08s)
* 07:42 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]] [[phab:T268435|T268435]]


== 2021-03-27 ==
== 2022-05-14 ==
* 19:25 elukey: powercycle elastic1060 - [[phab:T278630|T278630]]
* 08:34 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1172', diff saved to https://phabricator.wikimedia.org/P27830 and previous config saved to /var/cache/conftool/dbconfig/20220514-083421-jynus.json
* 06:10 ryankemper: [[phab:T267927|T267927]] `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_dumps_2020-03-26`
* 00:53 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Server need to be downgraded to stretch, on monday
* 05:44 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:53 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Server need to be downgraded to stretch, on monday
* 05:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:42 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:42 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload


== 2021-03-26 ==
== 2022-05-13 ==
* 22:27 tzatziki: reset password for Philroc
* 23:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1007.eqiad.wmnet with reason: Upgrade turnilo
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 23:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1007.eqiad.wmnet with reason: Upgrade turnilo
* 20:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 23:14 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@bf60521]: Staging deployment of turnilo 1.35 (duration: 00m 08s)
* 17:44 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - [[phab:T277795|T277795]] (duration: 01m 06s)
* 23:13 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@bf60521]: Staging deployment of turnilo 1.35
* 17:42 hashar@deploy1002: Finished scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]] (duration: 31m 43s)
* 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1003.wikimedia.org
* 17:10 hashar@deploy1002: Started scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]]
* 17:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1003.wikimedia.org
* 15:40 Urbanecm: Delete `commonswiki:ip-autoblock:whitelist` cache key from memcached (wmf.36 moves the autoblock whitelist source, and it was deployed on commonswiki for a while, resulting in the cache key being empty)
* 17:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1004.wikimedia.org
* 15:37 hnowlan: importing imposm3_0.11.0+git20201104.4758cf4-1_amd64.changes on apt1001
* 17:24 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
* 17:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudservices1004.wikimedia.org
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
* 17:24 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
* 15:57 _joe_: uploading conftool 2.2.0 to buster, bullseye [[phab:T305824|T305824]] [[phab:T305582|T305582]] [[phab:T305607|T305607]] [[phab:T305638|T305638]] [[phab:T307905|T307905]] [[phab:T308100|T308100]]
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
* 12:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
* 12:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
* 12:37 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 13:02 moritzm: reimaging theemin [[phab:T275873|T275873]]
* 12:37 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 12:56 moritzm: drain ganeti1014
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2140 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P27824 and previous config saved to /var/cache/conftool/dbconfig/20220513-121832-marostegui.json
* 12:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 12:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 11:59 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 12:37 moritzm: drain ganeti1013
* 11:57 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
* 11:47 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
* 11:40 moritzm: installing idp-test1002 [[phab:T308214|T308214]]
* 10:55 Urbanecm: Move `Help talk:Getting Started --> Help talk:Getting started` on enwiki with `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` ([[phab:T278350|T278350]])
* 10:55 moritzm: installing idp-test2002 [[phab:T308214|T308214]]
* 10:49 Urbanecm: Move `User talk:TheAafi/Help talk` to `Help talk:Getting Started` via `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` to fix an UBN task ([[phab:T278350|T278350]])
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 10:10 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts chlorine.eqiad.wmnet
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 10:02 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts chlorine.eqiad.wmnet
* 10:18 vgutierrez: disable puppet on gerrit1001 to fix /etc/ssh/ssh_config
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts argon.eqiad.wmnet
* 08:39 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 09:49 filippo@deploy1002: Finished deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}} (duration: 00m 08s)
* 08:03 jynus: moving s2 database from db2101 to db2097 [[phab:T299920|T299920]]
* 09:49 filippo@deploy1002: Started deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}}
* 07:59 moritzm: draining ganeti4002 [[phab:T307997|T307997]]
* 09:46 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts argon.eqiad.wmnet
* 07:52 XioNoX: add init7 transit in drmrs
* 09:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts acrab.codfw.wmnet
* 07:39 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4001.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 09:43 moritzm: delete fermium in Ganeti (was still around, but powered down) [[phab:T224586|T224586]]
* 07:39 root@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4001.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 09:38 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts acrux.codfw.wmnet
* 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4001.ulsfo.wmnet
* 09:36 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrab.codfw.wmnet
* 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 09:32 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrux.codfw.wmnet
* 07:18 Amir1: start of mwscript extensions/Echo/maintenance/removeOrphanedEvents.php --wiki=wikidatawiki --force ([[phab:T308084|T308084]])
* 09:31 filippo@deploy1002: Finished deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}} (duration: 00m 12s)
* 02:14 ejegg: updated payments-wiki from {{Gerrit|8f46af9d}} to {{Gerrit|590fac28}}
* 09:31 filippo@deploy1002: Started deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}}
* 09:28 moritzm: drain ganeti1012
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
* 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
* 08:38 moritzm: drain ganeti1010
* 08:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
* 08:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
* 06:11 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:06 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@bb5a072]: 0.3.68 (duration: 07m 31s)
* 05:00 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.68` on canary `wdqs1003`; proceeding to rest of fleet
* 04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@bb5a072]: 0.3.68
* 04:58 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.68`. Pre-deploy tests passing on canary `wdqs1003`


== 2021-03-25 ==
== 2022-05-12 ==
* 23:47 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/3D/package.json: No-op demo sync (duration: 01m 07s)
* 21:56 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@a2bdc3e]: (no justification provided) (duration: 02m 08s)
* 23:37 stran@deploy1002: Synchronized README: (no justification provided) (duration: 01m 06s)
* 21:53 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@a2bdc3e]: (no justification provided)
* 23:20 jhuneidi@deploy1002: Synchronized README: [[gerrit:674984{{!}}DEMO: README]] (duration: 01m 07s)
* 21:43 robh: cp306[23] returned to service, cp306[45] coming down for firmware update via [[phab:T243167|T243167]]
* 22:59 brennen: no patches for upcoming deploy window, but we'll be conducting a deployment training using DEMO patches to READMEs.
* 21:15 robh: cp306[01] returned to service, cp306[23] coming down for firmware update via [[phab:T243167|T243167]]
* 22:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
* 20:59 brennen: utc late backport & config window closed
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:50 robh: resuming last 6 esams cp host firmware updates via [[phab:T243167|T243167]].  cp306[01] going offline
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:50 Krinkle: krinkle@mwmaint1002$ mwscript refreshLinks.php --wiki commonswiki --category 'Media_needing_categories_requiring_human_attention' (approximately 2000 tiny pages)
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:27 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:48 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 and 2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 20:39 brennen@deploy1002: Finished scap: Backport for [[gerrit:791430]] viwiki: Enable "upload_by_url" for sysop (duration: 01m 36s)
* 19:36 hashar@deploy1002: sync-wikiversions aborted: (no justification provided) (duration: 00m 03s)
* 20:37 brennen@deploy1002: Started scap: Backport for [[gerrit:791430]] viwiki: Enable "upload_by_url" for sysop
* 19:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 08s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 06s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:59 Urbanecm: `mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview'` finished ([[phab:T275337|T275337]])
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:53 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sturm . # [[phab:T278391|T278391]]
* 20:32 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791424{{!}}ruwiktionary: Add localized mobile wordmark (T308233)]] (duration: 00m 50s)
* 18:50 Urbanecm: mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' # [[phab:T275337|T275337]]
* 20:31 brennen@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-ru.svg: Config: [[gerrit:791424{{!}}ruwiktionary: Add localized mobile wordmark (T308233)]] (duration: 00m 49s)
* 18:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|39cd4f15a3900783ac0e9a213004a28f18298a23}}: ruwiki: flaggedrevs: Do not allow sysops to modify users in autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 09s)
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:785229]] Enable "upload_by_url" feature on zhwiki (duration: 01m 46s)
* 18:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcfb7feaace1f397169e5e1bab7efd4e5f605a0f}}: ruwiki: flaggedrevs: Do not remove autoreview group ([[phab:T275337|T275337]]) (duration: 01m 14s)
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|3fb664682bea3c4d1448b0937f938e810268bac3}}: ruwiki: flaggedrevs: Revoke review from sysop group ([[phab:T275811|T275811]]) (duration: 01m 06s)
* 20:23 brennen@deploy1002: Started scap: Backport for [[gerrit:785229]] Enable "upload_by_url" feature on zhwiki
* 18:29 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (3/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (2/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:26 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (1/3; [[phab:T275819|T275819]]) (duration: 01m 10s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|62be4e738a4fd45256027bb09b010ab152f19850}}: Disable magic links on enwiki ([[phab:T275951|T275951]]) (duration: 01m 20s)
* 20:17 brennen@deploy1002: backport aborted: (duration: 02m 05s)
* 18:14 mutante: alert1001 - sudo systemctl restart tcpircbot-logmsgbot
* 20:17 brennen@deploy1002: prep aborted: (duration: 00m 01s)
* 18:09 marxarelli: scap sync-file .pipeline Config: [[gerrit:674132{{!}}Include patches in restricted image (T271274)]]
* 19:57 hashar: Restarting Gerrit
* 18:06 hnowlan: draining and restarting aqs1004-b cassandra
* 19:53 mutante: gitlab2001 - systemctl start backup-restore -  systemd[1]: Started GitLab Backup Restore. after gerrit:791410  for [[phab:T308089|T308089]]
* 17:45 hnowlan: draining and restarting aqs1004-a cassandra
* 18:57 jelto: restart gitlab2001
* 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:14 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 18:26 krinkle@deploy1002: Synchronized w/static.php: {{Gerrit|Ic0a5eae4f721a16403071d1b2136cf23d78e4fa9}} (duration: 00m 49s)
* 17:08 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 18:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 16:39 hashar: Restarted Apache 2 on contint2001 / contint1001
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 18:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:32 moritzm: restarting apache on an-tool1007/turnilo
* 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:27 moritzm: restarting dnsdist/rdns-recursor on malmok
* 18:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4001.ulsfo.wmnet with reason: host reimage
* 16:24 jbond42: restart slapd on ldap-replica
* 18:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4001.ulsfo.wmnet with reason: host reimage
* 16:22 jbond42: restart slapd on ldap-corp
* 17:52 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:20 jbond42: restart apache on lists1002
* 17:51 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 16:18 jbond42: restart apache on netbox
* 17:50 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 16:13 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Disallow negative or decimal values in pages tag - [[phab:T278400|T278400]] (duration: 01m 32s)
* 17:50 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@5047d7d]: (no justification provided) (duration: 00m 08s)
* 16:12 jbond42: restart routinator on rpki*
* 17:50 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@5047d7d]: (no justification provided)
* 16:12 moritzm: restarting nginx on apt*
* 17:50 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@9cfdfaf]: (no justification provided) (duration: 29m 32s)
* 16:10 moritzm: restarting apache on dbmonitor
* 17:50 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 16:08 moritzm: restart Apache on matomo/piwik
* 17:47 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 16:03 jbond42: restart apache service on gerrit
* 17:46 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 16:02 jbond42: restart idp service
* 17:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 16:01 ema: A:cp rolling ats-<nowiki>{</nowiki>tls,backend<nowiki>}</nowiki>-restart for openssl upgrades -- https://www.openssl.org/news/secadv/20210325.txt
* 17:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 15:45 moritzm: installing openssl updates on buster
* 17:43 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 14:48 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:31 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1006.eqiad.wmnet with OS buster
* 14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
* 17:26 jmm@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 14:13 twentyafterfour: update phabricator again (last night's update undid a hotfix that is now fixed properly)
* 17:21 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@9cfdfaf]: (no justification provided)
* 13:45 moritzm: drain ganeti1009
* 17:08 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 17:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1006.eqiad.wmnet with reason: host reimage
* 13:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 16:57 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1006.eqiad.wmnet with reason: host reimage
* 13:27 moritzm: reduce webperf1001/webperf2001 to 4G RAM (xhgui has been split off to separate VMs)
* 16:53 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Attempting OS upgrade
* 13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
* 16:53 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Attempting OS upgrade
* 13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
* 16:35 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1006.eqiad.wmnet with OS buster
* 12:52 hnowlan: aqs1004 nodetool-a cleanup finished
* 16:21 mutante: gitlab2001 - trying to stop 'puma' for debugging [[phab:T308089|T308089]]
* 12:14 moritzm: drain ganeti1008
* 16:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
* 16:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
* 16:06 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:52 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:674861{{!}}Disable Legacy javascript in fawikiquote]] ([[phab:T72470|T72470]]) (duration: 01m 07s)
* 16:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labstore1006.wikimedia.org
* 11:46 moritzm: drain ganeti1007
* 15:57 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host labstore1006.wikimedia.org
* 11:44 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/skins/Vector/resources: [[gerrit:674382{{!}}Inform anonymous A/B test by tracking time from navigationStart (T275807)]] (duration: 01m 09s)
* 15:57 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 11:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
* 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labstore1007.wikimedia.org
* 11:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
* 15:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host labstore1005.eqiad.wmnet
* 11:33 ladsgroup@deploy1002: Synchronized dblists/: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]], Part II ([[phab:T278369|T278369]]) (duration: 01m 07s)
* 15:06 razzi@cumin1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 11:32 ladsgroup@deploy1002: Synchronized wmf-config: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]] ([[phab:T278369|T278369]]) (duration: 01m 30s)
* 15:05 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1008.eqiad.wmnet with reason: host reimage
* 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P27819 and previous config saved to /var/cache/conftool/dbconfig/20220512-145554-root.json
* 11:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 14:48 razzi@cumin1001: conftool action : set/pooled=inactive; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
* 14:48 razzi@cumin1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 14:47 razzi@cumin1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
* 14:45 moritzm: installing gnupg2 updates from Bullseye point release
* 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 14:44 razzi@cumin1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 11:10 moritzm: drain ganeti1006
* 14:43 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1008.eqiad.wmnet with OS buster
* 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P27818 and previous config saved to /var/cache/conftool/dbconfig/20220512-144050-root.json
* 10:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27817 and previous config saved to /var/cache/conftool/dbconfig/20220512-143954-root.json
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:33 razzi@cumin1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P27816 and previous config saved to /var/cache/conftool/dbconfig/20220512-142546-root.json
* 10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
* 14:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1009.eqiad.wmnet with OS buster
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27815 and previous config saved to /var/cache/conftool/dbconfig/20220512-142450-root.json
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P27814 and previous config saved to /var/cache/conftool/dbconfig/20220512-141042-root.json
* 10:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27813 and previous config saved to /var/cache/conftool/dbconfig/20220512-140946-root.json
* 10:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
* 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1164.eqiad.wmnet with OS bullseye
* 10:42 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 depooling: Maint', diff saved to https://phabricator.wikimedia.org/P27812 and previous config saved to /var/cache/conftool/dbconfig/20220512-135848-root.json
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
* 13:55 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1009.eqiad.wmnet with reason: host reimage
* 10:36 hnowlan: running general nodetool cleanup on aqs1004-a
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27811 and previous config saved to /var/cache/conftool/dbconfig/20220512-135442-root.json
* 10:35 hnowlan: running cleanup on aqs1004-a: nodetool-a cleanup "local_group_default_T_pageviews_per_project_v2" data
* 13:52 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1009.eqiad.wmnet with reason: host reimage
* 10:34 moritzm: drain ganeti1005
* 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1164.eqiad.wmnet with reason: host reimage
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
* 13:48 moritzm: installing ffmpeg security updates
* 10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1164.eqiad.wmnet with reason: host reimage
* 10:24 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27809 and previous config saved to /var/cache/conftool/dbconfig/20220512-133938-root.json
* 10:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 13:38 tgr: EU mid-day deploys done
* 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
* 13:37 tgr@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/AddLink/ServiceLinkRecommendationProvider.php: Backport: [[gerrit:791251{{!}}Send sections_to_exclude in the POST body (T308186)]] (duration: 00m 49s)
* 10:18 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1164.eqiad.wmnet with OS bullseye
* 09:26 moritzm: drain ganeti2024
* 13:30 tgr@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
* 13:30 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1009.eqiad.wmnet with OS buster
* 09:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
* 13:28 tgr@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 08:45 moritzm: drain ganeti2023
* 13:26 tgr@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27808 and previous config saved to /var/cache/conftool/dbconfig/20220512-132434-root.json
* 08:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
* 13:23 tgr@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 08:12 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2 for buster-wikimedia
* 13:21 tgr@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 08:11 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2
* 13:19 tgr@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 07:41 legoktm: upgraded lists1002 to hyperkitty 1.2.2-1+wmf1 ([[phab:T276687|T276687]])
* 13:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1007.eqiad.wmnet with OS buster
* 07:36 legoktm: uploaded hyperkitty 1.2.2-1+wmf1 to buster-wikimedia ([[phab:T276687|T276687]])
* 13:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1004.eqiad.wmnet with OS buster
* 07:35 jynus: restart db2135 [[phab:T278408|T278408]] [[phab:T273281|T273281]]
* 12:45 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1007.eqiad.wmnet with reason: host reimage
* 07:05 effie: enable puppet on all mediawiki servers
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27807 and previous config saved to /var/cache/conftool/dbconfig/20220512-124406-marostegui.json
* 06:57 XioNoX: Option 82: use-vlan-id
* 12:43 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 06:53 effie: enable puppet on jobrunners
* 12:42 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1007.eqiad.wmnet with reason: host reimage
* 06:47 effie: enable puppet on parsoid
* 12:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1004.eqiad.wmnet with reason: host reimage
* 06:40 effie: disable puppet on all mediawiki servers to merge 673061 (service proxy to listen on ::1)
* 12:38 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 06:23 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 12:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1004.eqiad.wmnet with reason: host reimage
* 05:19 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:44 legoktm: restarted exim4 on lists1002 so it listens on 0.0.0.0 instead of 127.0.0.1
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 12:30 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryInfo.php: Backport: [[gerrit:791252{{!}}ApiQueryInfo: Force PRIMARY index on templatelinks (T308207)]] (duration: 00m 50s)
* 01:10 legoktm: mailman3: added lists-next.wikimedia.org domain
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:08 legoktm: mailman3: renamed default site from "example.com" to "lists-next.wikimedia.org"
* 12:28 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 00:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
* 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
* 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2777.codfw.wmnet
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27806 and previous config saved to /var/cache/conftool/dbconfig/20220512-122707-marostegui.json
* 00:34 mutante: mw2377, mw2378 - first scap pull
* 12:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2378.codfw.wmnet
* 12:24 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2377.codfw.wmnet
* 12:20 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1007.eqiad.wmnet with OS buster
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2378.codfw.wmnet
* 12:17 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2377.codfw.wmnet
* 12:14 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1005.eqiad.wmnet with OS buster
* 00:29 legoktm: syncing facts for puppet-compiler
* 12:12 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores1004.eqiad.wmnet with OS buster
* 00:23 mutante: mw2377, mw2378 - reboot
* 12:12 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 00:14 twentyafterfour: phabricator update complete
* 12:04 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
* 00:10 twentyafterfour: deploying phabricator
* 12:00 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
* 00:05 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T23:55:35` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 11:57 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27805 and previous config saved to /var/cache/conftool/dbconfig/20220512-115445-marostegui.json
* 11:51 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
* 11:50 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
* 11:46 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
* 11:43 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1005.eqiad.wmnet with reason: host reimage
* 11:40 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1005.eqiad.wmnet with reason: host reimage
* 11:21 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS bullseye
* 11:17 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1005.eqiad.wmnet with OS buster
* 11:14 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test1002.wikimedia.org
* 10:55 jmm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27804 and previous config saved to /var/cache/conftool/dbconfig/20220512-105432-marostegui.json
* 10:50 jmm@cumin1001: START - Cookbook sre.dns.netbox
* 10:50 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host idp-test1002.wikimedia.org
* 10:46 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2002.wikimedia.org
* 10:45 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS bullseye
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27803 and previous config saved to /var/cache/conftool/dbconfig/20220512-103333-marostegui.json
* 10:19 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS bullseye
* 10:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS bullseye
* 10:11 moritzm: installing Apache 2.4.53 updates on bullseye
* 09:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1002.eqiad.wmnet with OS buster
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27802 and previous config saved to /var/cache/conftool/dbconfig/20220512-094642-marostegui.json
* 09:36 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1003.eqiad.wmnet with OS buster
* 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1002.eqiad.wmnet with reason: host reimage
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27800 and previous config saved to /var/cache/conftool/dbconfig/20220512-091706-marostegui.json
* 09:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1002.eqiad.wmnet with reason: host reimage
* 09:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1003.eqiad.wmnet with reason: host reimage
* 09:03 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1003.eqiad.wmnet with reason: host reimage
* 08:52 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores1002.eqiad.wmnet with OS buster
* 08:45 jmm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:40 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1003.eqiad.wmnet with OS buster
* 08:32 jmm@cumin1001: START - Cookbook sre.dns.netbox
* 08:31 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host idp-test2002.wikimedia.org
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27799 and previous config saved to /var/cache/conftool/dbconfig/20220512-081814-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27798 and previous config saved to /var/cache/conftool/dbconfig/20220512-075703-marostegui.json
* 07:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1001.eqiad.wmnet with OS buster
* 07:34 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 07:33 marostegui: dbmaint s7@codfw [[phab:T308206|T308206]]
* 07:32 marostegui: dbmaint s6@eqiad [[phab:T308206|T308206]]
* 07:32 marostegui: dbmaint s6@codfw [[phab:T308206|T308206]]
* 07:29 marostegui: dbmaint s3@codfw [[phab:T308206|T308206]]
* 07:29 marostegui: dbmaint s3@eqiad [[phab:T308206|T308206]]
* 07:18 marostegui: dbmaint s7@codfw [[phab:T308206|T308206]]
* 07:16 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1001.eqiad.wmnet with reason: host reimage
* 07:08 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791107{{!}}Enable Section Translation in cs, el, he, ko, sw and tr WPs (T304855 T304854 T298239 T304863 T304853 T304828)]] (duration: 00m 51s)
* 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1001.eqiad.wmnet with reason: host reimage
* 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:44 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores1001.eqiad.wmnet with OS buster
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27797 and previous config saved to /var/cache/conftool/dbconfig/20220512-063217-marostegui.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27796 and previous config saved to /var/cache/conftool/dbconfig/20220512-062241-marostegui.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1127 with low weight [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27795 and previous config saved to /var/cache/conftool/dbconfig/20220512-061305-marostegui.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27794 and previous config saved to /var/cache/conftool/dbconfig/20220512-055918-marostegui.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2122 [[phab:T307501|T307501]]', diff saved to https://phabricator.wikimedia.org/P27793 and previous config saved to /var/cache/conftool/dbconfig/20220512-054138-marostegui.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 [[phab:T307501|T307501]]', diff saved to https://phabricator.wikimedia.org/P27792 and previous config saved to /var/cache/conftool/dbconfig/20220512-053444-marostegui.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 [[phab:T308202|T308202]]', diff saved to https://phabricator.wikimedia.org/P27791 and previous config saved to /var/cache/conftool/dbconfig/20220512-051106-marostegui.json
* 04:07 kart_: Updated cxserver to 2022-05-11-135122-production ([[phab:T307967|T307967]], [[phab:T306999|T306999]], [[phab:T298239|T298239]], [[phab:T304853|T304853]], [[phab:T307507|T307507]], [[phab:T308039|T308039]])
* 04:05 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 04:04 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 04:01 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 04:01 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 03:57 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 03:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply


== 2021-03-24 ==
== 2022-05-11 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 22:28 robh: cp305[67] returned to service and all green in icinga, cp305[89] depooling for firmware update [[phab:T243167|T243167]]
* 23:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 22:00 robh: cp305[45] returned to service and all green in icinga, cp305[67] depooling for firmware update [[phab:T243167|T243167]]
* 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 21:34 robh: cp30[23] returned to service and all green in icinga, cp30[45] depooling for firmware update [[phab:T243167|T243167]]
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 21:34 robh: cp50[23] returned to service and all green in icinga, cp50[45] depooling for firmware update [[phab:T243167|T243167]]
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 21:33 robh: cp50[23] returned to service and all green in icinga, cp50[45] depooling for firmware update
* 23:48 mutante: generating new mcrouter certs for mw2377, mw2378
* 21:01 robh: cp305[23] going offline via [[phab:T243167|T243167]] for firmware updates (puppet agent disabled and depooled prior to reboot)
* 22:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:07 legoktm: disabled puppet on lists1002 while mailman3-web is broken
* 20:28 tgr: [[phab:T304542|T304542]] running mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php hiwiki --verbose
* 21:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:19 mutante: webperf2001 - restarted apache
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:11 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 07s)
* 20:27 cjming: end of UTC late backport & config window
* 21:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:25 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/skins/Vector/resources: Backport: [[gerrit:790443{{!}}Factor out a separate scroll observer for the TOC A/B test, which should be fired separately from the page title observer used by the sticky header and TOC (T307952 T307345)]] (duration: 00m 52s)
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:07 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GrowthExperiments: LinkRecommendation: Modify path args for calls to API - [[phab:T277865|T277865]] (duration: 01m 07s)
* 20:11 ejegg: updated payments-wiki from {{Gerrit|cc2612d6}} to {{Gerrit|8f46af9d}}
* 21:05 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Revert "Add default TemplateStyles for an Index" - [[phab:T278379|T278379]] (duration: 01m 07s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:07 ejegg: updated payments-wiki from {{Gerrit|f06e390b}} to {{Gerrit|cc2612d6}}
* 21:02 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GlobalUsage: Fix hook registration after class was namespaced - [[phab:T278375|T278375]] (duration: 01m 07s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 hashar@deploy1002: Synchronized wmf-config/env.php: multiversion: Move '@' operator in env.php closer to relevant statement (duration: 01m 07s)
* 20:05 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790395{{!}}Release DiscussionTools new topic tool to former a/b test wikis (T307410)]] (duration: 00m 54s)
* 20:56 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 19:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:19 rzl: Added new `scap` identity to keyholder on deploy[1002,2002] - [[phab:T307351|T307351]]
* 20:10 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 18:06 razzi: razzi@lvs1020:~$ systemctl stop pybal.service to apply change https://gerrit.wikimedia.org/r/c/operations/puppet/+/779915
* 20:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:53 robh: firmware upgrade for ganeti4001 complete [[phab:T307997|T307997]] (bios, nics, idrac) and manually confirmed first 10G port is link active (it is) and is set to pxe
* 20:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 15:50 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4001.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 15:49 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti4001.mgmt.ulsfo.wmnet with reboot policy FORCED
* 19:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 15:46 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@378e7ca]: (no justification provided) (duration: 00m 03s)
* 19:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:46 ebysans@deploy1002: Started deploy [airflow-dags/analytics@378e7ca]: (no justification provided)
* 19:57 ryankemper: [[phab:T267927|T267927]] Host key is missing for `wdqs2008` leading to `data-transfer` cookbook failing, looking into resolving
* 15:25 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@378e7ca]: (no justification provided) (duration: 00m 08s)
* 19:55 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 15:25 ebysans@deploy1002: Started deploy [airflow-dags/analytics@378e7ca]: (no justification provided)
* 19:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:15 robh: ganeti4001 updating all firmware revisions [[phab:T307997|T307997]]
* 19:50 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 15:15 robh: ganeti4001 updating all firmware revisions
* 19:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27789 and previous config saved to /var/cache/conftool/dbconfig/20220511-150038-marostegui.json
* 19:49 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 15:00 vgutierrez: pool ats-be on cp4032
* 19:49 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 14:58 moritzm: installing qemu security updates on bullseye
* 19:45 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 14:51 vgutierrez: depool ats-be on cp4032
* 19:45 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2008.codfw.wmnet with OS buster
* 19:42 ryankemper: [[phab:T267927|T267927]] Re-enabled puppet on `wdqs2008` and ran puppet agent
* 14:22 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 19:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 14:08 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 19:14 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 to 1.36.0-wmf.35
* 13:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2008.codfw.wmnet with reason: host reimage
* 19:07 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 21s)
* 13:55 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2008.codfw.wmnet with reason: host reimage
* 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 13:54 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 19:03 urbanecm@deploy1002: Synchronized wmf-config/config/shwiki.yaml: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 3/3) (duration: 01m 08s)
* 13:30 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ores2008.codfw.wmnet with OS buster
* 19:02 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 2/3) (duration: 01m 06s)
* 13:25 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2007.codfw.wmnet with OS buster
* 19:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 1/3) (duration: 01m 07s)
* 13:14 awight: EU backports complete
* 18:54 urbanecm@deploy1002: Synchronized wmf-config/config/eswiki.yaml: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 3/3) (duration: 01m 06s)
* 13:13 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 18:53 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 2/3) (duration: 01m 07s)
* 13:11 awight@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: [[gerrit:790436{{!}}Fix incomplete FlaggedRevs::binaryFlagging() implementation (T307972)]] (duration: 00m 51s)
* 18:52 urbanecm@deploy1002: sync-file aborted: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode (2/3) (duration: 00m 01s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 1/3) (duration: 01m 08s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:49 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:45 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:42 legoktm@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:54 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2007.codfw.wmnet with reason: host reimage
* 18:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 12:50 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2007.codfw.wmnet with reason: host reimage
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5aa050602954a3cab0c7e0c4b10efb0f957efb59}}: Promote several Growth target wikis out of dark mode ([[phab:T277491|T277491]]; [[phab:T276830|T276830]]; [[phab:T276123|T276123]]; [[phab:T276816|T276816]]; [[phab:T275550|T275550]]; [[phab:T276450|T276450]]) (duration: 01m 08s)
* 12:45 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|333393dfe59deb0ec4d7df6dd92372a705f65b85}}: Add autopatrol to autoreviewers in en.wikibooks ([[phab:T278300|T278300]]) (duration: 01m 09s)
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27786 and previous config saved to /var/cache/conftool/dbconfig/20220511-124226-marostegui.json
* 18:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:23 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ores2007.codfw.wmnet with OS buster
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 12:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2055.codfw.wmnet with OS bullseye
* 17:25 effie: upgrade memcached on mc-gp* hosts
* 12:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 12:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:42 moritzm: reduce RAM for irc2001 to 2G; it was originally created with 8G [[phab:T224579|T224579]]
* 12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:35 effie: enable puppet on all mediawiki + memcached hosts
* 11:56 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790997{{!}}Set dewiki to read new for templatelinks (T306673)]] (duration: 00m 49s)
* 15:20 moritzm: drain ganeti2022
* 11:39 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
* 11:29 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
* 11:26 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2006.codfw.wmnet with OS buster
* 14:35 moritzm: drain ganeti2021
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27782 and previous config saved to /var/cache/conftool/dbconfig/20220511-105416-marostegui.json
* 14:31 effie: disable puppet on all mediawiki servers + memcached for 674290
* 10:54 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2006.codfw.wmnet with reason: host reimage
* 14:05 moritzm: failover Ganeti master in codfw to ganeti2019
* 10:48 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2006.codfw.wmnet with reason: host reimage
* 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
* 10:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 13:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
* 10:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 13:29 moritzm: installing irc1001
* 10:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 13:15 moritzm: drain ganeti2020
* 10:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
* 10:31 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
* 10:31 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 12:28 effie: enabling puppet on mediawiki and memcached servers
* 10:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 12:10 jynus: restart dbprov200[12] [[phab:T271913|T271913]]
* 10:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15076 and previous config saved to /var/cache/conftool/dbconfig/20210324-115940-root.json
* 10:25 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 11:57 Andrew-WMDE_: EU deploys done
* 10:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2055.codfw.wmnet with reason: host reimage
* 11:53 jynus: restart dbprov100[12] [[phab:T271913|T271913]]
* 10:21 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2055.codfw.wmnet with reason: host reimage
* 11:51 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/MassMessage/: Backport: [[gerrit:674367{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 08s)
* 10:21 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ores2006.codfw.wmnet with OS buster
* 11:49 effie: disable puppet on all hosts running mediawiki+memcached to merge 674282
* 10:16 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 11:45 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/MassMessage/: Backport: [[gerrit:674366{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 07s)
* 10:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15075 and previous config saved to /var/cache/conftool/dbconfig/20210324-114436-root.json
* 10:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15074 and previous config saved to /var/cache/conftool/dbconfig/20210324-112932-root.json
* 10:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
* 11:22 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673326{{!}}Enable CodeMirror accessibility colors on initial wikis (T276346)]] (duration: 01m 08s)
* 10:08 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 11:15 jynus: restart serially db2097 db2098 db2099 db2100 [[phab:T271913|T271913]]
* 10:06 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
* 11:14 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673312{{!}}Enable bracket matching on group0 and wikitech (T273591)]] (duration: 01m 25s)
* 10:06 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15073 and previous config saved to /var/cache/conftool/dbconfig/20210324-111429-root.json
* 10:01 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1004.eqiad.wmnet
* 10:50 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1001.wikimedia.org
* 10:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 10:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:57 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 10:45 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:56 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1004.eqiad.wmnet
* 10:44 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:54 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 10:36 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host irc1001.wikimedia.org
* 09:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
* 10:31 jynus: restart db1171 [[phab:T271913|T271913]]
* 09:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 10:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:41 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2002.codfw.wmnet
* 10:14 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
* 10:14 jynus: restart db1145 [[phab:T271913|T271913]]
* 09:35 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:35 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2002.codfw.wmnet
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for registry2003.codfw.wmnet
* 10:03 jynus: restart db1139 [[phab:T271913|T271913]]
* 09:34 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for registry2003.codfw.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15072 and previous config saved to /var/cache/conftool/dbconfig/20210324-095655-marostegui.json
* 09:27 jayme: systemctl reset-failed ifup@ens5.service on registry2003 - [[phab:T273026|T273026]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15071 and previous config saved to /var/cache/conftool/dbconfig/20210324-095606-root.json
* 09:27 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 09:51 jynus: restart db1116 [[phab:T271913|T271913]]
* 09:24 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2055.codfw.wmnet with OS bullseye
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15070 and previous config saved to /var/cache/conftool/dbconfig/20210324-094102-root.json
* 09:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:18 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:15 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15069 and previous config saved to /var/cache/conftool/dbconfig/20210324-092558-root.json
* 09:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15068 and previous config saved to /var/cache/conftool/dbconfig/20210324-091055-root.json
* 09:07 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:29 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
* 09:06 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 08:16 gehel: restarting wdqs updater on all nodes for config change
* 09:06 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:05 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15066 and previous config saved to /var/cache/conftool/dbconfig/20210324-081057-root.json
* 09:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15065 and previous config saved to /var/cache/conftool/dbconfig/20210324-080725-root.json
* 08:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for schema change', diff saved to https://phabricator.wikimedia.org/P15064 and previous config saved to /var/cache/conftool/dbconfig/20210324-080223-marostegui.json
* 08:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ores2009.codfw.wmnet with OS buster
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-main
* 08:46 moritzm: logging an example as part of Simon's onboarding
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 08:40 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2009.codfw.wmnet with reason: host reimage
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15063 and previous config saved to /var/cache/conftool/dbconfig/20210324-075553-root.json
* 08:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2054.codfw.wmnet with OS bullseye
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15062 and previous config saved to /var/cache/conftool/dbconfig/20210324-075221-root.json
* 08:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2009.codfw.wmnet with reason: host reimage
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-main
* 08:12 marostegui: Rename revision_actor_temp on db1132 (s1) and db1114 (s8) [[phab:T307906|T307906]]
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-logging-external
* 08:04 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2054.codfw.wmnet with reason: host reimage
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=zotero
* 08:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4004.ulsfo.wmnet with OS bullseye
* 07:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
* 08:00 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2054.codfw.wmnet with reason: host reimage
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15061 and previous config saved to /var/cache/conftool/dbconfig/20210324-074050-root.json
* 07:51 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores2009.codfw.wmnet with OS buster
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15060 and previous config saved to /var/cache/conftool/dbconfig/20210324-073718-root.json
* 07:47 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: host reimage
* 07:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
* 07:46 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2054.codfw.wmnet with OS bullseye
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P15059 and previous config saved to /var/cache/conftool/dbconfig/20210324-072319-marostegui.json
* 07:44 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: host reimage
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15058 and previous config saved to /var/cache/conftool/dbconfig/20210324-072214-root.json
* 07:22 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4004.ulsfo.wmnet with OS bullseye
* 07:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd2002.codfw.wmnet
* 07:18 moritzm: drain ganeti4001 [[phab:T307997|T307997]]
* 07:10 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts ml-etcd2002.codfw.wmnet
* 07:05 moritzm: updating ganeti4* to Ganeti 3.0.1-1~bpo10+1 [[phab:T307997|T307997]]
* 07:09 moritzm: installing squid security updates
* 06:40 marostegui: db2146 set global innodb_max_dirty_pages_pct = 75; [[phab:T307082|T307082]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1181 to dbctl, depooled [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15057 and previous config saved to /var/cache/conftool/dbconfig/20210324-063459-marostegui.json
* 06:31 Amir1: mwscript maintenance/refreshImageMetadata.php --wiki=commonswiki --force --verbose --mediatype=AUDIO --mime audio/webm ([[phab:T226311|T226311]])
* 06:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1084.eqiad.wmnet
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27780 and previous config saved to /var/cache/conftool/dbconfig/20220511-053418-marostegui.json
* 06:14 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1084.eqiad.wmnet
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2146 [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P27779 and previous config saved to /var/cache/conftool/dbconfig/20220511-051703-marostegui.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P15056 and previous config saved to /var/cache/conftool/dbconfig/20210324-055246-marostegui.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2146 [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P27778 and previous config saved to /var/cache/conftool/dbconfig/20220511-051307-marostegui.json
* 04:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 01:41 mutante: gitlab2001 - starting backup-restore service that had failed on previous automatic run
* 03:41 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 01:33 ejegg: updated payments-wiki from {{Gerrit|c5be9c5d}} to {{Gerrit|f06e390b}}
* 03:41 ryankemper: [[phab:T274204|T274204]] Restarting the `codfw` reboots; the timestamp argument should prevent it from wasting time on nodes that have been rebooted already
* 03:40 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 03:39 ryankemper: [[phab:T274204|T274204]] Timed out waiting for write queues to empty: `[59/60, retrying in 60.00s] Attempt to run 'spicerack.elasticsearch_cluster.ElasticsearchClusters.wait_for_all_write_queues_empty' raised: Write queue not empty (had value of 241631) for partition 0 of topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.`
* 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 02:38 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 02:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 01:59 ryankemper: [[phab:T274204|T274204]] For now I'll proceed to the reboots of `codfw`
* 01:59 ryankemper: [[phab:T274204|T274204]] `ctrl+c`'d out of run; relforge is relying on outdated config that is trying to talk to `relforge1002` which no longer exists. Need to refactor so that config no longer lives in spicerack
* 01:58 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade-reboot (exit_code=97)
* 01:49 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade-reboot relforge "relforge cluster restarts" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T01:45:59+00:00` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 01:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade-reboot
* 01:36 eileen: civicrm revision changed from {{Gerrit|f36a0b08f0}} to {{Gerrit|ad430721f6}}, config revision is {{Gerrit|26b02db7ba}}
* 00:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
* 00:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE


== 2021-03-23 ==
== 2022-05-10 ==
* 22:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 20:13 mforns@deploy1002: Finished deploy [analytics/refinery@d2dfced] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2dfced] (duration: 06m 59s)
* 22:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1003.wikimedia.org
* 22:33 dwisehaupt: pushing {{Gerrit|60f9baaf50b}} to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - [[phab:T170321|T170321]]
* 20:06 mforns@deploy1002: Started deploy [analytics/refinery@d2dfced] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2dfced]
* 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s)
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@d2dfced] (thin): Regular analytics weekly train THIN [analytics/refinery@d2dfced] (duration: 00m 07s)
* 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@d2dfced] (thin): Regular analytics weekly train THIN [analytics/refinery@d2dfced]
* 22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org
* 22:05 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 19:55 andrew@cumin1001: START -
* 21:27 ppchelko@deploy1002: Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s)
* 21:09 ppchelko@deploy1002: Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint
* 21:04 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 robh@cumin1001: START - Cookbook sre.dns.netbox


== 2021-03-22 ==
== 2022-05-09 ==
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 21:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: new kernel round deux
* 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 21:58 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: new kernel round deux
* 23:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2250.codfw.wmnet
* 21:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: new kernel, round deux
* 23:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:56 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: new kernel, round deux
* 23:18 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: [[phab:T262612|T262612]]: Start glent m1 ab test (duration: 01m 53s)
* 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2250.codfw.wmnet
* 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2249.codfw.wmnet
* 21:19 cjming: end of UTC late backport & config window
* 22:52 mutante: decom mw2249
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2249.codfw.wmnet
* 21:18 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/GrowthExperiments: Backport: [[gerrit:790406{{!}}Newcomer tasks: deploy AND topic selection to pilot wikis (T305399)]] (duration: 00m 54s)
* 21:08 sbassett: Deployed security patch for [[phab:T272244|T272244]]
* 21:14 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/CampaignConfig.php: Backport: [[gerrit:790336{{!}}CampaignConfig: Avoid array_push() error]] (duration: 00m 51s)
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet,service=canary
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet,service=canary
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2279.codfw.wmnet,service=canary
* 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2278.codfw.wmnet,service=canary
* 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:50 mutante: gerrit2001 - restarted apache2 as well for consistency
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:47 mutante: gerrit - restarting apache2 after we dropped MaxClients config line. This should make us fall back to Debian default MaxRequestWorkers. (since we use event MPM we should not be using MaxClients in the first place, says #httpd) ([[phab:T277127|T277127]])
* 21:02 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/skins/Vector/resources: Backport: [[gerrit:790426{{!}}Adjust table of contents margins at 1000-1200 breakpoint (T307004)]] (duration: 00m 53s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|25247c9cbba3d3741908164f2d15fb8497ce8b5e}}: hrwiki: Configure mentorship for Growth team features ([[phab:T275684|T275684]]) (duration: 01m 00s)
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|951601f7a4c887f21e209b32dbd1cfd3da084816}}: Grant enwiki pagemovers the delete-redirect right ([[phab:T278131|T278131]]) (duration: 00m 59s)
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:30 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 20:36 cjming@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:790408{{!}}cirrus: Enable DeprecationLoggedHttps (T218994)]] (duration: 00m 51s)
* 16:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:32 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/Kartographer/modules/box: Backport: [[gerrit:790329{{!}}Refresh MediaWiki globals when loading mapdata (T307650)]] (duration: 00m 52s)
* 16:47 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:46 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14990 and previous config saved to /var/cache/conftool/dbconfig/20210322-155808-root.json
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14989 and previous config saved to /var/cache/conftool/dbconfig/20210322-154304-root.json
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:25 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum6002.drmrs.wmnet
* 15:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 19:17 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum6002.drmrs.wmnet
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14988 and previous config saved to /var/cache/conftool/dbconfig/20210322-152800-root.json
* 19:17 sukhe: depool durum6002.drmrs.wmnet (as part of [[phab:T307427|T307427]])
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14987 and previous config saved to /var/cache/conftool/dbconfig/20210322-151257-root.json
* 19:11 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum6001.drmrs.wmnet
* 14:26 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 19:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum6001.drmrs.wmnet
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 19:04 sukhe: depool durum6001.drmrs.wmnet (as part of [[phab:T307427|T307427]])
* 14:23 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 18:13 mutante: rebooting mwmaint2002 (not active maint server)
* 14:22 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reboot
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 18:13 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reboot
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 18:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: reboot
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P14986 and previous config saved to /var/cache/conftool/dbconfig/20210322-141146-marostegui.json
* 18:06 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: reboot
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14985 and previous config saved to /var/cache/conftool/dbconfig/20210322-140800-root.json
* 18:05 mutante: etherpad - maintenance reboot - expect a short downtime
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad - [[phab:T277771|T277771]]
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14984 and previous config saved to /var/cache/conftool/dbconfig/20210322-135256-root.json
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14983 and previous config saved to /var/cache/conftool/dbconfig/20210322-133753-root.json
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 17:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 17:22 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:790345{{!}}Bumping portals to master (T304629)]] (duration: 00m 50s)
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14982 and previous config saved to /var/cache/conftool/dbconfig/20210322-132249-root.json
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 17:22 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:790345{{!}}Bumping portals to master (T304629)]] (duration: 00m 52s)
* 13:16 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 17:10 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 12:20 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 16:49 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 16:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P14981 and previous config saved to /var/cache/conftool/dbconfig/20210322-121924-marostegui.json
* 16:14 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1003.eqiad.wmnet
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14980 and previous config saved to /var/cache/conftool/dbconfig/20210322-112954-root.json
* 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh6002.wikimedia.org
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14979 and previous config saved to /var/cache/conftool/dbconfig/20210322-112707-root.json
* 16:10 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1003.eqiad.wmnet
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:07 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1049 to resolve CirrusSearchJVMGCOldPoolFlatlined
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:03 sukhe: depool doh6002 (as part of [[phab:T307427|T307427]])
* 11:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:02 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh6002.wikimedia.org
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14978 and previous config saved to /var/cache/conftool/dbconfig/20210322-111451-root.json
* 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14977 and previous config saved to /var/cache/conftool/dbconfig/20210322-111203-root.json
* 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh6001.wikimedia.org
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14976 and previous config saved to /var/cache/conftool/dbconfig/20210322-105947-root.json
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14975 and previous config saved to /var/cache/conftool/dbconfig/20210322-105700-root.json
* 15:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:51 moritzm: installing libdbi-perl security updates
* 15:35 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh6001.wikimedia.org
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh6001.wikimedia.org
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:35 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh6001.wikimedia.org
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:34 sukhe: depool doh6001 (as part of [[phab:T307427|T307427]])
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:05 taavi: UTC afternoon backport window done
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:04 taavi@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/ContentTranslation/app: Backport: [[gerrit:790328{{!}}CX3 Build 0.2.0+20220509 (T306643)]] (duration: 00m 51s)
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14974 and previous config saved to /var/cache/conftool/dbconfig/20210322-104443-root.json
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14973 and previous config saved to /var/cache/conftool/dbconfig/20210322-104156-root.json
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:42 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: old kernel :(
* 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:56 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: old kernel :(
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: old kernel :(
* 10:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:52 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: old kernel :(
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2001.codfw.wmnet
* 10:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:48 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:780874{{!}}Newcomer tasks: deploy AND topic selection to pilot wikis (T305399)]] (duration: 00m 49s)
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:41 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host centrallog2002.codfw.wmnet
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:41 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:12 elukey: