You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Labslogbot
(jynus Synchronized wmf-config/db-eqiad.php: Repool es1001, depool es1002 (duration: 00m 14s) (logmsgbot))

== June 22 ==
* 16:26 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1001, depool es1002 (duration: 00m 14s)
* 16:12 andrewbogott: shutting down virt1000
* 16:08 andrewbogott: disabling puppet on virt1000
* 16:07 ottomata: deploying eventlogging 0.9. This includes changes for arbitrary eventlogging URIs in all eventlogging stages, as well as support for schema based kafka topic URIs.
* 15:24 logmsgbot: thcipriani Synchronized php-1.26wmf10/extensions/WikiEditor: SWAT: Reduce 'Edit' EventLogging schema sampling rate to 6.25% (1/16th) [[gerrit:219837]] (duration: 00m 13s)
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Default wmgUseWikibaseQuality on beta to true. [[gerrit:219630]] (duration: 00m 14s)
* 14:32 hashar: restarting Jenkins
* 13:26 jynus: rebooting es1001 for regular maintenance
* 12:08 paravoid: powercycled ms-be1002, stuck at console
* 11:12 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1001 (duration: 00m 13s)
* 11:06 _joe_: restarting hhvm on the low-memory appservers (main and api)
* 09:23 hashar: upgrading Jenkins gearman plugin from 0.1.1 to latest master (f2024bd). Restarting Jenkins.
* 05:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 22 05:11:22 UTC 2015 (duration 11m 21s)
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-22 02:31:32+00:00
* 02:27 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 27s)
* 00:44 jgage: restarted gitblit on antimony again

== June 21 ==
* 11:28 jynus: restarting apache on mw1110
* 06:55 gwicke: restarted bootstrap on restbase1009 earlier today; hardware hasn't died yet
* 05:01 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 21 05:01:07 UTC 2015 (duration 1m 6s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-21 02:27:13+00:00
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 10m 23s)
* 01:39 jgage: restarted gitblit on antimony at 00:43 UTC
* 01:37 Krenair: testing morebots

imported>Stashbot
(jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet)

== 2022-06-25 ==
* 18:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 18:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 18:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 18:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 13:16 elukey: restart rsyslog on ml-staging-ctrl200[1,2] - broken connections to centrallog
* 10:53 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 08m 27s)
* 10:45 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 10:44 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 01m 33s)
* 10:42 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 10:28 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 18s)
* 10:28 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 10:25 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 04s)
* 10:25 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 10:23 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 49m 58s)
* 09:54 elukey: restart rsyslog on ml-serve-ctrl200[1,2] - broken connections to centrallog
* 09:33 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 09:32 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 04s)
* 09:32 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)

== 2022-06-24 ==
* 19:35 dancy@deploy1002: backport aborted:  (duration: 00m 12s)
* 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:31 sukhe: finished running homer * commit "adding sukhe" CR: {{Gerrit|8071451}}
* 15:18 dancy@deploy1002: Finished deploy [integration/docroot@ea9b8fa]: (no justification provided) (duration: 00m 08s)
* 15:17 dancy@deploy1002: Started deploy [integration/docroot@ea9b8fa]: (no justification provided)
* 15:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 04s)
* 14:53 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:53 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 02m 37s)
* 14:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:39 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:39 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30242 and previous config saved to /var/cache/conftool/dbconfig/20220624-143544-root.json
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30241 and previous config saved to /var/cache/conftool/dbconfig/20220624-143537-root.json
* 14:31 sukhe: running homer * commit "adding sukhe" CR: 807145
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30240 and previous config saved to /var/cache/conftool/dbconfig/20220624-142303-root.json
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30239 and previous config saved to /var/cache/conftool/dbconfig/20220624-142040-root.json
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30238 and previous config saved to /var/cache/conftool/dbconfig/20220624-142033-root.json
* 14:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:11 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:10 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:09 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:08 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30237 and previous config saved to /var/cache/conftool/dbconfig/20220624-140759-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30236 and previous config saved to /var/cache/conftool/dbconfig/20220624-140536-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30235 and previous config saved to /var/cache/conftool/dbconfig/20220624-140529-root.json
* 14:03 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:03 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30234 and previous config saved to /var/cache/conftool/dbconfig/20220624-135940-root.json
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30233 and previous config saved to /var/cache/conftool/dbconfig/20220624-135255-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30232 and previous config saved to /var/cache/conftool/dbconfig/20220624-135032-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30231 and previous config saved to /var/cache/conftool/dbconfig/20220624-135025-root.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30230 and previous config saved to /var/cache/conftool/dbconfig/20220624-134436-root.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30229 and previous config saved to /var/cache/conftool/dbconfig/20220624-134423-root.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30228 and previous config saved to /var/cache/conftool/dbconfig/20220624-133751-root.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30227 and previous config saved to /var/cache/conftool/dbconfig/20220624-133528-root.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30226 and previous config saved to /var/cache/conftool/dbconfig/20220624-133521-root.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30225 and previous config saved to /var/cache/conftool/dbconfig/20220624-132932-root.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30224 and previous config saved to /var/cache/conftool/dbconfig/20220624-132919-root.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30223 and previous config saved to /var/cache/conftool/dbconfig/20220624-132247-root.json
* 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS buster
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30222 and previous config saved to /var/cache/conftool/dbconfig/20220624-132024-root.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30221 and previous config saved to /var/cache/conftool/dbconfig/20220624-132017-root.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30220 and previous config saved to /var/cache/conftool/dbconfig/20220624-131428-root.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30219 and previous config saved to /var/cache/conftool/dbconfig/20220624-131415-root.json
* 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:11 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30218 and previous config saved to /var/cache/conftool/dbconfig/20220624-130937-root.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30217 and previous config saved to /var/cache/conftool/dbconfig/20220624-130743-root.json
* 13:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
* 13:05 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:05 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30216 and previous config saved to /var/cache/conftool/dbconfig/20220624-130519-root.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30215 and previous config saved to /var/cache/conftool/dbconfig/20220624-130514-root.json
* 13:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30214 and previous config saved to /var/cache/conftool/dbconfig/20220624-130055-root.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30213 and previous config saved to /var/cache/conftool/dbconfig/20220624-125924-root.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30212 and previous config saved to /var/cache/conftool/dbconfig/20220624-125911-root.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30211 and previous config saved to /var/cache/conftool/dbconfig/20220624-125834-root.json
* 12:58 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 12:58 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30210 and previous config saved to /var/cache/conftool/dbconfig/20220624-125433-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30209 and previous config saved to /var/cache/conftool/dbconfig/20220624-125401-root.json
* 12:54 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 12:53 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:53 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:46 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:46 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30208 and previous config saved to /var/cache/conftool/dbconfig/20220624-124420-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30207 and previous config saved to /var/cache/conftool/dbconfig/20220624-124407-root.json
* 12:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 03s)
* 12:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30206 and previous config saved to /var/cache/conftool/dbconfig/20220624-123929-root.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30205 and previous config saved to /var/cache/conftool/dbconfig/20220624-123857-root.json
* 12:34 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 03s)
* 12:34 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30204 and previous config saved to /var/cache/conftool/dbconfig/20220624-122916-root.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30203 and previous config saved to /var/cache/conftool/dbconfig/20220624-122903-root.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30202 and previous config saved to /var/cache/conftool/dbconfig/20220624-122728-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30201 and previous config saved to /var/cache/conftool/dbconfig/20220624-122425-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30200 and previous config saved to /var/cache/conftool/dbconfig/20220624-122353-root.json
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30199 and previous config saved to /var/cache/conftool/dbconfig/20220624-122256-root.json
* 12:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 28s)
* 12:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30198 and previous config saved to /var/cache/conftool/dbconfig/20220624-121359-root.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30197 and previous config saved to /var/cache/conftool/dbconfig/20220624-121224-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30196 and previous config saved to /var/cache/conftool/dbconfig/20220624-120922-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30195 and previous config saved to /var/cache/conftool/dbconfig/20220624-120849-root.json
* 12:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@18182aa]: (no justification provided) (duration: 03m 47s)
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30194 and previous config saved to /var/cache/conftool/dbconfig/20220624-120632-root.json
* 12:04 bmansurov@deploy1002: Started deploy [airflow-dags/research@18182aa]: (no justification provided)
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30193 and previous config saved to /var/cache/conftool/dbconfig/20220624-120411-root.json
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30192 and previous config saved to /var/cache/conftool/dbconfig/20220624-115720-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30191 and previous config saved to /var/cache/conftool/dbconfig/20220624-115418-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30190 and previous config saved to /var/cache/conftool/dbconfig/20220624-115345-root.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30189 and previous config saved to /var/cache/conftool/dbconfig/20220624-114907-root.json
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30188 and previous config saved to /var/cache/conftool/dbconfig/20220624-114816-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30187 and previous config saved to /var/cache/conftool/dbconfig/20220624-114217-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30186 and previous config saved to /var/cache/conftool/dbconfig/20220624-113914-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30185 and previous config saved to /var/cache/conftool/dbconfig/20220624-113841-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30184 and previous config saved to /var/cache/conftool/dbconfig/20220624-113403-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30183 and previous config saved to /var/cache/conftool/dbconfig/20220624-113312-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30182 and previous config saved to /var/cache/conftool/dbconfig/20220624-113020-root.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30181 and previous config saved to /var/cache/conftool/dbconfig/20220624-112713-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30180 and previous config saved to /var/cache/conftool/dbconfig/20220624-111859-root.json
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30179 and previous config saved to /var/cache/conftool/dbconfig/20220624-111808-root.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30178 and previous config saved to /var/cache/conftool/dbconfig/20220624-111209-root.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30177 and previous config saved to /var/cache/conftool/dbconfig/20220624-110356-root.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30176 and previous config saved to /var/cache/conftool/dbconfig/20220624-110305-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30175 and previous config saved to /var/cache/conftool/dbconfig/20220624-105705-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30174 and previous config saved to /var/cache/conftool/dbconfig/20220624-104852-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30173 and previous config saved to /var/cache/conftool/dbconfig/20220624-104849-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30172 and previous config saved to /var/cache/conftool/dbconfig/20220624-104801-root.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30171 and previous config saved to /var/cache/conftool/dbconfig/20220624-104407-root.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30170 and previous config saved to /var/cache/conftool/dbconfig/20220624-104403-root.json
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30169 and previous config saved to /var/cache/conftool/dbconfig/20220624-103342-root.json
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30168 and previous config saved to /var/cache/conftool/dbconfig/20220624-103257-root.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30166 and previous config saved to /var/cache/conftool/dbconfig/20220624-102904-root.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30165 and previous config saved to /var/cache/conftool/dbconfig/20220624-102859-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30164 and previous config saved to /var/cache/conftool/dbconfig/20220624-102856-root.json
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30163 and previous config saved to /var/cache/conftool/dbconfig/20220624-101753-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30162 and previous config saved to /var/cache/conftool/dbconfig/20220624-101400-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30161 and previous config saved to /var/cache/conftool/dbconfig/20220624-101349-root.json
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30160 and previous config saved to /var/cache/conftool/dbconfig/20220624-100752-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30159 and previous config saved to /var/cache/conftool/dbconfig/20220624-095946-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30158 and previous config saved to /var/cache/conftool/dbconfig/20220624-095935-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30157 and previous config saved to /var/cache/conftool/dbconfig/20220624-095856-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30156 and previous config saved to /var/cache/conftool/dbconfig/20220624-095845-root.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30155 and previous config saved to /var/cache/conftool/dbconfig/20220624-094442-root.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30154 and previous config saved to /var/cache/conftool/dbconfig/20220624-094431-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30153 and previous config saved to /var/cache/conftool/dbconfig/20220624-094352-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30152 and previous config saved to /var/cache/conftool/dbconfig/20220624-094342-root.json
* 09:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:35 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 09:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30151 and previous config saved to /var/cache/conftool/dbconfig/20220624-092938-root.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30150 and previous config saved to /var/cache/conftool/dbconfig/20220624-092927-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30149 and previous config saved to /var/cache/conftool/dbconfig/20220624-092848-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30148 and previous config saved to /var/cache/conftool/dbconfig/20220624-092838-root.json
* 09:25 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:24 moritzm: installing publicsuffix updates from last buster point release
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30147 and previous config saved to /var/cache/conftool/dbconfig/20220624-091434-root.json
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30146 and previous config saved to /var/cache/conftool/dbconfig/20220624-091423-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30145 and previous config saved to /var/cache/conftool/dbconfig/20220624-091344-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30144 and previous config saved to /var/cache/conftool/dbconfig/20220624-091334-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30143 and previous config saved to /var/cache/conftool/dbconfig/20220624-091227-root.json
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137,db1138 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30142 and previous config saved to /var/cache/conftool/dbconfig/20220624-090810-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30141 and previous config saved to /var/cache/conftool/dbconfig/20220624-085930-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30140 and previous config saved to /var/cache/conftool/dbconfig/20220624-085919-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30139 and previous config saved to /var/cache/conftool/dbconfig/20220624-085904-root.json
* 08:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts webperf2002.codfw.wmnet
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30137 and previous config saved to /var/cache/conftool/dbconfig/20220624-085723-root.json
* 08:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts webperf2002.codfw.wmnet
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30136 and previous config saved to /var/cache/conftool/dbconfig/20220624-085217-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30135 and previous config saved to /var/cache/conftool/dbconfig/20220624-085210-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30134 and previous config saved to /var/cache/conftool/dbconfig/20220624-085129-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30133 and previous config saved to /var/cache/conftool/dbconfig/20220624-085003-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30132 and previous config saved to /var/cache/conftool/dbconfig/20220624-084426-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30131 and previous config saved to /var/cache/conftool/dbconfig/20220624-084415-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30130 and previous config saved to /var/cache/conftool/dbconfig/20220624-084401-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30129 and previous config saved to /var/cache/conftool/dbconfig/20220624-084219-root.json
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30128 and previous config saved to /var/cache/conftool/dbconfig/20220624-083806-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30127 and previous config saved to /var/cache/conftool/dbconfig/20220624-083713-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30126 and previous config saved to /var/cache/conftool/dbconfig/20220624-083706-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30125 and previous config saved to /var/cache/conftool/dbconfig/20220624-083625-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30124 and previous config saved to /var/cache/conftool/dbconfig/20220624-083459-root.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30123 and previous config saved to /var/cache/conftool/dbconfig/20220624-082857-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30115 and previous config saved to /var/cache/conftool/dbconfig/20220624-080705-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30114 and previous config saved to /var/cache/conftool/dbconfig/20220624-080658-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30113 and previous config saved to /var/cache/conftool/dbconfig/20220624-080618-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30112 and previous config saved to /var/cache/conftool/dbconfig/20220624-080451-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30111 and previous config saved to /var/cache/conftool/dbconfig/20220624-075849-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30110 and previous config saved to /var/cache/conftool/dbconfig/20220624-075707-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30109 and previous config saved to /var/cache/conftool/dbconfig/20220624-075201-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30108 and previous config saved to /var/cache/conftool/dbconfig/20220624-075154-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30107 and previous config saved to /var/cache/conftool/dbconfig/20220624-075114-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30106 and previous config saved to /var/cache/conftool/dbconfig/20220624-075102-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30105 and previous config saved to /var/cache/conftool/dbconfig/20220624-074947-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30104 and previous config saved to /var/cache/conftool/dbconfig/20220624-074345-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30103 and previous config saved to /var/cache/conftool/dbconfig/20220624-074204-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30102 and previous config saved to /var/cache/conftool/dbconfig/20220624-073657-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30101 and previous config saved to /var/cache/conftool/dbconfig/20220624-073651-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30100 and previous config saved to /var/cache/conftool/dbconfig/20220624-073610-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30099 and previous config saved to /var/cache/conftool/dbconfig/20220624-073558-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30098 and previous config saved to /var/cache/conftool/dbconfig/20220624-073543-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30097 and previous config saved to /var/cache/conftool/dbconfig/20220624-073444-root.json
* 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-cache[2001-2003].codfw.wmnet with reason: reboots
* 07:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-cache[2001-2003].codfw.wmnet with reason: reboots
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30096 and previous config saved to /var/cache/conftool/dbconfig/20220624-072841-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30095 and previous config saved to /var/cache/conftool/dbconfig/20220624-072240-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30094 and previous config saved to /var/cache/conftool/dbconfig/20220624-072153-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30093 and previous config saved to /var/cache/conftool/dbconfig/20220624-072147-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30092 and previous config saved to /var/cache/conftool/dbconfig/20220624-072106-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30091 and previous config saved to /var/cache/conftool/dbconfig/20220624-072054-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30090 and previous config saved to /var/cache/conftool/dbconfig/20220624-071940-root.json
* 07:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30089 and previous config saved to /var/cache/conftool/dbconfig/20220624-071551-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30088 and previous config saved to /var/cache/conftool/dbconfig/20220624-071439-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 es1025 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30087 and previous config saved to /var/cache/conftool/dbconfig/20220624-070700-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30086 and previous config saved to /var/cache/conftool/dbconfig/20220624-070601-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30085 and previous config saved to /var/cache/conftool/dbconfig/20220624-070555-root.json
* 07:02 marostegui: Reboot db1117 for kernel upgrade (expect haproxy irc alerts)
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30084 and previous config saved to /var/cache/conftool/dbconfig/20220624-070201-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30083 and previous config saved to /var/cache/conftool/dbconfig/20220624-070157-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30082 and previous config saved to /var/cache/conftool/dbconfig/20220624-070151-root.json
* 06:53 jynus: restarting bacula director @ backup1001
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30081 and previous config saved to /var/cache/conftool/dbconfig/20220624-065057-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30080 and previous config saved to /var/cache/conftool/dbconfig/20220624-065051-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30079 and previous config saved to /var/cache/conftool/dbconfig/20220624-064657-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30078 and previous config saved to /var/cache/conftool/dbconfig/20220624-064653-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30077 and previous config saved to /var/cache/conftool/dbconfig/20220624-064647-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30076 and previous config saved to /var/cache/conftool/dbconfig/20220624-063553-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30075 and previous config saved to /var/cache/conftool/dbconfig/20220624-063547-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30074 and previous config saved to /var/cache/conftool/dbconfig/20220624-063154-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30073 and previous config saved to /var/cache/conftool/dbconfig/20220624-063149-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30072 and previous config saved to /var/cache/conftool/dbconfig/20220624-063143-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30071 and previous config saved to /var/cache/conftool/dbconfig/20220624-062049-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30070 and previous config saved to /var/cache/conftool/dbconfig/20220624-062043-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30069 and previous config saved to /var/cache/conftool/dbconfig/20220624-061650-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30068 and previous config saved to /var/cache/conftool/dbconfig/20220624-061645-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30067 and previous config saved to /var/cache/conftool/dbconfig/20220624-061640-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30066 and previous config saved to /var/cache/conftool/dbconfig/20220624-060545-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30065 and previous config saved to /var/cache/conftool/dbconfig/20220624-060539-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30064 and previous config saved to /var/cache/conftool/dbconfig/20220624-060146-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30063 and previous config saved to /var/cache/conftool/dbconfig/20220624-060141-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30062 and previous config saved to /var/cache/conftool/dbconfig/20220624-060136-root.json
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30061 and previous config saved to /var/cache/conftool/dbconfig/20220624-055643-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30060 and previous config saved to /var/cache/conftool/dbconfig/20220624-055042-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30059 and previous config saved to /var/cache/conftool/dbconfig/20220624-055035-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30058 and previous config saved to /var/cache/conftool/dbconfig/20220624-054642-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30057 and previous config saved to /var/cache/conftool/dbconfig/20220624-054637-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30056 and previous config saved to /var/cache/conftool/dbconfig/20220624-054632-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170 after kernel reboots', diff saved to https://phabricator.wikimedia.org/P30055 and previous config saved to /var/cache/conftool/dbconfig/20220624-054259-root.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30054 and previous config saved to /var/cache/conftool/dbconfig/20220624-054139-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30053 and previous config saved to /var/cache/conftool/dbconfig/20220624-053652-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30052 and previous config saved to /var/cache/conftool/dbconfig/20220624-053538-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30051 and previous config saved to /var/cache/conftool/dbconfig/20220624-053531-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30050 and previous config saved to /var/cache/conftool/dbconfig/20220624-053138-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30049 and previous config saved to /var/cache/conftool/dbconfig/20220624-053134-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30048 and previous config saved to /var/cache/conftool/dbconfig/20220624-053128-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 db1169 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30047 and previous config saved to /var/cache/conftool/dbconfig/20220624-052758-root.json
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 db1174 db1175 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30046 and previous config saved to /var/cache/conftool/dbconfig/20220624-052137-root.json
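The "(re)pooling @ N%" entries above follow a staged pattern. Each host returns to service at 2%, 5%, 10%, 25%, 50%, 75% and then 100%, with about 15 minutes between steps. For example, db1168 goes from 05:35 to 07:06. The sketch below reproduces that schedule. It is a minimal sketch under two assumptions: the step list and the 15-minute interval were read off this log's timestamps, not taken from dbctl or the cookbook that drives it. `repool_schedule` is a hypothetical helper and is not part of any Wikimedia tool.

```python
from datetime import datetime, timedelta

# Percentages seen in the "(re)pooling @ N%" entries above. This list is
# inferred from this log, not taken from the dbctl or cookbook source.
REPOOL_STEPS = [2, 5, 10, 25, 50, 75, 100]

# Roughly 15 minutes between consecutive steps, judging by the timestamps
# (e.g. db1168: 05:35, 05:50, 06:05, ...). This interval is an assumption.
STEP_INTERVAL = timedelta(minutes=15)


def repool_schedule(start, steps=REPOOL_STEPS, interval=STEP_INTERVAL):
    """Return (time, percent) pairs for a gradual repool beginning at `start`.

    Hypothetical helper for illustration only.
    """
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


if __name__ == "__main__":
    # Example: a host whose first 2% step lands at 05:35 UTC, like db1168.
    for when, pct in repool_schedule(datetime(2022, 6, 24, 5, 35)):
        print(f"{when:%H:%M} {pct}%")
```

With a 05:35 start, the computed steps fall within a minute of db1168's logged commits. The small drift in the real entries comes from the time each commit takes.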


== June 20 ==
* 22:50 bblack: restarted gitblit java service on antimony
* 04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 20 04:27:14 UTC 2015 (duration 27m 13s)
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-20 02:21:30+00:00
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 02s)

== 2022-06-23 ==
* 21:23 mutante: restbase-dev1006 has manually installed packages (wrk, maybe others)
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:22 brennen: end of utc late backport & config window
* 21:21 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808055{{!}}[cleanup] Drop non-existent feature flags]] (duration: 03m 33s)
* 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:13 thcipriani@deploy1002: Finished scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]] (duration: 17m 29s)
* 21:01 inflatador: looking into wdqs1006 alert ^^
* 20:56 thcipriani@deploy1002: Started scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]]
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:49 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808064{{!}}Enable DiscussionTools topicsubscription, autotopicsub on testwiki (T310808)]] (duration: 03m 18s)
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dse-k8s-ctrl1001.eqiad.wmnet
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:48 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806847{{!}}ukwikibooks: Add NS102 (Рецепт) to wgContentNamespaces (T310940)]] (duration: 03m 41s)
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1001.eqiad.wmnet
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 mutante: cumin -b 15 -p 95 'mw1*' 'run-puppet-agent -q --failed-only'
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 mutante: cumin -b 15 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
* 20:09 mutante: cumin -b 15 -p 95 'parse*' 'run-puppet-agent -q --failed-only'
* 20:07 mutante: cumin -b 15 -p 95 'wtp*' 'run-puppet-agent -q --failed-only'
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:39 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:34 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:24 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:21 ejegg: fundraising python tools updated from {{Gerrit|40d376d4}} to {{Gerrit|acf89fb2}}
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:38 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]]
* 18:01 brennen: train 1.39.0-wmf.17 ([[phab:T308070|T308070]]): no current blockers - rolling to all wikis
* 18:01 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:53 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:32 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:00 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:17 hashar: Upgrading CI Jenkins # [[phab:T311174|T311174]]
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:807902{{!}}Do not re-use "wikibase_config" for registering the language selector... (T307869)]] (duration: 03m 22s)
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30042 and previous config saved to /var/cache/conftool/dbconfig/20220623-150954-root.json
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30041 and previous config saved to /var/cache/conftool/dbconfig/20220623-150951-root.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30040 and previous config saved to /var/cache/conftool/dbconfig/20220623-150422-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30039 and previous config saved to /var/cache/conftool/dbconfig/20220623-145450-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30038 and previous config saved to /var/cache/conftool/dbconfig/20220623-145448-root.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30037 and previous config saved to /var/cache/conftool/dbconfig/20220623-144918-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30036 and previous config saved to /var/cache/conftool/dbconfig/20220623-143946-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30035 and previous config saved to /var/cache/conftool/dbconfig/20220623-143944-root.json
* 14:34 papaul: ongoing PDU maintenance in rack A3 codfw
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30034 and previous config saved to /var/cache/conftool/dbconfig/20220623-143414-root.json
* 14:31 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:30 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30033 and previous config saved to /var/cache/conftool/dbconfig/20220623-142443-root.json
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30032 and previous config saved to /var/cache/conftool/dbconfig/20220623-142440-root.json
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30031 and previous config saved to /var/cache/conftool/dbconfig/20220623-141910-root.json
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 taavi@deploy1002: Synchronized php-1.39.0-wmf.17/includes/skins/Skin.php: Backport: [[gerrit:807900{{!}}Skin: Change viewport based on feedback (T311119)]] (duration: 03m 29s)
* 14:10 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30030 and previous config saved to /var/cache/conftool/dbconfig/20220623-140939-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30029 and previous config saved to /var/cache/conftool/dbconfig/20220623-140936-root.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30028 and previous config saved to /var/cache/conftool/dbconfig/20220623-140406-root.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:02 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:00 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:00 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 13:58 moritzm: import jenkins 2.346.1 to thirdparty/ci [[phab:T311174|T311174]]
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30027 and previous config saved to /var/cache/conftool/dbconfig/20220623-135435-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30026 and previous config saved to /var/cache/conftool/dbconfig/20220623-135432-root.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30025 and previous config saved to /var/cache/conftool/dbconfig/20220623-134902-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30024 and previous config saved to /var/cache/conftool/dbconfig/20220623-133931-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30023 and previous config saved to /var/cache/conftool/dbconfig/20220623-133928-root.json
* 13:38 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (2/2) (duration: 03m 26s)
* 13:34 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (1/2) (duration: 03m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30022 and previous config saved to /var/cache/conftool/dbconfig/20220623-133358-root.json
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182 db1184 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30021 and previous config saved to /var/cache/conftool/dbconfig/20220623-132951-root.json
* 13:27 sukhe: disable puppet on A:durum or A:wikidough or A:centrallog or A:dns-rec: deploying [[phab:T310574|T310574]]
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30020 and previous config saved to /var/cache/conftool/dbconfig/20220623-132729-root.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30019 and previous config saved to /var/cache/conftool/dbconfig/20220623-132133-root.json
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30018 and previous config saved to /var/cache/conftool/dbconfig/20220623-132128-root.json
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807050{{!}}[ImageSuggestions] Enable extension on ptwiki, ruwiki & idwiki (T302711)]] (duration: 03m 44s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30017 and previous config saved to /var/cache/conftool/dbconfig/20220623-130629-root.json
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30016 and previous config saved to /var/cache/conftool/dbconfig/20220623-130624-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30015 and previous config saved to /var/cache/conftool/dbconfig/20220623-125553-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30014 and previous config saved to /var/cache/conftool/dbconfig/20220623-125547-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30013 and previous config saved to /var/cache/conftool/dbconfig/20220623-125125-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30012 and previous config saved to /var/cache/conftool/dbconfig/20220623-125120-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30011 and previous config saved to /var/cache/conftool/dbconfig/20220623-124049-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30010 and previous config saved to /var/cache/conftool/dbconfig/20220623-124043-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30009 and previous config saved to /var/cache/conftool/dbconfig/20220623-123621-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30008 and previous config saved to /var/cache/conftool/dbconfig/20220623-123616-root.json
* 12:26 moritzm: installing waitress security updates
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30007 and previous config saved to /var/cache/conftool/dbconfig/20220623-122545-root.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30006 and previous config saved to /var/cache/conftool/dbconfig/20220623-122539-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30005 and previous config saved to /var/cache/conftool/dbconfig/20220623-122118-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30004 and previous config saved to /var/cache/conftool/dbconfig/20220623-122112-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30003 and previous config saved to /var/cache/conftool/dbconfig/20220623-121041-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30002 and previous config saved to /var/cache/conftool/dbconfig/20220623-121035-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30001 and previous config saved to /var/cache/conftool/dbconfig/20220623-120614-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30000 and previous config saved to /var/cache/conftool/dbconfig/20220623-120608-root.json
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29999 and previous config saved to /var/cache/conftool/dbconfig/20220623-115537-root.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29998 and previous config saved to /var/cache/conftool/dbconfig/20220623-115532-root.json
* 11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29997 and previous config saved to /var/cache/conftool/dbconfig/20220623-115110-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29996 and previous config saved to /var/cache/conftool/dbconfig/20220623-115104-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128 db1129 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29995 and previous config saved to /var/cache/conftool/dbconfig/20220623-114159-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29994 and previous config saved to /var/cache/conftool/dbconfig/20220623-114033-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29993 and previous config saved to /var/cache/conftool/dbconfig/20220623-114028-root.json
* 11:32 kart_: Updated cxserver to 2022-06-23-052732-production ([[phab:T311196|T311196]])
* 11:31 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 11:31 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:30 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:29 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:28 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29992 and previous config saved to /var/cache/conftool/dbconfig/20220623-112529-root.json
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29991 and previous config saved to /var/cache/conftool/dbconfig/20220623-112524-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 es1024 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29990 and previous config saved to /var/cache/conftool/dbconfig/20220623-110804-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29989 and previous config saved to /var/cache/conftool/dbconfig/20220623-105333-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29988 and previous config saved to /var/cache/conftool/dbconfig/20220623-105326-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29987 and previous config saved to /var/cache/conftool/dbconfig/20220623-105320-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29986 and previous config saved to /var/cache/conftool/dbconfig/20220623-103829-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29985 and previous config saved to /var/cache/conftool/dbconfig/20220623-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29984 and previous config saved to /var/cache/conftool/dbconfig/20220623-103816-root.json
* 10:25 jayme: running restart-php7.2-fpm A:parsoid or A:mw or A:mw-api to disable opcache revalidation - [[phab:T266055|T266055]]
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29983 and previous config saved to /var/cache/conftool/dbconfig/20220623-102325-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29982 and previous config saved to /var/cache/conftool/dbconfig/20220623-102318-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29981 and previous config saved to /var/cache/conftool/dbconfig/20220623-102312-root.json
* 10:21 XioNoX: fix eqiad lvs switch port MTU
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29980 and previous config saved to /var/cache/conftool/dbconfig/20220623-100822-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29979 and previous config saved to /var/cache/conftool/dbconfig/20220623-100815-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29978 and previous config saved to /var/cache/conftool/dbconfig/20220623-100808-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29977 and previous config saved to /var/cache/conftool/dbconfig/20220623-095318-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29976 and previous config saved to /var/cache/conftool/dbconfig/20220623-095311-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29975 and previous config saved to /var/cache/conftool/dbconfig/20220623-095304-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29973 and previous config saved to /var/cache/conftool/dbconfig/20220623-093814-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29972 and previous config saved to /var/cache/conftool/dbconfig/20220623-093807-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29971 and previous config saved to /var/cache/conftool/dbconfig/20220623-093800-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29970 and previous config saved to /var/cache/conftool/dbconfig/20220623-092310-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29969 and previous config saved to /var/cache/conftool/dbconfig/20220623-092303-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29968 and previous config saved to /var/cache/conftool/dbconfig/20220623-092256-root.json
* 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 db1179 db1180 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29967 and previous config saved to /var/cache/conftool/dbconfig/20220623-090842-root.json
* 09:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:52 joal@deploy1002: Finished deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs (duration: 00m 08s)
* 08:52 joal@deploy1002: Started deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs
* 08:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 7 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 7 hosts with reason: Reboots
* 07:39 moritzm: installing firejail security updates
* 07:36 TheresNoTime: UTC morning deploys done
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:25 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806365{{!}}GrowthExperiments: Enable link recommendations frontend, round 4 (T304548)]] (duration: 03m 37s)
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: Reboots
* 00:35 brennen: end of phabricator maintenance window
* 00:13 brennen: phabricator deploy finished ([[phab:T311175|T311175]])
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance


== June 19 ==
* 23:32 gwicke: upgraded restbase1006 to cassandra 2.1.7
* 23:30 gwicke: starting cassandra bootstrap on restbase1009
* 21:37 gwicke: upgraded cassandra on 1003 to 2.1.7 (pre-release, likely going out on Monday)
* 18:32 godog: stop cassandra on restbase1008
* 17:45 logmsgbot: krenair Synchronized private/PrivateSettings.php: sync 4a30446e for wikitech cleanup - T102361 (duration: 00m 12s)
* 17:24 godog: install linux 3.19 on restbase100[789]
* 17:12 ori: salt -t30 -G 'php:hhvm' cmd.run 'rm -f /usr/local/bin/check_tc_space' (https://gerrit.wikimedia.org/r/#/c/219102/)
* 16:54 moritzm: updated/rebooted nescio/maerlant to 3.19
* 13:40 andrewbogott: test test test
* 02:19 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-19 02:19:33+00:00
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 05m 08s)
* 00:49 springle: killed storm of research queries on dbstore1002, load avg 90+, replag, likely explosion, etc. emailing analytics@
* 00:13 logmsgbot: ebernhardson Synchronized php-1.26wmf10/extensions/Flow/tests/: no-op sync of flow test cases in wmf10 (duration: 00m 17s)
* 00:11 logmsgbot: ebernhardson Synchronized php-1.26wmf10/skins/Vector/: Bump Vector submodule in 1.26wmf10 for swat (duration: 00m 12s)


== 2022-06-22 ==
* 22:56 tzatziki: removing 1 file for legal compliance
* 21:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 to resolve Old GC Hell alert
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad to resolve Old GC Hell alert
* 21:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:49 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44] (duration: 01m 18s)
* 20:48 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44]
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 20:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:27 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 20:24 cjming: end of UTC late backport window
* 20:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 20:19 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44] (duration: 07m 36s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807593{{!}}gawiki: Change category collation from `uppercase` to `uca-ga-u-kn` (T311136)]] (duration: 03m 39s)
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44]
* 20:11 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44] (duration: 00m 07s)
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44]
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44] (duration: 06m 16s)
* 20:03 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44]
* 20:03 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44] (duration: 30m 58s)
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 19:42 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection (duration: 02m 03s)
* 19:37 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection
* 19:32 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44]
* 19:31 aqu: Deploying analytics/refinery (weekly train)
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 19:14 herron: bounced apache on lists1001
* 19:06 hashar: Restarting CI Jenkins
* 16:46 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1009.eqiad.wmnet with OS bullseye
* 16:45 hashar: Restarting CI Jenkins
* 16:43 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
* 16:33 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:29 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:18 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:14 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 16:13 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 16:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 16:08 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 16:06 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 16:05 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 16:04 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 15:36 moritzm: upload jenkins 2.332.4 to apt.wikimedia.org [[phab:T311068|T311068]]
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
* 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 15:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 15:00 jayme: published docker-registry.discovery.wmnet/helm-state-metrics:0.1.0-1 - [[phab:T310714|T310714]]
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 14:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
* 14:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 Lucas_WMDE: UTC afternoon backport+config window done
* 14:09 lucaswerkmeister-wmde@deploy1002: Synchronized logos/manage.py: Config: [[gerrit:807486{{!}}logos: Update phpcs comment]] (should be a no-op but syncing just in case) (duration: 03m 19s)
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
* 14:01 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/%s\n' specieswiki<nowiki>{</nowiki>,-<nowiki>{</nowiki>1.5,2<nowiki>}</nowiki>x<nowiki>}</nowiki>.png {{!}} mwscript purgeList.php # [[phab:T310961|T310961]]
* 14:01 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (3/3) (duration: 03m 30s)
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (2/3) (duration: 03m 29s)
* 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:56 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
* 13:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
* 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (1/3) (duration: 03m 46s)
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
* 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (2/2) (duration: 03m 39s)
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (1/2) (duration: 03m 35s)
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:28 XioNoX: fix MTU on eqiad server facing switch ports
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:27 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:807255{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (2/3) (T304328)]] (duration: 03m 35s)
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807254{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (1/3) (T304328)]] (duration: 03m 35s)
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
* 13:10 XioNoX: fix MTU in drmrs
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:807211{{!}}[wmf-config]: Deploy GDI Survey Wave 2 - BETA (T311079)]] (duration: 03m 29s)
* 12:58 XioNoX: fix MTU on codfw switches access ports
* 12:57 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 12:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 12:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 12:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
* 12:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 12:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 12:23 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 12:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 12:18 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 12:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 12:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 12:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 12:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 11:46 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 11:41 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 11:11 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 20s)
* 11:10 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 11:09 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 11s)
* 11:08 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 11:07 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 02m 54s)
* 11:05 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 10:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
* 10:53 jayme: systemctl restart rsyslog on kubernetes2008
* 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
* 10:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
* 10:41 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
* 10:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
* 10:36 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
* 10:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
* 10:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
* 10:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
* 10:17 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
* 10:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
* 10:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 10:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
* 10:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 10:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2003.codfw.wmnet
* 10:04 moritzm: installing vim security updates
* 09:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 09:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
* 09:35 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 09:35 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 09:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 09:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
* 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 08:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
* 08:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29964 and previous config saved to /var/cache/conftool/dbconfig/20220622-084234-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29963 and previous config saved to /var/cache/conftool/dbconfig/20220622-084225-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29962 and previous config saved to /var/cache/conftool/dbconfig/20220622-084206-root.json
* 08:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29961 and previous config saved to /var/cache/conftool/dbconfig/20220622-082730-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29960 and previous config saved to /var/cache/conftool/dbconfig/20220622-082721-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29959 and previous config saved to /var/cache/conftool/dbconfig/20220622-082702-root.json
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 08:18 marostegui: Upgrade kernel and reboot on db[1111,1132,1143,1127].eqiad.wmnet
* 08:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 08:15 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]] (duration: 03m 43s)
* 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29957 and previous config saved to /var/cache/conftool/dbconfig/20220622-081227-root.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29956 and previous config saved to /var/cache/conftool/dbconfig/20220622-081217-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29955 and previous config saved to /var/cache/conftool/dbconfig/20220622-081159-root.json
* 08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]]
* 08:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
* 08:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
* 08:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
* 08:04 hashar: Updating operations-puppet-tests-buster-docker Jenkins job to use the latest Docker image (rebuild to catch up with latest defined gems). https://gerrit.wikimedia.org/r/c/integration/config/+/807478
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29954 and previous config saved to /var/cache/conftool/dbconfig/20220622-075721-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29953 and previous config saved to /var/cache/conftool/dbconfig/20220622-075713-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29952 and previous config saved to /var/cache/conftool/dbconfig/20220622-075655-root.json
* 07:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 07:53 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
* 07:50 marostegui: Upgrade kernel and reboot on db[2145-2150].codfw.wmnet
* 07:49 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29951 and previous config saved to /var/cache/conftool/dbconfig/20220622-074217-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29950 and previous config saved to /var/cache/conftool/dbconfig/20220622-074209-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29949 and previous config saved to /var/cache/conftool/dbconfig/20220622-074151-root.json
* 07:40 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 07:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 07:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29948 and previous config saved to /var/cache/conftool/dbconfig/20220622-072714-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29947 and previous config saved to /var/cache/conftool/dbconfig/20220622-072705-root.json
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29946 and previous config saved to /var/cache/conftool/dbconfig/20220622-072647-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29945 and previous config saved to /var/cache/conftool/dbconfig/20220622-071210-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29944 and previous config saved to /var/cache/conftool/dbconfig/20220622-071201-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29943 and previous config saved to /var/cache/conftool/dbconfig/20220622-071143-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 es1026 es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29942 and previous config saved to /var/cache/conftool/dbconfig/20220622-065507-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Switchover es1, es2 and es3 masters', diff saved to https://phabricator.wikimedia.org/P29941 and previous config saved to /var/cache/conftool/dbconfig/20220622-065208-marostegui.json
* 05:52 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 01:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:17 tstarling@deploy1002: Synchronized wmf-config/mc-labs.php: for completeness (duration: 03m 41s)
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:13 tstarling@deploy1002: Synchronized wmf-config/mc.php: g 807158 [[phab:T278392|T278392]] (duration: 03m 35s)
* 01:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== June 18 ==
== 2022-06-21 ==
* 23:37 logmsgbot: ebernhardson Synchronized php-1.26wmf9/skins/Vector: Bump Vector in 1.26wmf9 for SWAT (duration: 00m 16s)
* 20:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b42e57d75ec6b0536493fa073805a0bcb066aef1}}: zhwikiquote: Disable local upload ([[phab:T311017|T311017]]) (duration: 03m 43s)
* 23:22 logmsgbot: ebernhardson Synchronized wmf-config/: Actually enable the feedback link on Special:Search (duration: 00m 17s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:08 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Enable wgCirrusSearchFeedbackLink on enwiki (duration: 00m 13s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:07 godog: start (bootstrap) cassandra on restbase1008
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-urd-hin_0.1.0+svn~r60389-1
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 akosiaris: restarted salt on sca1001, truncate log files. keep a sample in /tmp/
* 20:22 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (2/2) (duration: 03m 28s)
* 20:03 chasemp: apache && hhvm restart for mw 1243 1250 1254 1256 1257
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:00 chasemp: apache && hhvm restart for mw...1256 1255 1254 1250 1243 1242 1071 1021
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:58 mutante: restarting hhvm on mw1021, mw1071
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:27 godog: bounce cassandra on restbase1003, new logging configuration
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:26 akosiaris: puppet-merged on strontium
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (1/2) (duration: 03m 30s)
* 19:15 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf10
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:06 godog: upgrade cassandra to 2.1.6 on restbase1003
* 20:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f70e302e11756d9704acc86c45b3d7aabf31c4d}}: fawiktionary: Enable SandboxLink extension ([[phab:T308505|T308505]]) (duration: 03m 37s)
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-urd_0.1.0~r57551-1
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-hin_0.1.0~r57344-1
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-cy-en_0.1.1~r57554-1
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:43 legoktm: fixed content model of MediaWiki:Common.css@lrcwiki
* 19:38 dancy@deploy1002: backport aborted: (duration: 00m 10s)
* 18:18 YuviPanda: restarted nutcracker on wikitech
* 19:38 dancy@deploy1002: Installation of scap version "4.9.5" completed for 558 hosts
* 18:16 YuviPanda: restarted keystone on labcontrol1001
* 19:38 dancy@deploy1002: Installing scap version "4.9.5" for 558 hosts
* 17:13 gwicke: bouncing cassandra on restbase1002
* 19:22 urandom: replicating Cassandra `system_auth` keyspace to codfw -- [[phab:T307641|T307641]]
* 17:11 godog: restart cassandra on restbase1004
* 18:56 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2` failed due to syntax error, patch here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/807200
* 15:53 gwicke: updated restbase to 7ffaf94b
* 18:48 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2`
* 15:13 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Hovercards: Disable test release on Catalan and Greek Wikipedias [[gerrit:215932]] (duration: 00m 13s)
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1001.wikimedia.org
* 15:06 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150618 [[gerrit:218886]] (duration: 00m 14s)
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:14 akosiaris: powercycling labstore2001
* 17:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:08 moritzm: added firejail_0.9.26-1~wmfjessie1 and firejail_0.9.26-1~wmftrusty1 to apt.wikimedia.org
* 17:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp1001.wikimedia.org
* 08:45 jynus: very brief replication stop for s7, already corrected
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2001.wikimedia.org
* 06:51 Coren: rebooting labstore2001
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:32 legoktm: live hacking mw1017 for T102915
* 17:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host elastic1049.eqiad.wmnet
* 05:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 18 05:26:01 UTC 2015 (duration 26m 0s)
* 17:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 02:48 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-18 02:48:44+00:00
* 17:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp2001.wikimedia.org
* 02:46 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 05m 03s)
* 17:14 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 02:32 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-18 02:32:45+00:00
* 17:09 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1049.eqiad.wmnet
* 02:28 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 56s)
* 17:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 02:04 springle: applied T99941 schema change to all remaining affected (i.e., old) wikis
* 17:01 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 02:01 tgr: ran https://gerrit.wikimedia.org/r/#/c/159350/7/backend/schema/mysql/developer_agreement.sql on mediawikiwiki
* 16:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
* 01:32 ejegg: updated payments from f33d0a8687a120a2057a7e6acad67da63b17f97e to a17ee221db0dbde70c92e24fc188379b6dbad613
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 01:20 logmsgbot: ori Synchronized php-1.26wmf10/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: 0c21a14a6e: Revert StashEdit: Use postWithToken (duration: 00m 13s)
* 16:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 01:06 twentyafterfour: applied hotfix for T102276 and restarted apache on iridium
* 16:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
* 00:00 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf10
* 15:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
* 15:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 15:55 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2048.codfw.wmnet
* 15:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
* 15:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
* 15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 15:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 15:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (2/2) (duration: 03m 28s)
* 15:37 klausman: restarting pybal on lvs2009
* 15:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
* 15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 15:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (1/2) (duration: 03m 51s)
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:30 klausman: Restarting pybal on lvs2010
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2002.codfw.wmnet
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2001.codfw.wmnet
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging-ctrl2002.codfw.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 15:16 klausman@cumin1001: conftool action : help; selector: name=ml-staging2001
* 15:15 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:06 moritzm: installing avahi security updates
* 15:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:01 papaul: PDU swap for rack a2 complete
* 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:24 papaul: ongoing maintenance on ps1-a2-codfw
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 13:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
* 13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
* 13:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
* 13:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
* 13:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:28 daniel@deploy1002: Synchronized rpc/: Config: [[gerrit:805775{{!}}rpc: Remove unused RunJobs.php (T175146 T243096)]] (duration: 03m 45s)
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
* 13:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 13:05 moritzm: installing Linux 5.10.120-1~bpo10+1 on buster hosts with backports kernel
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 13:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
* 12:56 moritzm: installing haproxy security updates on stretch
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 12:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 12:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
* 12:43 moritzm: installing python-bottle security updates
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 12:25 moritzm: reset logster-csp/logster-badpass-priv on mwlog1002, these were removed from Puppet
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:59 mbsantos: mbsantos@maps2009 imposm-removebackup-import ([[phab:T305845|T305845]])
* 11:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:43 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for testing', diff saved to https://phabricator.wikimedia.org/P29936 and previous config saved to /var/cache/conftool/dbconfig/20220621-114232-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for testing', diff saved to https://phabricator.wikimedia.org/P29935 and previous config saved to /var/cache/conftool/dbconfig/20220621-114216-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for testing', diff saved to https://phabricator.wikimedia.org/P29934 and previous config saved to /var/cache/conftool/dbconfig/20220621-114151-root.json
* 10:57 volans: deleting netbox getstats.GetDeviceStats job results - [[phab:T311048|T311048]]
* 10:51 kart_: Updated cxserver to 2022-06-21-035954-production ([[phab:T307970|T307970]])
* 10:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 10:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 10:47 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 10:47 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 10:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 10:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 10:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:31 urbanecm: 09:29:23 Synchronized wmf-config/throttle.php: {{Gerrit|7c9f6a561b2b4b5c5db063bad83bd23e9cbac347}}: Add a throttle rule for a Czech course ([[phab:T310885|T310885]]) (duration: 03m 34s) #manually logging in logmsgbot's absence
* 09:20 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 09:13 marostegui: dbmaint s8@codfw [[phab:T310011|T310011]]
* 08:29 marostegui: Reboot db1120 for kernel upgrade
* 08:14 moritzm: remove EOLed parsoid debs from releases.wikimedia.org [[phab:T309765|T309765]]
* 05:54 marostegui: Reboot db1132 and db1181 for kernel upgrade


== June 17 ==
== 2022-06-20 ==
* 23:35 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: SWAT (duration: 00m 14s)
* 07:14 SandraEbele: Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Reliability and SpecialEntityData metrics).
* 23:35 gwicke: rolled back restbase to 90817c2a
* 07:14 SandraEbele: killed Oozie wikidata-articleplaceholder_metrics-coord, wikidata-reliability_metrics-coord, and wikidata-specialentitydata_metrics-coord jobs.
* 23:24 logmsgbot: catrope Synchronized php-1.26wmf9/extensions/MobileFrontend: SWAT (duration: 00m 15s)
* 23:23 logmsgbot: catrope Synchronized php-1.26wmf9/extensions/Flow: SWAT (duration: 00m 15s)
* 22:45 gwicke: rolling restart of cassandra nodes
* 22:09 gwicke: rolling restart of restbase instances to apply puppet change after puppet actually ran on all nodes
* 21:58 gwicke: rolling restart of restbase instances to apply config change
* 21:56 godog: restart nutcracker on mw1145
* 21:35 gwicke: restarting cassandra on restbase1005
* 20:47 mutante: temp. stopped icinga-wm
* 20:37 gwicke: deployed RESTBase 7ffaf94bfc
* 20:24 cscott: updated Parsoid to version 402ddf66
* 20:01 ottomata: resized antimony's / LV from 30G to 100G.  looks like /var/lib/git was getting filled up
* 19:43 jynus: rolling schema changes on hewiki
* 19:29 godog: downgrade and restart cassandra to 2.1.3 on restbase1001, metrics not being pushed to graphite with 2.1.6
* 19:05 godog: bounce cassandra on xenon
* 18:46 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ic03b152de: Make $wgUploadPath for commons https only for benefit instant commons (duration: 00m 14s)
* 18:11 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf10
* 17:45 godog: bounce cassandra on restbase1001
* 17:39 mutante: repooled mw1234
* 17:24 ottomata: starting reinstall of Zookeeper analytics nodes (analytics102[345]): https://phabricator.wikimedia.org/T101713
* 17:16 godog: bounce cassandra on restbase1001
* 17:14 jynus: rolling schema changes on ruwiki master
* 17:13 mutante: running puppet via salt on api appservers in batches, switch to ganglia_new and carbon
* 17:12 godog: cassandra stopped sending graphite metrics after restart, investigating (test cluster works fine tho)
* 16:58 jynus: rolling schema changes on ruwiki slaves
* 16:28 godog: start upgrading restbase1001 to cassandra 2.1.6 T102015
* 16:02 logmsgbot: thcipriani Finished scap: Wikitech-Ldap host record roll-out (duration: 24m 35s)
* 15:37 logmsgbot: thcipriani Started scap: Wikitech-Ldap host record roll-out
* 15:19 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Give patrolmarks right to "*" on dewiki [[gerrit:218901]] (duration: 00m 13s)
* 15:17 logmsgbot: anomie Synchronized wmf-config/throttle.php: SWAT: Add a throttle exception for United Islands of Prague [[gerrit:217413]] (duration: 00m 14s)
* 15:15 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable captcha on labswiki for now [[gerrit:218908]] (duration: 00m 13s)
* 15:10 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add extra namespace aliases for Italian Wikipedia [[gerrit:215708]] (duration: 00m 13s)
* 15:08 anomie: SWAT: Enable anti-abuse features on labswiki [[gerrit:218903]]
* 15:08 jynus: testing some schema changes on testwiki
* 15:00 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on nowiki and plwiki (duration: 00m 13s)
* 13:56 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on fiwiki and idwiki (duration: 00m 13s)
* 13:26 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on bgwiki and eowiki (duration: 00m 13s)
* 10:52 akosiaris: reload pybal on lvs1006
* 10:50 mobrovac: finished deploying mathoid I40ef68 on SCA
* 10:48 akosiaris: repooled mathoid.svc.eqiad.wmnet: sca1002 backend
* 10:44 akosiaris: enable puppet on sca1002
* 10:43 akosiaris: enable puppet
* 10:43 akosiaris: depool sca1002 for mathoid.svc.eqiad.wmnet
* 10:43 akosiaris: reloaded pybal on lvs1003
* 10:28 akosiaris: repool sca1002, depool sca1001
* 10:18 mark: Halting pvmove of md124 on labstore1001
* 09:30 akosiaris: disable puppet on sca1001
* 09:09 akosiaris: depool sca1001, resource: mathoid
* 09:09 akosiaris: puppet disabled on sca1002
* 08:37 YuviPanda: run sudo salt -t 20 -b 100 '*' cmd.run 'sudo service salt-minion restart' on virt1000, attempt to get them to answer on labcontrol1001 instead
* 06:52 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 17 06:52:58 UTC 2015 (duration 52m 57s)
* 02:56 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-17 02:56:49+00:00
* 02:55 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1045 (duration: 00m 13s)
* 02:54 springle: found wikiversions.json modified on tin since 2015-06-16 23:27 (catrope?); stashed and reapplied the file in order to do a pull
* 02:54 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 04m 44s)
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-17 02:35:23+00:00
* 02:32 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 06m 12s)
* 02:21 logmsgbot: ori Synchronized php-1.26wmf9/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I480cbc7ad (duration: 00m 12s)
* 02:21 logmsgbot: ori Synchronized php-1.26wmf10/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I480cbc7ad (duration: 00m 12s)
* 00:10 paravoid: draining esams because of upcoming network maintenance window


== June 16 ==
== 2022-06-19 ==
* 23:28 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable local upload on fawikivoyage; enable logging for T76305 (duration: 00m 13s)
* 10:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 23:28 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: Set previous values for password length policies (duration: 00m 16s)
* 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 23:17 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf10 (duration: 43m 04s)
* 10:14 ayounsi@cumin1001: dbctl commit (dc=all): 'depool', diff saved to https://phabricator.wikimedia.org/P29910 and previous config saved to /var/cache/conftool/dbconfig/20220619-101436-ayounsi.json
* 23:02 godog: restore INFO cassandra logging level on restbase1003
* 22:44 godog: start cassandra on restbase1008
* 22:43 godog: enable back some cassandra debugging on restbase1003
* 22:33 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf10
* 22:26 urandom: restored default logging level on restbase1003
* 22:22 urandom: enabling even more debugging on restbase1003
* 22:14 urandom: enable (some) debug logging on restbase1003
* 21:57 logmsgbot: twentyafterfour scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.SxGNHsmVYP" ' returned non-zero exit status 1 (duration: 01m 24s)
* 21:56 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf10
* 20:34 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents/modules/ext.wikimediaEvents.resourceloader.js: T101806 live hack (duration: 00m 12s)
* 19:24 Coren: labstore1001 pvmove of slice2 to slice 51 started (some bursts of iowait expected, but should have minimal end-user impact)
* 18:36 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Fix usage tracking setting (duration: 00m 14s)
* 18:03 godog: bounce statsite on graphite1001, stuck while writing to graphite
* 17:30 ejegg: update SmashPig on listener from e1e925c9fc2a60c1e14ef01d8b653dc09512f51f to 258f2c917b1ae50b01231927bcd6f58ecaa8940b
* 17:23 logmsgbot: krinkle Synchronized php-1.26wmf9/includes/resourceloader/ResourceLoader.php: undo live hack (duration: 00m 13s)
* 17:09 logmsgbot: aude Synchronized arbitraryaccess.dblist: Enable arbitrary access on gomwiki and lrcwiki (duration: 00m 13s)
* 17:09 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on second batch of s3 wikis (duration: 00m 13s)
* 17:03 logmsgbot: bblack Synchronized wmf-config/InitialiseSettings.php: wgCanonicalServer: HTTPS for all (duration: 00m 15s)
* 16:44 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 16:43 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s)
* 16:43 logmsgbot: krenair Synchronized w/static/images/project-logos/gomwiki.png: (no message) (duration: 00m 14s)
* 16:42 logmsgbot: krenair Synchronized langlist: gomwiki (duration: 00m 13s)
* 16:41 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 16:40 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 13s)
* 16:29 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 16:27 logmsgbot: krenair Synchronized langlist: (no message) (duration: 00m 14s)
* 16:25 logmsgbot: krenair Synchronized w/static/images/project-logos/lrcwiki.png: (no message) (duration: 00m 13s)
* 16:21 moritzm: updated copper, oxygen, labstore2001 and labnodepool1001 to the 3.19 kernel
* 16:11 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 16:10 logmsgbot: krenair Synchronized wmf-config: (no message) (duration: 00m 14s)
* 16:06 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 16:05 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 15s)
* 15:43 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: templateeditor: add templateeditor right in hewiki [[gerrit:218426]] (duration: 00m 13s)
* 15:09 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Turn on wgGenerateThumbnailOnParse for wikitech. [[gerrit:218553]] (duration: 00m 12s)
* 15:03 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for CX deployment on 20150616 [[gerrit:218341]] (duration: 00m 12s)
* 14:18 cmjohnson: barium is going down for disk replacement
* 13:38 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on dewiki (duration: 00m 15s)
* 13:18 akosiaris: rebooted etherpad1001 for kernel upgrades
* 12:51 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2005, es2006 and es2007 after maintenance (duration: 00m 13s)
* 12:44 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on cswiki (duration: 00m 14s)
* 12:20 logmsgbot: aude Synchronized usagetracking.dblist: Enable usage tracking on ruwiki (duration: 00m 15s)
* 11:21 paravoid: restarting the puppetmaster
* 11:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, warm up (duration: 00m 13s)
* 10:36 akosiaris: rebooting ganeti200{1..6}.codfw.wmnet for kernel upgrades
* 09:33 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Depool es2005, es2006 and es2007 for maintenance (duration: 00m 14s)
* 09:10 YuviPanda: deleted huge puppet-master.log on labcontrol1001
* 08:05 jynus: added m5-slave to dns servers
* 07:52 paravoid: restarting hhvm on mw1121
* 07:52 moritzm: blacklisted the overlayfs kernel module (prevents a reliable local root exploit on all Ubuntu systems). no systems in the fleet had an overlayfs mount present or the kernel module loaded, so there should be no impact on existing systems. Note: This is a bandaid, I'll create a Phab task to deploy this via puppet in the future (and to also blacklist additional desktopy kernel modules which increase our attack
* 07:39 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1005 (duration: 00m 14s)
* 06:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 16 06:24:04 UTC 2015 (duration 24m 3s)
* 06:18 godog: restore ES replication throttling to 20mb/s
* 06:13 godog: restore ES replication throttling to 40mb/s
* 06:08 logmsgbot: filippo Synchronized wmf-config/PoolCounterSettings-common.php: unthrottle ES (duration: 00m 14s)
* 05:56 godog: bump ES replication throttling to 60mb/s
* 05:50 manybubbles: ok - we're yellow and recovering. ops can take this from here. We have a root cause and we have things I can complain about to the elastic folks I plan to meet with today anyway. I'm going to finish waking up now.
* 05:49 manybubbles: reenabling puppet agent on elasticsearch machines
* 05:46 manybubbles: I expect them to be red for another few minutes during the initial master recovery
* 05:45 manybubbles: started all elasticsearch nodes and now they are recovering.
* 05:41 godog: restart gmond on elastic1007
* 05:39 logmsgbot: filippo Synchronized wmf-config/PoolCounterSettings-common.php: throttle ES (duration: 00m 13s)
* 05:25 manybubbles: shutting down elasticsearch on all the elasticsearch nodes again - another full cluster restart should fix it like it did last time...
* 05:11 godog: restart elasticsearch on elastic1031
* 03:06 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 12s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-16 02:27:51+00:00
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 52s)
* 00:55 tgr: running extensions/Gather/maintenance/updateCounts.php for gather wikis - https://phabricator.wikimedia.org/T101460
* 00:52 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 00m 13s)
* 00:46 godog: killed bacula-fd on graphite1001, shouldn't be running and consuming bandwidth (cc akosiaris)
* 00:27 godog: kill python stats on cp1052, filling /tmp


== June 15 ==
== 2022-06-17 ==
* 23:42 ori: Cleaning up renamed jobqueue metrics on graphite{1,2}001
* 22:05 AndyRussG: update payments-wiki revision {{Gerrit|10304f69}} -> {{Gerrit|ef53c82e}}
* 23:01 godog: killed bacula-fd on graphite2001, shouldn't be running and consuming bandwidth (cc akosiaris)
* 20:22 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P29908 and previous config saved to /var/cache/conftool/dbconfig/20220617-202240-jynus.json
* 22:54 logmsgbot: hoo Synchronized wmf-config/filebackend.php: Fix commons image inclusion after commons went https only (duration: 00m 14s)
* 20:20 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P29907 and previous config saved to /var/cache/conftool/dbconfig/20220617-202038-jynus.json
* 22:18 godog: run disk stress-test on restbase1007 / restbase1009
* 17:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1021.eqiad.wmnet with OS buster
* 22:06 logmsgbot: twentyafterfour Synchronized hhvm-fatal-error.php: deploy: Guard header() call in error page (duration: 00m 15s)
* 17:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 22:05 logmsgbot: twentyafterfour Synchronized wmf-config/InitialiseSettings-labs.php: deploy: Never use wgServer/wgCanonicalServer values from production in labs (duration: 00m 12s)
* 17:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 20:37 logmsgbot: yurik Synchronized docroot/bits/WikipediaMobileFirefoxOS: Bumping FirefoxOS app to latest (duration: 00m 14s)
* 16:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1020.eqiad.wmnet with OS buster
* 20:30 godog: bounce cassandra on restbase1003
* 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 20:18 godog: start cassandra on restbase1008, bootstrapping
* 16:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1019.eqiad.wmnet with OS buster
* 20:04 godog: sign restbase1008 key, run puppet
* 16:37 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 20:00 godog: powercycle restbase1007, investigate disk issue
* 16:35 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 19:07 logmsgbot: ori Synchronized php-1.26wmf9/includes/jobqueue: 0a32aa3be4: jobqueue: use more sensible metric key names (duration: 00m 13s)
* 16:34 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 16:57 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 14s)
* 16:34 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 16:48 logmsgbot: thcipriani Synchronized php-1.26wmf9/extensions/OpenStackManager/OpenStackManagerHooks.php: SWAT: refer to user the right way (duration: 00m 13s)
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 16:48 godog: powercycle graphite1002, no ssh, unresponsive console
* 16:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 16:19 jynus: upgrading es1005 mysql service while depooled
* 16:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 16:12 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 12s)
* 16:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 16:10 bblack: pybal restarts complete, all ok
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 16:09 logmsgbot: thcipriani Finished scap: SWAT: Openstack manager and language updates (duration: 21m 27s)
* 16:22 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 15:47 logmsgbot: thcipriani Started scap: SWAT: Openstack manager and language updates
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 15:46 bblack: starting pybal restart process for config changes ( https://gerrit.wikimedia.org/r/#/c/218285/ ), inactives first w/ manual verification of ok-ness
* 16:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 15:11 bblack: rebooting cp3041 (downtimed)
* 16:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 15:00 _joe_: ES is green
* 16:06 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 14:38 logmsgbot: aude Synchronized php-1.26wmf9/extensions/Wikidata: Fix property label constraints bug (duration: 00m 24s)
* 16:06 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 14:27 logmsgbot: aude Synchronized arbitraryaccess.dblist: Enable arbitrary access on s7 wikis (duration: 00m 13s)
* 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
* 13:47 jynus: enabling puppet on all elastic* nodes, should enable also ganglia
* 16:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 13:11 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-common.php: all the search (duration: 00m 12s)
* 15:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 13:04 _joe_: re-scaling down the recovery index bandwidth in ES to 20 mb/s
* 15:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 12:52 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-common.php: partially turn search back on (duration: 00m 13s)
* 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 11:54 _joe_: raised the ES index replica bandwidth limit to 60mb
* 15:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 11:31 akosiaris: migrating etherpad.wikimedia.org to etherpad1001.eqiad.wmnet
* 15:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 11:15 _joe_: raised the max bytes for ES recovery to 40mbps
* 15:46 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS buster
* 10:49 manybubbles: and we're yellow right now.
* 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 10:49 manybubbles: the initial primaries stage - the red stage of the rolling restart - recovers quick-ish
* 15:39 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 10:48 manybubbles: soon we should see it go yellow and stay that way while the replicas recover
* 15:39 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 10:48 manybubbles: manybubbles is confident his mighty bitch slap of the elasticsearch cluster has set it further along the road to recovery
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 10:46 jynus: disabled puppet on all elasticsearch nodes to avoid restarting services and other magic
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 10:44 _joe_: disabled hot threads logging, ganglia on es nodes
* 15:32 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 10:44 manybubbles: started Elasticsearch on all elasticsearch nodes
* 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS buster
* 10:38 manybubbles: stopping all elasticsearch servers - going for a full cluster restart.
* 15:29 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 10:11 manybubbles: restarting elasticsearch on elasticsearch1021 - that one is in a gc death spiral
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 09:26 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s)
* 15:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 09:12 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s)
* 15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 07:35 _joe_: attempting a fast restart of elastic1020
* 15:19 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 07:21 logmsgbot: ori Synchronized php-1.26wmf9/extensions/CirrusSearch/includes/Util.php: I504dac0c3: Add missing 'use \Status;' to includes/Util.php (duration: 00m 13s)
* 15:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 04:56 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 15 04:56:39 UTC 2015 (duration 56m 38s)
* 15:18 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 03:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1057 (duration: 00m 12s)
* 15:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-15 02:22:56+00:00
* 15:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 02:19 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 46s)
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:15 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS buster
* 15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 15:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 15:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 15:02 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 14:59 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 14:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 14:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 12:35 SandraEbele: deployed daily airflow dag for 3 Wikidata metrics.
* 11:54 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@18182aa]: (no justification provided) (duration: 00m 13s)
* 11:54 ebysans@deploy1002: Started deploy [airflow-dags/analytics@18182aa]: (no justification provided)
* 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2012.codfw.wmnet
* 11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2012.codfw.wmnet
* 11:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2011.codfw.wmnet
* 11:40 moritzm: upload cas 6.5.5+wmf11u1 to apt.wikimedia.org [[phab:T305518|T305518]]
* 11:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2011.codfw.wmnet
* 11:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2010.codfw.wmnet
* 11:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 11:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 11:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 11:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 11:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2010.codfw.wmnet
* 11:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1012.eqiad.wmnet
* 11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1012.eqiad.wmnet
* 11:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1010.eqiad.wmnet
* 11:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1010.eqiad.wmnet
* 10:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 10:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 10:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 10:34 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 10:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
* 09:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
* 09:56 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 09:56 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 09:55 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 09:55 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 09:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
* 09:44 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
* 09:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
* 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1004.eqiad.wmnet
* 09:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1004.eqiad.wmnet
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
* 09:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
* 09:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
* 09:24 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
* 09:11 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 09:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 09:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 08:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 08:51 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 08:47 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 08:39 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 08:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
* 08:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
* 08:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
* 08:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
* 07:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 07:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 02:51 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS bullseye
* 02:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:36 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 03m 43s)
* 02:02 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS bullseye
* 01:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS bullseye
* 01:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
* 00:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 00:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye


== June 14 ==
== 2022-06-16 ==
* 10:39 YuviPanda: running du -d 2 on /srv/project in a screen session on labstore1001
* 23:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 04:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 14 04:33:20 UTC 2015 (duration 33m 19s)
* 23:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 02:42 logmsgbot: reedy Synchronized wmf-config/extension-list: noop (duration: 00m 13s)
* 23:38 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 02:40 logmsgbot: krenair Synchronized wmf-config/squid-labs.php: sync random labs-only file to test per irc (duration: 00m 13s)
* 23:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-14 02:21:28+00:00
* 22:59 mutante: new Wikipedia languages added to DNS:  blk = https://en.wikipedia.org/wiki/Pa%27O_language  {{!}}  pcm = https://en.wikipedia.org/wiki/Nigerian_Pidgin
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 47s)
* 22:37 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:33 volans@cumin2002: START - Cookbook sre.dns.netbox
* 21:18 thcipriani@deploy1002: Finished scap: noop test (duration: 04m 07s)
* 21:14 thcipriani@deploy1002: Started scap: noop test
* 21:10 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:805433{{!}}CommonSettings: clean up and simplify some code]] (duration: 03m 42s)
* 21:06 thcipriani@deploy1002: Synchronized multiversion/MWRealm.php: Config: [[gerrit:806249{{!}}MWRealm.php: remove unused getRealmSpecificFilename() (T171115)]] (duration: 03m 35s)
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 thcipriani@deploy1002: Finished scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]] (duration: 16m 57s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:42 thcipriani@deploy1002: Started scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]]
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 cjming@deploy1002: Synchronized phpcs.xml: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 27s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:23 cjming@deploy1002: Synchronized wmf-config/: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 22s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805179{{!}}Turn off TOC A/B test for pilot wikis (T309683)]] (duration: 03m 37s)
* 19:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner2001.codfw.wmnet
* 19:39 aokoth@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:23 aokoth@cumin1001: START - Cookbook sre.dns.netbox
* 19:03 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner2001.codfw.wmnet
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab-runner1001.eqiad.wmnet
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29904 and previous config saved to /var/cache/conftool/dbconfig/20220616-185520-marostegui.json
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:54 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
* 18:53 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:50 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:49 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:44 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to all wikis
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:42 brennen@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/CheckUser/src/Hooks.php: Backport: [[gerrit:806246{{!}}Only try to create User object if username is not null (T310747)]] (duration: 03m 23s)
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29903 and previous config saved to /var/cache/conftool/dbconfig/20220616-184015-marostegui.json
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29902 and previous config saved to /var/cache/conftool/dbconfig/20220616-182510-marostegui.json
* 18:13 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:11 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29901 and previous config saved to /var/cache/conftool/dbconfig/20220616-181005-marostegui.json
* 18:10 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 17:59 brennen: end of phabricator deploy
* 17:46 brennen: starting phabricator deploy, momentary downtime expected while services restart
* 17:42 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:42 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29900 and previous config saved to /var/cache/conftool/dbconfig/20220616-173738-marostegui.json
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29899 and previous config saved to /var/cache/conftool/dbconfig/20220616-173725-marostegui.json
* 17:31 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:31 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:27 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:26 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29898 and previous config saved to /var/cache/conftool/dbconfig/20220616-172220-marostegui.json
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29897 and previous config saved to /var/cache/conftool/dbconfig/20220616-170715-marostegui.json
* 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29896 and previous config saved to /var/cache/conftool/dbconfig/20220616-165210-marostegui.json
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29895 and previous config saved to /var/cache/conftool/dbconfig/20220616-161844-marostegui.json
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29894 and previous config saved to /var/cache/conftool/dbconfig/20220616-161835-marostegui.json
* 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29893 and previous config saved to /var/cache/conftool/dbconfig/20220616-160330-marostegui.json
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29892 and previous config saved to /var/cache/conftool/dbconfig/20220616-154825-marostegui.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29891 and previous config saved to /var/cache/conftool/dbconfig/20220616-153320-marostegui.json
* 15:31 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 15:30 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 15:29 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P29890 and previous config saved to /var/cache/conftool/dbconfig/20220616-151434-ladsgroup.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P29889 and previous config saved to /var/cache/conftool/dbconfig/20220616-145931-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29888 and previous config saved to /var/cache/conftool/dbconfig/20220616-145136-marostegui.json
* 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29887 and previous config saved to /var/cache/conftool/dbconfig/20220616-145128-marostegui.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P29886 and previous config saved to /var/cache/conftool/dbconfig/20220616-144427-ladsgroup.json
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29885 and previous config saved to /var/cache/conftool/dbconfig/20220616-143623-marostegui.json
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P29884 and previous config saved to /var/cache/conftool/dbconfig/20220616-142923-ladsgroup.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29883 and previous config saved to /var/cache/conftool/dbconfig/20220616-142118-marostegui.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29882 and previous config saved to /var/cache/conftool/dbconfig/20220616-140613-marostegui.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29881 and previous config saved to /var/cache/conftool/dbconfig/20220616-140453-root.json
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 volans@cumin1001: dbctl commit (dc=all): 'Doesn't have new wikiuser', diff saved to https://phabricator.wikimedia.org/P29880 and previous config saved to /var/cache/conftool/dbconfig/20220616-140107-volans.json
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29879 and previous config saved to /var/cache/conftool/dbconfig/20220616-134950-root.json
* 13:45 sukhe: upload bird2_2.0.7-4.1wm1 to apt.wm.o (buster) - [[phab:T310574|T310574]]
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29878 and previous config saved to /var/cache/conftool/dbconfig/20220616-133446-root.json
* 13:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1089.eqiad.wmnet
* 13:22 jayme@cumin1001: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 13:21 jayme@cumin1001: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29877 and previous config saved to /var/cache/conftool/dbconfig/20220616-131942-root.json
* 13:10 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 13:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29876 and previous config saved to /var/cache/conftool/dbconfig/20220616-130438-root.json
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 13:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29875 and previous config saved to /var/cache/conftool/dbconfig/20220616-123357-marostegui.json
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1008.eqiad.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 for schema change', diff saved to https://phabricator.wikimedia.org/P29874 and previous config saved to /var/cache/conftool/dbconfig/20220616-115924-root.json
* 11:53 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1008.eqiad.wmnet
* 11:53 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1007.eqiad.wmnet
* 11:45 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1007.eqiad.wmnet
* 11:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
* 11:38 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
* 11:35 godog: trim swift logs older than 25d from centrallog hosts - [[phab:T309171|T309171]]
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet
* 11:27 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet
* 11:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
* 11:19 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29873 and previous config saved to /var/cache/conftool/dbconfig/20220616-111632-marostegui.json
* 11:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 11:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
* 11:09 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 11:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 11:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29871 and previous config saved to /var/cache/conftool/dbconfig/20220616-110127-marostegui.json
* 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
* 10:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29870 and previous config saved to /var/cache/conftool/dbconfig/20220616-104622-marostegui.json
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
* 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
* 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
* 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json
* 10:28 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:28 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster
* 10:02 elukey: ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]
* 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:32 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster
* 09:21 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29868 and previous config saved to /var/cache/conftool/dbconfig/20220616-091131-marostegui.json
* 09:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:02 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6002.drmrs.wmnet
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 08:45 moritzm: failover ganeti master in drmrs/2 to ganeti6004
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805370{{!}}testwiki: Enable SectionTranslation for 11 Wikipedias (T309384 T310116)]] (duration: 03m 41s)
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 joal: Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure
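The staged repools on this day advance one step every 15 minutes. db1132 went through 10%, 25%, 50%, 75% and 100% between 13:04 and 14:04 after its schema change. db1128 went through 25% to 100% between 14:29 and 15:14 after maintenance. The sketch below reproduces that timing, assuming a fixed 15-minute interval inferred from those timestamps. `repool_schedule` is a hypothetical helper for illustration only. It is not part of dbctl or conftool and changes no pooling state.

```python
from datetime import datetime, timedelta


def repool_schedule(start, percentages, interval_minutes=15):
    """Return (HH:MM, percent) pairs for a staged repool.

    Hypothetical helper: it only models the timing pattern visible in
    the log (one step per fixed interval); it does not call dbctl.
    """
    t = datetime.strptime(start, "%H:%M")
    step = timedelta(minutes=interval_minutes)
    schedule = []
    for i, pct in enumerate(percentages):
        # Each step lands i * interval after the first commit.
        schedule.append(((t + i * step).strftime("%H:%M"), pct))
    return schedule


# db1132 after its schema change (first step logged at 13:04)
for when, pct in repool_schedule("13:04", [10, 25, 50, 75, 100]):
    print(f"{when} db1132 (re)pooling @ {pct}%")
```

With the same helper, `repool_schedule("14:29", [25, 50, 75, 100])` yields the 14:29, 14:44, 14:59 and 15:14 steps logged for db1128.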


== 2022-06-15 ==
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29867 and previous config saved to /var/cache/conftool/dbconfig/20220615-224845-marostegui.json
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29866 and previous config saved to /var/cache/conftool/dbconfig/20220615-223339-marostegui.json
* 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS buster
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29865 and previous config saved to /var/cache/conftool/dbconfig/20220615-221834-marostegui.json
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS buster
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 22:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 22:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 22:12 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29864 and previous config saved to /var/cache/conftool/dbconfig/20220615-220329-marostegui.json
* 22:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29863 and previous config saved to /var/cache/conftool/dbconfig/20220615-213241-marostegui.json
* 21:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 21:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29862 and previous config saved to /var/cache/conftool/dbconfig/20220615-213233-marostegui.json
* 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29861 and previous config saved to /var/cache/conftool/dbconfig/20220615-211728-marostegui.json
* 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29860 and previous config saved to /var/cache/conftool/dbconfig/20220615-210223-marostegui.json
* 20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29859 and previous config saved to /var/cache/conftool/dbconfig/20220615-204717-marostegui.json
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804014{{!}}Remove unused setting wgQuickSurveysUseVue (T285890)]] (duration: 03m 38s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:50 hashar@deploy1002: Finished deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]] (duration: 00m 10s)
* 19:50 hashar@deploy1002: Started deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]]
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29858 and previous config saved to /var/cache/conftool/dbconfig/20220615-194703-marostegui.json
* 19:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29857 and previous config saved to /var/cache/conftool/dbconfig/20220615-194655-marostegui.json
* 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29856 and previous config saved to /var/cache/conftool/dbconfig/20220615-193150-marostegui.json
* 19:31 hashar: wikibugs IRC bot has been restarted by valhallasw \o/ # [[phab:T310734|T310734]]
* 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29855 and previous config saved to /var/cache/conftool/dbconfig/20220615-191645-marostegui.json
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29854 and previous config saved to /var/cache/conftool/dbconfig/20220615-190140-marostegui.json
* 18:42 hashar: wikibugs (IRC bot for Phabricator/Gerrit) is no longer working and needs a restart [[phab:T310734|T310734]]
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29853 and previous config saved to /var/cache/conftool/dbconfig/20220615-182140-marostegui.json
* 18:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 18:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]] (duration: 03m 43s)
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1015.eqiad.wmnet with OS buster
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:52 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1014.eqiad.wmnet with OS buster
* 17:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 17:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29851 and previous config saved to /var/cache/conftool/dbconfig/20220615-172738-marostegui.json
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29849 and previous config saved to /var/cache/conftool/dbconfig/20220615-171233-marostegui.json
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29848 and previous config saved to /var/cache/conftool/dbconfig/20220615-165727-marostegui.json
* 16:54 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to group0
* 16:44 jynus: restarting replication for m3 on db1117, not db2078
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29847 and previous config saved to /var/cache/conftool/dbconfig/20220615-164222-marostegui.json
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:29 brennen: phabricator upgrade finished
* 16:27 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Id8cdb8aef70f6672}} (duration: 03m 41s)
* 16:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host backup1009.eqiad.wmnet
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host backup1009.eqiad.wmnet
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29845 and previous config saved to /var/cache/conftool/dbconfig/20220615-160838-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29844 and previous config saved to /var/cache/conftool/dbconfig/20220615-160830-marostegui.json
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS buster
* 15:56 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:53 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P29843 and previous config saved to /var/cache/conftool/dbconfig/20220615-155325-marostegui.json
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 15:40 mutante: phabricator upgrade in progress
* 15:39 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:39 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20220615-153820-marostegui.json
* 15:35 brennen: starting phabricator deploy, momentary downtime expected while Apache restarts and migrations run
* 15:34 jynus: stopping replication for m3 on db1117, db2078
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29841 and previous config saved to /var/cache/conftool/dbconfig/20220615-152315-marostegui.json
* 15:20 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ms-be1059.eqiad.wmnet with OS bullseye
* 15:20 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 15:20 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 15:06 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:05 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:05 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:03 mutante: phabricator maintenance about to start
* 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 15:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 14:59 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 14:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:52 jbond@cumin1001: END (ERROR) - Cookbook sre.pdus.uptime (exit_code=97)
* 14:51 jbond@cumin1001: START - Cookbook sre.pdus.uptime
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29840 and previous config saved to /var/cache/conftool/dbconfig/20220615-145028-marostegui.json
* 14:50 urandom: ALTER-ing replication for codfw (Cassandra) expansion -- [[phab:T307641|T307641]]
* 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29839 and previous config saved to /var/cache/conftool/dbconfig/20220615-145020-marostegui.json
* 14:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:49 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:46 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29838 and previous config saved to /var/cache/conftool/dbconfig/20220615-143515-marostegui.json
* 14:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:30 hnowlan@deploy1002: Synchronized private/PrivateSettings.php: [[phab:T308670|T308670]] credentials to access the similar-users service (duration: 03m 32s)
* 14:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:22 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:21 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29836 and previous config saved to /var/cache/conftool/dbconfig/20220615-142010-marostegui.json
* 14:19 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 14:16 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS buster
* 14:10 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jnuche@deploy1002: Installation of scap version "4.9.4" completed for 558 hosts
* 14:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 14:08 jnuche@deploy1002: Installing scap version "4.9.4" for 558 hosts
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29834 and previous config saved to /var/cache/conftool/dbconfig/20220615-140505-marostegui.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:01 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:58 awight: EU afternoon backport window complete.
* 13:57 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/Translate/src/PageTranslation/DeleteTranslatableBundleSpecialPage.php: Backport: [[gerrit:805749{{!}}Fix deletion of translation pages outside of NS_MAIN namespace (T310440)]] (duration: 00m 32s)
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29833 and previous config saved to /var/cache/conftool/dbconfig/20220615-135508-root.json
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29832 and previous config saved to /var/cache/conftool/dbconfig/20220615-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29831 and previous config saved to /var/cache/conftool/dbconfig/20220615-135458-root.json
* 13:54 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:53 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:49 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29830 and previous config saved to /var/cache/conftool/dbconfig/20220615-134004-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29829 and previous config saved to /var/cache/conftool/dbconfig/20220615-133958-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29828 and previous config saved to /var/cache/conftool/dbconfig/20220615-133954-root.json
* 13:38 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:805745{{!}}Restore internal mechanism to use either back or close button (T310602)]] (duration: 00m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29827 and previous config saved to /var/cache/conftool/dbconfig/20220615-133334-marostegui.json
* 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29826 and previous config saved to /var/cache/conftool/dbconfig/20220615-133326-marostegui.json
* 13:31 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 01m 08s)
* 13:30 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 02m 06s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29825 and previous config saved to /var/cache/conftool/dbconfig/20220615-132500-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29824 and previous config saved to /var/cache/conftool/dbconfig/20220615-132454-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29823 and previous config saved to /var/cache/conftool/dbconfig/20220615-132450-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29822 and previous config saved to /var/cache/conftool/dbconfig/20220615-131820-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29821 and previous config saved to /var/cache/conftool/dbconfig/20220615-130956-root.json
* 13:09 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29820 and previous config saved to /var/cache/conftool/dbconfig/20220615-130951-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29819 and previous config saved to /var/cache/conftool/dbconfig/20220615-130946-root.json
* 13:08 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:04 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 43s)
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29818 and previous config saved to /var/cache/conftool/dbconfig/20220615-130315-marostegui.json
* 13:02 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 12:56 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 58s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:55 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 05s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29817 and previous config saved to /var/cache/conftool/dbconfig/20220615-125452-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29816 and previous config saved to /var/cache/conftool/dbconfig/20220615-125447-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29815 and previous config saved to /var/cache/conftool/dbconfig/20220615-125442-root.json
* 12:51 jbond@deploy1002: Finished deploy [netbox/deploy@7bbf659]: log (duration: 03m 12s)
* 12:48 jbond@deploy1002: Started deploy [netbox/deploy@7bbf659]: log
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29813 and previous config saved to /var/cache/conftool/dbconfig/20220615-124810-marostegui.json
* 12:42 moritzm: failover ganeti master in eqsin to ganeti5001
* 12:42 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:42 volans@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29812 and previous config saved to /var/cache/conftool/dbconfig/20220615-123949-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29811 and previous config saved to /var/cache/conftool/dbconfig/20220615-123943-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29810 and previous config saved to /var/cache/conftool/dbconfig/20220615-123938-root.json
* 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 12:25 kart_: Updated cxserver to 2022-06-15-074244-production ([[phab:T309266|T309266]], [[phab:T310116|T310116]], [[phab:T309384|T309384]], [[phab:T306963|T306963]])
* 12:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:23 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 es1033 es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29808 and previous config saved to /var/cache/conftool/dbconfig/20220615-122123-root.json
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:19 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29807 and previous config saved to /var/cache/conftool/dbconfig/20220615-121620-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29806 and previous config saved to /var/cache/conftool/dbconfig/20220615-121440-marostegui.json
* 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 12:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29805 and previous config saved to /var/cache/conftool/dbconfig/20220615-115935-marostegui.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29804 and previous config saved to /var/cache/conftool/dbconfig/20220615-115452-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29803 and previous config saved to /var/cache/conftool/dbconfig/20220615-115135-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29802 and previous config saved to /var/cache/conftool/dbconfig/20220615-115127-root.json
* 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29801 and previous config saved to /var/cache/conftool/dbconfig/20220615-114950-marostegui.json
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29800 and previous config saved to /var/cache/conftool/dbconfig/20220615-114430-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29799 and previous config saved to /var/cache/conftool/dbconfig/20220615-113948-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29798 and previous config saved to /var/cache/conftool/dbconfig/20220615-113631-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29797 and previous config saved to /var/cache/conftool/dbconfig/20220615-113623-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29796 and previous config saved to /var/cache/conftool/dbconfig/20220615-113445-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29795 and previous config saved to /var/cache/conftool/dbconfig/20220615-112924-marostegui.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29794 and previous config saved to /var/cache/conftool/dbconfig/20220615-112444-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29793 and previous config saved to /var/cache/conftool/dbconfig/20220615-112127-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29792 and previous config saved to /var/cache/conftool/dbconfig/20220615-112119-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29791 and previous config saved to /var/cache/conftool/dbconfig/20220615-111940-marostegui.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29790 and previous config saved to /var/cache/conftool/dbconfig/20220615-110940-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29789 and previous config saved to /var/cache/conftool/dbconfig/20220615-110623-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29788 and previous config saved to /var/cache/conftool/dbconfig/20220615-110616-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29787 and previous config saved to /var/cache/conftool/dbconfig/20220615-110435-marostegui.json
* 10:55 marostegui: dbmaint es3@eqiad [[phab:T310485|T310485]]
* 10:55 marostegui: dbmaint es2@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui: dbmaint es1@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29786 and previous config saved to /var/cache/conftool/dbconfig/20220615-105437-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29784 and previous config saved to /var/cache/conftool/dbconfig/20220615-105119-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29783 and previous config saved to /var/cache/conftool/dbconfig/20220615-105112-root.json
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29782 and previous config saved to /var/cache/conftool/dbconfig/20220615-103933-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29781 and previous config saved to /var/cache/conftool/dbconfig/20220615-103615-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29780 and previous config saved to /var/cache/conftool/dbconfig/20220615-103608-root.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29779 and previous config saved to /var/cache/conftool/dbconfig/20220615-103101-marostegui.json
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29778 and previous config saved to /var/cache/conftool/dbconfig/20220615-103048-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 es1030 es1028 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29777 and previous config saved to /var/cache/conftool/dbconfig/20220615-102929-root.json
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29776 and previous config saved to /var/cache/conftool/dbconfig/20220615-101543-marostegui.json
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29775 and previous config saved to /var/cache/conftool/dbconfig/20220615-100235-marostegui.json
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29774 and previous config saved to /var/cache/conftool/dbconfig/20220615-100037-marostegui.json
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4001.ulsfo.wmnet
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29773 and previous config saved to /var/cache/conftool/dbconfig/20220615-094532-marostegui.json
* 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29772 and previous config saved to /var/cache/conftool/dbconfig/20220615-092706-marostegui.json
* 09:20 marostegui: Reboot sanitarium hosts (db1154, db1155); wiki replicas will have lag
* 09:14 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1059.eqiad.wmnet with OS bullseye
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29771 and previous config saved to /var/cache/conftool/dbconfig/20220615-091257-marostegui.json
* 09:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29770 and previous config saved to /var/cache/conftool/dbconfig/20220615-091249-marostegui.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29769 and previous config saved to /var/cache/conftool/dbconfig/20220615-091201-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29768 and previous config saved to /var/cache/conftool/dbconfig/20220615-085744-marostegui.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29767 and previous config saved to /var/cache/conftool/dbconfig/20220615-085656-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29766 and previous config saved to /var/cache/conftool/dbconfig/20220615-084239-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29765 and previous config saved to /var/cache/conftool/dbconfig/20220615-084151-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29764 and previous config saved to /var/cache/conftool/dbconfig/20220615-084046-marostegui.json
* 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29763 and previous config saved to /var/cache/conftool/dbconfig/20220615-083554-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29762 and previous config saved to /var/cache/conftool/dbconfig/20220615-082734-marostegui.json
* 08:23 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:22 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29761 and previous config saved to /var/cache/conftool/dbconfig/20220615-082050-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29760 and previous config saved to /var/cache/conftool/dbconfig/20220615-081744-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29759 and previous config saved to /var/cache/conftool/dbconfig/20220615-080546-root.json
* 08:03 XioNoX: re-enable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29758 and previous config saved to /var/cache/conftool/dbconfig/20220615-080240-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29757 and previous config saved to /var/cache/conftool/dbconfig/20220615-075042-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29756 and previous config saved to /var/cache/conftool/dbconfig/20220615-075024-marostegui.json
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29755 and previous config saved to /var/cache/conftool/dbconfig/20220615-074736-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29754 and previous config saved to /var/cache/conftool/dbconfig/20220615-073538-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29753 and previous config saved to /var/cache/conftool/dbconfig/20220615-073232-root.json
* 07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29752 and previous config saved to /var/cache/conftool/dbconfig/20220615-072352-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P29751 and previous config saved to /var/cache/conftool/dbconfig/20220615-072034-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29750 and previous config saved to /var/cache/conftool/dbconfig/20220615-071728-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29749 and previous config saved to /var/cache/conftool/dbconfig/20220615-070847-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29748 and previous config saved to /var/cache/conftool/dbconfig/20220615-065342-marostegui.json
* 06:52 XioNoX: disable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29747 and previous config saved to /var/cache/conftool/dbconfig/20220615-063837-marostegui.json
* 06:02 marostegui: Reboot db[2071-2078] [[phab:T310485|T310485]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29746 and previous config saved to /var/cache/conftool/dbconfig/20220615-060153-marostegui.json
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29745 and previous config saved to /var/cache/conftool/dbconfig/20220615-054252-marostegui.json
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1173.eqiad.wmnet with OS bullseye
* 05:17 marostegui: dbmaint es5@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es4@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es3@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es2@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es1@codfw [[phab:T310485|T310485]]
* 05:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:03 marostegui: Reboot dbproxy1016 and dbproxy1021 [[phab:T310484|T310484]]
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 tstarling@deploy1002: Synchronized php-1.39.0-wmf.16/includes/cache/MessageCache.php: (no justification provided) (duration: 03m 36s)
* 02:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:17 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/includes/cache/MessageCache.php: [[phab:T310532|T310532]] (duration: 03m 29s)
* 02:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== June 12 ==
* 22:57 ejegg: rolled back SmashPig on listener from 15acdafef9d9682c417632e5ac5a5f2e5380f92e to e1e925c9fc2a60c1e14ef01d8b653dc09512f51f
* 22:40 ejegg: updated SmashPig on listener from e1e925c9fc2a60c1e14ef01d8b653dc09512f51f to 15acdafef9d9682c417632e5ac5a5f2e5380f92e
* 22:24 godog: upgrade and bounce carbon daemons on graphite2001 to investigate T101572
* 21:16 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: I3694489ba: wgCanonicalServer->https for new HTTPS domains (duration: 00m 14s)
* 20:33 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217878/1 (duration: 00m 13s)
* 20:32 logmsgbot: krenair Synchronized w/static/images/project-logos/dawiki-200k.png: https://gerrit.wikimedia.org/r/#/c/217878/1 (duration: 00m 16s)
* 20:15 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217670/ (duration: 00m 12s)
* 19:28 ejegg: updated SmashPig on payments-listener from f9c3eaa99fa0fe8ef098d0fc876091d3676aa039 to 5a463400bc74706ba7bf6256cd0101014e792acb
* 18:47 ejegg: updated SmashPig on payments-listener from 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510 to f9c3eaa99fa0fe8ef098d0fc876091d3676aa039
* 18:45 logmsgbot: faidon Synchronized wmf-config/InitialiseSettings.php: remove wmgHTTPSBlacklistCountries (duration: 00m 12s)
* 18:45 logmsgbot: faidon Synchronized wmf-config/CommonSettings.php: remove CanIPUseHTTPS hook (duration: 00m 13s)
* 17:39 moritzm: updated cerium, xenon and praseodymium to 3.19 kernel
* 17:08 ejegg: enabled queue consumer
* 17:08 ejegg: updated crm from d13aaa4e9e937b0b1ae1f5de61ea7ff1f316d58f to bd8a00196071ddd04efbff7b30567dd9357c9000
* 16:53 ejegg: disabled donations queue consumer
* 15:52 logmsgbot: faidon Synchronized wmf-config/CommonSettings.php: hide prefershttps user pref (duration: 00m 13s)
* 15:40 logmsgbot: faidon Synchronized docroot/search.wikimedia.org/index.php: unbreak search.wikimedia.org due to HTTPS (duration: 00m 12s)
* 15:27 jynus: mysql load issues on labsdb1003, investigating
* 13:39 moritzm: updated etcd* to 3.19 kernel
* 12:11 jynus: restarting mariadb at labsdb1003
* 11:58 moritzm: updated rdb200* to 3.19 kernel
* 11:31 jynus: db2068 up but all services and console login unresponsive, powercycling
* 10:06 springle: killed a bunch of queries hammering labsdb1003 for days
* 09:58 moritzm: updated mc2004 to mc2016 to 3.19 kernel
* 06:06 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 12 06:06:55 UTC 2015 (duration 6m 54s)
* 04:37 logmsgbot: ori Synchronized php-1.26wmf9/extensions/FlaggedRevs: I4cfb47b41: Avoid post-redirect parse for certain edits (duration: 00m 14s)
* 02:40 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-12 02:40:36+00:00
* 02:34 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 10m 00s)
* 00:40 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217759 (duration: 00m 15s)
* 00:07 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 14s)


== 2022-06-14 ==
* 23:52 mutante: gitlab-runner1001/1002 - clean revert not possible, icinga alerting about failed buildkitd service, manually deleting systemd unit and trying to clean up [[phab:T308271|T308271]]
* 23:49 mutante: gitlab-runner1002 - systemctl restart docker; run-puppet-agent; systemctl start buildkitd - fails though [[phab:T308271|T308271]]
* 23:39 mutante: gitlab-runner1001 - systemctl start buildkitd
* 23:32 mutante: gitlab-runner1001 - restarting docker
* 23:08 mutante: disabling puppet in gitlab-runners (via cumin /disable-puppet) before deploying gerrit:791655 to provide gitlab-runners with buildkit and new docker network - [[phab:T308271|T308271]]
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:15 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|e3fe6c04c95717f0f914bbfa366f5f827f392b6b}}: phpcs: fix more SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 39s)
* 22:05 urbanecm@deploy1002: Synchronized w/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 18s)
* 22:02 urbanecm@deploy1002: Synchronized src/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 32s)
* 22:00 mutante: wtp1026 - manually running '/usr/bin/sudo -u root -- /usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807'
* 21:58 urbanecm@deploy1002: Synchronized rpc/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 31s)
* 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:54 urbanecm@deploy1002: Synchronized multiversion/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 29s)
* 21:54 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:51 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 38s)
* 21:49 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
* 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
* 21:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
* 21:38 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 21:32 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 21:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 21:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 21:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 21:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 21:10 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 21:03 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 urbanecm@deploy1002: Synchronized docroot/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 41s)
* 20:37 urbanecm@deploy1002: Synchronized w/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 15s)
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 urbanecm@deploy1002: Synchronized multiversion/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 28s)
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 20:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 20:31 urbanecm@deploy1002: Synchronized wmf-config/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 38s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1021.eqiad.wmnet with OS buster
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS buster
* 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1018.eqiad.wmnet with OS buster
* 20:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1017.eqiad.wmnet with OS buster
* 19:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 19:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 19:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 19:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 19:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 19:32 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 19:32 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 19:16 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
* 19:10 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
* 18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 18:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 18:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:15 ayounsi@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=imagescaler-ro,name=codfw
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
* 17:57 brennen@deploy1002: Pruned MediaWiki: 1.39.0-wmf.14 (duration: 01m 53s)
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.16 (duration: 32m 52s)
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:25 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
* 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.16
* 17:13 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): train is blocked - will sync to testwikis and hold there for resolution of [[phab:T310532|T310532]]
* 16:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2053.codfw.wmnet with OS bullseye
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
* 16:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
* 16:12 jnuche@deploy1002: Installation of scap version "4.9.2" completed for 557 hosts
* 16:11 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 16:05 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 15:58 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 15:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2053.codfw.wmnet with OS bullseye
* 15:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host elastic2053.codfw.wmnet
* 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host elastic2053.codfw.wmnet
* 15:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
* 14:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:53 moritzm: failover ganeti master in ulsfo to ganeti4003
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|596058b5e4d906d40e620fe5b01f37c484f5a8c1}}: Add new throttle rule + remove expired one ([[phab:T310625|T310625]]) (duration: 03m 38s)
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:33 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
* 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4003.ulsfo.wmnet
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4003.ulsfo.wmnet
* 14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2012.codfw.wmnet with OS buster
* 14:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2010.codfw.wmnet with OS buster
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2009.codfw.wmnet with OS buster
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 14:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2011.codfw.wmnet with OS buster
* 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2008.codfw.wmnet with OS buster
* 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
* 14:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2007.codfw.wmnet with OS buster
* 14:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2006.codfw.wmnet with OS buster
* 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
* 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 14:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2005.codfw.wmnet with OS buster
* 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29741 and previous config saved to /var/cache/conftool/dbconfig/20220614-132654-marostegui.json
* 13:13 urbanecm: UTC afternoon B&C window done
* 13:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1692de09bf04c724cf416679405d4b6485550d40}}: Disable DiscussionTools visualenhancements feature in production (duration: 03m 25s)
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29740 and previous config saved to /var/cache/conftool/dbconfig/20220614-131149-marostegui.json
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7f2dc7296f0c25d00e45651c50c3e45733cc63b3}}: Make new topic tool available as opt-out almost everywhere (phase 4; [[phab:T310392|T310392]]) (duration: 03m 45s)
* 13:06 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29739 and previous config saved to /var/cache/conftool/dbconfig/20220614-125644-marostegui.json
* 12:56 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS buster
* 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS buster
* 12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS buster
* 12:42 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS buster
* 12:41 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS buster
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29738 and previous config saved to /var/cache/conftool/dbconfig/20220614-124139-marostegui.json
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS buster
* 12:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2004.codfw.wmnet with OS buster
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS buster
* 12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS buster
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29737 and previous config saved to /var/cache/conftool/dbconfig/20220614-120323-marostegui.json
* 12:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29735 and previous config saved to /var/cache/conftool/dbconfig/20220614-115020-marostegui.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P29734 and previous config saved to /var/cache/conftool/dbconfig/20220614-113515-marostegui.json
* 11:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1173.eqiad.wmnet with OS bullseye
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29732 and previous config saved to /var/cache/conftool/dbconfig/20220614-110945-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29731 and previous config saved to /var/cache/conftool/dbconfig/20220614-110504-marostegui.json
* 11:02 moritzm: rebalancing ganeti cluster in esams [[phab:T308238|T308238]]
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
* 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29730 and previous config saved to /var/cache/conftool/dbconfig/20220614-105441-root.json
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 10:44 joal@deploy1002: Finished deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency (duration: 00m 09s)
* 10:44 joal@deploy1002: Started deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29729 and previous config saved to /var/cache/conftool/dbconfig/20220614-104021-marostegui.json
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29728 and previous config saved to /var/cache/conftool/dbconfig/20220614-103937-root.json
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:30 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
* 10:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29727 and previous config saved to /var/cache/conftool/dbconfig/20220614-102433-root.json
* 10:22 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:22 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
* 10:19 marostegui: dbmaint s6@eqiad [[phab:T60674|T60674]]
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29726 and previous config saved to /var/cache/conftool/dbconfig/20220614-101755-marostegui.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29725 and previous config saved to /var/cache/conftool/dbconfig/20220614-100930-root.json
* 10:06 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS buster
* 10:03 moritzm: rename Ganeti group row_A in test cluster to row_A-test
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P29724 and previous config saved to /var/cache/conftool/dbconfig/20220614-100250-marostegui.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P29723 and previous config saved to /var/cache/conftool/dbconfig/20220614-094745-marostegui.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29722 and previous config saved to /var/cache/conftool/dbconfig/20220614-093240-marostegui.json
* 09:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1058.eqiad.wmnet with OS bullseye
* 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:23 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29721 and previous config saved to /var/cache/conftool/dbconfig/20220614-092330-marostegui.json
* 09:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29720 and previous config saved to /var/cache/conftool/dbconfig/20220614-092322-marostegui.json
* 09:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 09:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 09:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1058.eqiad.wmnet with reason: host reimage
* 09:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
* 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 09:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1058.eqiad.wmnet with reason: host reimage
* 09:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 09:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 09:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
* 09:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
* 09:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 09:08 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29719 and previous config saved to /var/cache/conftool/dbconfig/20220614-090817-marostegui.json
* 09:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
* 09:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 09:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 09:01 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
* 09:00 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1058.eqiad.wmnet with OS bullseye
* 09:00 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 08:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 08:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
* 08:58 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet
* 08:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
* 08:56 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1001.eqiad.wmnet
* 08:56 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
* 08:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2003.codfw.wmnet with OS buster
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29718 and previous config saved to /var/cache/conftool/dbconfig/20220614-085312-marostegui.json
* 08:53 joal@deploy1002: Finished deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63] (duration: 07m 27s)
* 08:51 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:49 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
* 08:48 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 08:48 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:47 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
* 08:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
* 08:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 08:45 joal@deploy1002: Started deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63]
* 08:45 joal@deploy1002: Finished deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63] (duration: 00m 08s)
* 08:44 joal@deploy1002: Started deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63]
* 08:44 joal@deploy1002: Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63] (duration: 04m 45s)
* 08:39 joal@deploy1002: Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63]
* 08:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
* 08:38 godog: reboot centrallog2002 - [[phab:T310483|T310483]]
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29717 and previous config saved to /var/cache/conftool/dbconfig/20220614-083807-marostegui.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29716 and previous config saved to /var/cache/conftool/dbconfig/20220614-082855-marostegui.json
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29715 and previous config saved to /var/cache/conftool/dbconfig/20220614-082847-marostegui.json
* 08:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
* 08:20 marostegui: dbmaint s6@eqiad [[phab:T298560|T298560]]
* 08:18 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
* 08:16 marostegui: dbmaint s6@eqiad [[phab:T309311|T309311]]
* 08:16 joal@deploy1002: Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63] (duration: 31m 09s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29714 and previous config saved to /var/cache/conftool/dbconfig/20220614-081342-marostegui.json
* 08:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS buster
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29713 and previous config saved to /var/cache/conftool/dbconfig/20220614-075837-marostegui.json
* 07:45 joal@deploy1002: Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29712 and previous config saved to /var/cache/conftool/dbconfig/20220614-074331-marostegui.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29711 and previous config saved to /var/cache/conftool/dbconfig/20220614-073322-marostegui.json
* 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 taavi: UTC morning deploys done
* 07:24 marostegui: dbmaint s6@eqiad [[phab:T298563|T298563]]
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804806{{!}}Enable Realtime Preview on cawiki, viwiki, and fawiki (T303961)]] (duration: 03m 20s)
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:16 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:802685{{!}}Update $wgVectorMaxWidthOptions to include action=edit (T307725)]] (duration: 03m 36s)
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:03 marostegui: dbmaint s6@eqiad [[phab:T300381|T300381]]
* 07:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for schema change', diff saved to https://phabricator.wikimedia.org/P29710 and previous config saved to /var/cache/conftool/dbconfig/20220614-065322-root.json
* 06:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:28 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212129|T212129]] (duration: 03m 31s)
* 06:27 marostegui: Reboot dbproxy1012 and dbproxy1015 [[phab:T310484|T310484]]
* 06:24 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/AbuseFilter/includes/ServiceWiring.php: [[phab:T212129|T212129]] (duration: 03m 33s)
* 06:20 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/AbuseFilter/extension.json: [[phab:T212129|T212129]] (duration: 03m 32s)
* 06:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29709 and previous config saved to /var/cache/conftool/dbconfig/20220614-060608-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29707 and previous config saved to /var/cache/conftool/dbconfig/20220614-060155-root.json
* 06:01 marostegui: Starting s6 eqiad failover from db1173 to db1131 - [[phab:T300471|T300471]]
* 05:11 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212129|T212129]] Switch wgMainStash to db-mainstash (duration: 03m 38s)
* 05:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29706 and previous config saved to /var/cache/conftool/dbconfig/20220614-045224-root.json
* 04:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T300471|T300471]]
* 04:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T300471|T300471]]
* 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29705 and previous config saved to /var/cache/conftool/dbconfig/20220614-024047-ladsgroup.json
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29704 and previous config saved to /var/cache/conftool/dbconfig/20220614-022542-ladsgroup.json
* 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29703 and previous config saved to /var/cache/conftool/dbconfig/20220614-021037-ladsgroup.json
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29702 and previous config saved to /var/cache/conftool/dbconfig/20220614-015532-ladsgroup.json
* 00:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29701 and previous config saved to /var/cache/conftool/dbconfig/20220614-003608-marostegui.json
* 00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29700 and previous config saved to /var/cache/conftool/dbconfig/20220614-002103-marostegui.json
* 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29699 and previous config saved to /var/cache/conftool/dbconfig/20220614-000558-marostegui.json


== June 11 ==
== 2022-06-13 ==
* 23:59 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217753 (duration: 00m 16s)
* 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29698 and previous config saved to /var/cache/conftool/dbconfig/20220613-235053-marostegui.json
* 23:54 logmsgbot: ori Synchronized php-1.26wmf9/includes/EditPage.php: cf7df757f2: Instrument edit failures (duration: 00m 14s)
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:41 logmsgbot: ebernhardson Synchronized php-1.26wmf9/extensions/MobileFrontend: Bump MobileFrontend in 1.26wmf9 for SWAT (duration: 00m 14s)
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:40 ejegg: updated civicrm from 7ffe0cefb019828a09c9369187f14518847b5f41 to d13aaa4e9e937b0b1ae1f5de61ea7ff1f316d58f
* 23:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:24 logmsgbot: ebernhardson Synchronized php-1.26wmf9/extensions/CirrusSearch/: Fix prefer-recent queries in cirrussearch (duration: 00m 13s)
* 23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:02 ejegg: updated SmashPig on the rest of the cluster from 477e8a8be5ea895262031c147330de5a651cc3ac to 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510
* 23:45 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T134809|T134809]] g 801836 remove variable wmgDbconfigFromEtcd (duration: 03m 26s)
* 22:17 godog: temporary bump php memory_limit on magnesium to test T102092
* 23:35 tstarling@deploy1002: Synchronized wmf-config/etcd.php: [[phab:T134809|T134809]] g 799685 codfw master DBs (duration: 03m 36s)
* 22:11 ejegg: updated SmashPig on payments-listener from 477e8a8be5ea895262031c147330de5a651cc3ac to 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510
* 23:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:54 ori: Widespread TC cache exhaustion again, doing rolling restart of HHVMs
* 23:30 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T134809|T134809]] g 799685 codfw master DBs (duration: 03m 30s)
* 21:46 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I3d3ed7647: Test LCStoreStaticArray on test2wiki (duration: 00m 14s)
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:01 godog: NPE while trying to make restbase1007 (cassandra 2.1.5) join the cluster, trying to match the same cassandra version (2.1.3)
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:57 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: fix last commit, did not have any effect (duration: 00m 16s)
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:55 ejegg: updated payments from 43c7952d2a31deaea97e8319f5612d644dce43c8 to f33d0a8687a120a2057a7e6acad67da63b17f97e
* 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29697 and previous config saved to /var/cache/conftool/dbconfig/20220613-232537-marostegui.json
* 20:54 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217688/1 (duration: 00m 13s)
* 23:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:10 godog: sign restbase1007 puppet key and first puppet run
* 23:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 19:10 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217591 (duration: 00m 13s)
* 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29696 and previous config saved to /var/cache/conftool/dbconfig/20220613-232529-marostegui.json
* 18:58 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: beta only change - https://gerrit.wikimedia.org/r/217560 (duration: 00m 12s)
* 23:16 mutante: gitlab-runner2001 - systemctl reset-failed to clear alert about failed ifup for ens14 which is actually up. race condition caused by reboot
* 18:55 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents: T101806 (duration: 00m 14s)
* 23:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29695 and previous config saved to /var/cache/conftool/dbconfig/20220613-231024-marostegui.json
* 18:43 logmsgbot: twentyafterfour Synchronized php-1.26wmf9/includes/AjaxResponse.php: Hotfix Iafff9982bbbee893c13f891901dde88f998db7a6 (duration: 00m 14s)
* 22:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29694 and previous config saved to /var/cache/conftool/dbconfig/20220613-225519-marostegui.json
* 18:16 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf9
* 22:55 AndyRussG: payments-wiki upgraded from {{Gerrit|8c6208c2}} to {{Gerrit|10304f69}}
* 17:44 ejegg: rolled back payments to 43c7952d2a31deaea97e8319f5612d644dce43c8
* 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29693 and previous config saved to /var/cache/conftool/dbconfig/20220613-224014-marostegui.json
* 17:41 ejegg: updated payments from 43c7952d2a31deaea97e8319f5612d644dce43c8 to 15f24d24b150d5d774314b0c1b40ae26a73185f2
* 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29692 and previous config saved to /var/cache/conftool/dbconfig/20220613-221522-marostegui.json
* 17:00 moritzm: updated mc200[1-3] to linux 3.19
* 22:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 16:28 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Use arbitrary access tag (duration: 00m 12s)
* 22:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 16:27 logmsgbot: aude Synchronized wmf-config/CommonSettings.php: Add arbitrary access group tag (duration: 00m 13s)
* 22:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner[2001-2004].codfw.wmnet with reason: maintenance reboot
* 16:27 logmsgbot: aude Synchronized arbitraryaccess.dblist: Add dblist for arbitrary access wikis (duration: 00m 13s)
* 22:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner[2001-2004].codfw.wmnet with reason: maintenance reboot
* 16:24 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Use usagetracking tag (duration: 00m 13s)
* 21:56 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner[1001-1004].eqiad.wmnet with reason: maintenance reboot
* 16:23 logmsgbot: aude Synchronized wmf-config/CommonSettings.php: Add usagetracking group tag (duration: 00m 16s)
* 21:56 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner[1001-1004].eqiad.wmnet with reason: maintenance reboot
* 16:23 ori: Scap + deployments exhausted TC cache on Apaches; performed a rolling restart of HHVM
* 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 16:21 logmsgbot: aude Synchronized usagetracking.dblist: Add dblist for usage tracking wikis (duration: 00m 25s)
* 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 16:19 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Disable Parsoid update jobs (duration: 00m 14s)
* 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:18 logmsgbot: thcipriani Finished scap: SWAT: Update namespaces and special pages for Northern Luri (lrc) from translatewiki [[gerrit:216533]] [[gerrit:217327]] (duration: 32m 11s)
* 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 15:46 logmsgbot: thcipriani Started scap: SWAT: Update namespaces and special pages for Northern Luri (lrc) from translatewiki [[gerrit:216533]] [[gerrit:217327]]
* 21:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29691 and previous config saved to /var/cache/conftool/dbconfig/20220613-215118-marostegui.json
* 15:27 logmsgbot: thcipriani Synchronized php-1.26wmf9/extensions/OpenStackManager: SWAT: update OpenStackManager to disable unused sudoer features [[gerrit:217407]] (duration: 00m 13s)
* 21:48 mutante: gitlab-runner* - sequentially pausing, rebooting, resuming one by one
* 15:11 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Make VisualEditor access RESTbase directly on all public wikis [[gerrit:214833]] (duration: 00m 12s)
* 21:44 mutante: gitlab-runner1001 - pause from accepting jobs - rebooting
* 15:05 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150611 [[gerrit:217460]] (duration: 00m 12s)
* 21:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29690 and previous config saved to /var/cache/conftool/dbconfig/20220613-213613-marostegui.json
* 14:33 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable usage tracking on jawiki (duration: 00m 12s)
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29689 and previous config saved to /var/cache/conftool/dbconfig/20220613-212108-marostegui.json
* 13:40 _joe_: rolling restart of all the restbase instances
* 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29688 and previous config saved to /var/cache/conftool/dbconfig/20220613-210603-marostegui.json
* 13:33 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable usage tracking on frwiki (duration: 00m 12s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 _joe_: running puppet on all restbase hosts
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 _joe_: running puppet on restbase1001
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 _joe_: disabling puppet on restbase hosts in anticipation for merging https://gerrit.wikimedia.org/r/217431
* 20:29 cjming: end of UTC late backport window
* 13:11 paravoid: removing gdnsd from apt: precise-wikimedia (1.9.0-1~precise1/2.1.0-1~precise1), trusty-wikimedia (2.1.0-1), jessie-wikimedia (2.1.2-1~deb8u1)
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:13 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable arbitrary access on Wikivoyage and Wikiquote (duration: 00m 13s)
* 20:27 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:805206{{!}}Disable TOC A/B test for beta cluster (T309683)]] (duration: 03m 29s)
* 11:48 YuviPanda: reboot labvirt1005 for kernel upgrade
* 20:22 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800857{{!}}ugwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 30s)
* 11:46 YuviPanda: installing linux-image-generic-lts-vivid on labvirt1005 to get a 3.19 kernel
* 20:19 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
* 09:51 akosiaris: uploaded ruby-jsduck_5.3.4 and ruby-rkelly-remix_0.0.6 on apt.wikimedia.org/jessie-wikimedia/main
* 20:18 cjming@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ug.svg: Config: [[gerrit:800857{{!}}ugwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 36s)
* 08:18 akosiaris: recreating jessie chroots on copper
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 11 06:21:53 UTC 2015 (duration 21m 52s)
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:44 twentyafterfour: upgraded phabricator at 1:50 UTC (belatedly logged...)
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-11 03:01:48+00:00
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:00 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 01m 16s)
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29687 and previous config saved to /var/cache/conftool/dbconfig/20220613-201420-marostegui.json
* 02:59 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 59s)
* 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 02:43 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-11 02:43:34+00:00
* 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 09m 13s)
* 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29686 and previous config saved to /var/cache/conftool/dbconfig/20220613-201407-marostegui.json
* 20:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800856{{!}}crhwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 27s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:08 cjming@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-crh.svg: Config: [[gerrit:800856{{!}}crhwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 16s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P29685 and previous config saved to /var/cache/conftool/dbconfig/20220613-195902-marostegui.json
* 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P29684 and previous config saved to /var/cache/conftool/dbconfig/20220613-194356-marostegui.json
* 19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29683 and previous config saved to /var/cache/conftool/dbconfig/20220613-192851-marostegui.json
* 19:12 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
* 19:12 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
* 19:11 mutante: etherpad - minimal downtime - rebooting etherpad1003
* 19:07 mutante: gerrit2002 - rebooting
* 19:04 mutante: gitlab2003 - rebooting
* 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29682 and previous config saved to /var/cache/conftool/dbconfig/20220613-190314-marostegui.json
* 19:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 19:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 19:01 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
* 18:55 mutante: gitlab2002 - rebooting
* 18:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29681 and previous config saved to /var/cache/conftool/dbconfig/20220613-184015-marostegui.json
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P29680 and previous config saved to /var/cache/conftool/dbconfig/20220613-182510-marostegui.json
* 18:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P29679 and previous config saved to /var/cache/conftool/dbconfig/20220613-181005-marostegui.json
* 17:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1146.eqiad.wmnet with OS buster
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29678 and previous config saved to /var/cache/conftool/dbconfig/20220613-175500-marostegui.json
* 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1145.eqiad.wmnet with OS buster
* 17:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1143.eqiad.wmnet with OS buster
* 17:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
* 17:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
* 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
* 17:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1148.eqiad.wmnet with OS buster
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 17:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2002.codfw.wmnet with OS buster
* 17:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
* 17:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1147.eqiad.wmnet with OS buster
* 17:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
* 17:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
* 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29677 and previous config saved to /var/cache/conftool/dbconfig/20220613-171438-marostegui.json
* 17:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 17:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29676 and previous config saved to /var/cache/conftool/dbconfig/20220613-171430-marostegui.json
* 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
* 17:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3001.esams.wmnet with OS bullseye
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
* 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1144.eqiad.wmnet with OS buster
* 17:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS buster
* 17:03 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
* 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P29675 and previous config saved to /var/cache/conftool/dbconfig/20220613-165925-marostegui.json
* 16:58 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
* 16:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS buster
* 16:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1146.eqiad.wmnet with OS buster
* 16:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3001.esams.wmnet with reason: host reimage
* 16:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
* 16:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 16:53 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1145.eqiad.wmnet with OS buster
* 16:50 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3001.esams.wmnet with reason: host reimage
* 16:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 16:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
* 16:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
* 16:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
* 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P29674 and previous config saved to /var/cache/conftool/dbconfig/20220613-164419-marostegui.json
* 16:40 dancy@deploy1002: prep aborted:  (duration: 01m 40s)
* 16:38 dancy@deploy1002: prep aborted:  (duration: 06m 12s)
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
* 16:32 marostegui: dbmaint x2@eqiad upgrade and reboot all x2 db hosts [[phab:T310485|T310485]]
* 16:32 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti3001.esams.wmnet with OS bullseye
* 16:32 dancy@deploy1002: prep aborted:  (duration: 00m 26s)
* 16:31 marostegui: Reboot all codfw parsercache hosts [[phab:T310485|T310485]]
* 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29673 and previous config saved to /var/cache/conftool/dbconfig/20220613-162914-marostegui.json
* 16:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2002.codfw.wmnet with OS buster
* 16:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2001.codfw.wmnet with OS buster
* 16:10 robh: ganeti3001 rebooting and reimaging for firmware updates via [[phab:T308238|T308238]]
* 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:805173{{!}} Bumping portals to master (T128546)]] (duration: 03m 27s)
* 15:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:47 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:805173{{!}} Bumping portals to master (T128546)]] (duration: 03m 35s)
* 15:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
* 15:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2001.codfw.wmnet with OS buster
* 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29672 and previous config saved to /var/cache/conftool/dbconfig/20220613-152900-marostegui.json
* 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29671 and previous config saved to /var/cache/conftool/dbconfig/20220613-152852-marostegui.json
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P29670 and previous config saved to /var/cache/conftool/dbconfig/20220613-151347-marostegui.json
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 15:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
* 15:00 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P29669 and previous config saved to /var/cache/conftool/dbconfig/20220613-145842-marostegui.json
* 14:58 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:54 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29668 and previous config saved to /var/cache/conftool/dbconfig/20220613-144337-marostegui.json
* 14:42 marostegui: Failover m1 and m2 to a different proxy [[phab:T310484|T310484]]
* 14:38 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:34 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29667 and previous config saved to /var/cache/conftool/dbconfig/20220613-141802-marostegui.json
* 14:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29666 and previous config saved to /var/cache/conftool/dbconfig/20220613-141754-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P29665 and previous config saved to /var/cache/conftool/dbconfig/20220613-140249-marostegui.json
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:00 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:00 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 13:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dumpsdata1007.eqiad.wmnet
* 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1057.eqiad.wmnet with OS bullseye
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P29663 and previous config saved to /var/cache/conftool/dbconfig/20220613-134744-marostegui.json
* 13:45 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
* 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 13:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29662 and previous config saved to /var/cache/conftool/dbconfig/20220613-133239-marostegui.json
* 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dse-k8s-worker[1001-1004].eqiad.wmnet with reason: reboots
* 13:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on dse-k8s-worker[1001-1004].eqiad.wmnet with reason: reboots
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
* 13:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
* 13:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
* 13:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 13:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1057.eqiad.wmnet with reason: host reimage
* 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
* 13:24 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
* 13:22 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/GrowthExperiments/modules/ext.growthExperiments.DataStore/NewcomerTasksStore.js: {{Gerrit|67a5352b0bf9f6aa160cc93a42ca22a02aad883a}}: NewcomerTasksStore: update quality gate config when the task queue is set ([[phab:T309768|T309768]]) (duration: 03m 41s)
* 13:22 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1057.eqiad.wmnet with reason: host reimage
* 13:21 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
* 13:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
* 13:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
* 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: reboots
* 13:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
* 13:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: reboots
* 13:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29660 and previous config saved to /var/cache/conftool/dbconfig/20220613-130512-marostegui.json
* 13:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 13:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29659 and previous config saved to /var/cache/conftool/dbconfig/20220613-130504-marostegui.json
* 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
* 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After ugprading kernel', diff saved to https://phabricator.wikimedia.org/P29658 and previous config saved to /var/cache/conftool/dbconfig/20220613-125419-root.json
* 12:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1057.eqiad.wmnet with OS bullseye
* 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1002.wikimedia.org
* 12:51 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P29657 and previous config saved to /var/cache/conftool/dbconfig/20220613-124959-marostegui.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After ugprading kernel', diff saved to https://phabricator.wikimedia.org/P29655 and previous config saved to /var/cache/conftool/dbconfig/20220613-123915-root.json
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P29654 and previous config saved to /var/cache/conftool/dbconfig/20220613-123454-marostegui.json
* 12:33 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
* 12:29 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After ugprading kernel', diff saved to https://phabricator.wikimedia.org/P29653 and previous config saved to /var/cache/conftool/dbconfig/20220613-122411-root.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29652 and previous config saved to /var/cache/conftool/dbconfig/20220613-121949-marostegui.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After ugprading kernel', diff saved to https://phabricator.wikimedia.org/P29651 and previous config saved to /var/cache/conftool/dbconfig/20220613-120907-root.json
* 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti3001.esams.wmnet with reason: Remove from cluster for firmware update and eventual reimage
* 12:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti3001.esams.wmnet with reason: Remove from cluster for firmware update and eventual reimage
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After ugprading kernel', diff saved to https://phabricator.wikimedia.org/P29650 and previous config saved to /var/cache/conftool/dbconfig/20220613-115404-root.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29649 and previous config saved to /var/cache/conftool/dbconfig/20220613-115238-marostegui.json
* 11:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 11:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After ugprading kernel', diff saved to https://phabricator.wikimedia.org/P29648 and previous config saved to /var/cache/conftool/dbconfig/20220613-113900-root.json
* 11:36 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netbox-dev2002.codfw.wmnet
* 11:35 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp2002.wikimedia.org
* 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29647 and previous config saved to /var/cache/conftool/dbconfig/20220613-113004-marostegui.json
* 11:28 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
* 11:27 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
* 11:27 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
* 11:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
* 11:25 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
* 11:25 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
* 11:24 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=netbox,name=codfw
* 11:24 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=netbox
* 11:24 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host netbox1002.eqiad.wmnet
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: After ugprading kernel', diff saved to https://phabricator.wikimedia.org/P29646 and previous config saved to /var/cache/conftool/dbconfig/20220613-112356-root.json
* 11:19 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox1002.eqiad.wmnet
* 11:19 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2002.codfw.wmnet
* 11:19 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=netbox,name=eqiad
* 11:18 marostegui: Reboot x2 hosts for kernel upgrade [[phab:T310485|T310485]]
* 11:18 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=netbox,name=codfw
* 11:18 marostegui: Reboot db1131 for kernel upgrade [[phab:T310485|T310485]]
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29645 and previous config saved to /var/cache/conftool/dbconfig/20220613-111621-root.json
* 11:15 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox2002.codfw.wmnet
* 11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P29644 and previous config saved to /var/cache/conftool/dbconfig/20220613-111459-marostegui.json
* 11:14 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
* 11:12 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
* 11:12 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2002.codfw.wmnet
* 11:11 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
* 11:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 11:08 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 11:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
* 11:07 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki2002.codfw.wmnet
* 11:04 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
* 11:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dsharpe out of all services on: 1219 hosts
* 11:00 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dsharpe out of all services on: 1219 hosts
* 11:00 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P29643 and previous config saved to /var/cache/conftool/dbconfig/20220613-105954-marostegui.json
* 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dsharpe out of all services on: 609 hosts
* 10:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dsharpe out of all services on: 609 hosts
* 10:52 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:52 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:52 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:51 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:51 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:50 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:50 klausman@deploy1002: helmfile [ml-s