
Server Admin Log: Difference between revisions

imported>Labslogbot
(awight@tin Synchronized php-1.27.0-wmf.9/extensions/CentralNotice: Update CentralNotice: T122251 (duration: 00m 34s) (logmsgbot))
imported>Stashbot
(dancy@deploy1002: backport aborted: (duration: 00m 12s))
== 2015-12-29 ==
== 2022-06-24 ==
* 01:02 logmsgbot: awight@tin Synchronized php-1.27.0-wmf.9/extensions/CentralNotice: Update CentralNotice: T122251 (duration: 00m 34s)
* 19:35 dancy@deploy1002: backport aborted:  (duration: 00m 12s)
* 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:31 sukhe: finished running homer * commit "adding sukhe" CR: {{Gerrit|8071451}}
* 15:18 dancy@deploy1002: Finished deploy [integration/docroot@ea9b8fa]: (no justification provided) (duration: 00m 08s)
* 15:17 dancy@deploy1002: Started deploy [integration/docroot@ea9b8fa]: (no justification provided)
* 15:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 04s)
* 14:53 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:53 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 02m 37s)
* 14:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:39 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:39 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30242 and previous config saved to /var/cache/conftool/dbconfig/20220624-143544-root.json
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30241 and previous config saved to /var/cache/conftool/dbconfig/20220624-143537-root.json
* 14:31 sukhe: running homer * commit "adding sukhe" CR: 807145
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30240 and previous config saved to /var/cache/conftool/dbconfig/20220624-142303-root.json
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30239 and previous config saved to /var/cache/conftool/dbconfig/20220624-142040-root.json
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30238 and previous config saved to /var/cache/conftool/dbconfig/20220624-142033-root.json
* 14:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:11 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:10 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:09 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:08 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30237 and previous config saved to /var/cache/conftool/dbconfig/20220624-140759-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30236 and previous config saved to /var/cache/conftool/dbconfig/20220624-140536-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30235 and previous config saved to /var/cache/conftool/dbconfig/20220624-140529-root.json
* 14:03 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:03 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30234 and previous config saved to /var/cache/conftool/dbconfig/20220624-135940-root.json
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30233 and previous config saved to /var/cache/conftool/dbconfig/20220624-135255-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30232 and previous config saved to /var/cache/conftool/dbconfig/20220624-135032-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30231 and previous config saved to /var/cache/conftool/dbconfig/20220624-135025-root.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30230 and previous config saved to /var/cache/conftool/dbconfig/20220624-134436-root.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30229 and previous config saved to /var/cache/conftool/dbconfig/20220624-134423-root.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30228 and previous config saved to /var/cache/conftool/dbconfig/20220624-133751-root.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30227 and previous config saved to /var/cache/conftool/dbconfig/20220624-133528-root.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30226 and previous config saved to /var/cache/conftool/dbconfig/20220624-133521-root.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30225 and previous config saved to /var/cache/conftool/dbconfig/20220624-132932-root.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30224 and previous config saved to /var/cache/conftool/dbconfig/20220624-132919-root.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30223 and previous config saved to /var/cache/conftool/dbconfig/20220624-132247-root.json
* 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS buster
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30222 and previous config saved to /var/cache/conftool/dbconfig/20220624-132024-root.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30221 and previous config saved to /var/cache/conftool/dbconfig/20220624-132017-root.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30220 and previous config saved to /var/cache/conftool/dbconfig/20220624-131428-root.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30219 and previous config saved to /var/cache/conftool/dbconfig/20220624-131415-root.json
* 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:11 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30218 and previous config saved to /var/cache/conftool/dbconfig/20220624-130937-root.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30217 and previous config saved to /var/cache/conftool/dbconfig/20220624-130743-root.json
* 13:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
* 13:05 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:05 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30216 and previous config saved to /var/cache/conftool/dbconfig/20220624-130519-root.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30215 and previous config saved to /var/cache/conftool/dbconfig/20220624-130514-root.json
* 13:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30214 and previous config saved to /var/cache/conftool/dbconfig/20220624-130055-root.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30213 and previous config saved to /var/cache/conftool/dbconfig/20220624-125924-root.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30212 and previous config saved to /var/cache/conftool/dbconfig/20220624-125911-root.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30211 and previous config saved to /var/cache/conftool/dbconfig/20220624-125834-root.json
* 12:58 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 12:58 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30210 and previous config saved to /var/cache/conftool/dbconfig/20220624-125433-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30209 and previous config saved to /var/cache/conftool/dbconfig/20220624-125401-root.json
* 12:54 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 12:53 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:53 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:46 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:46 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30208 and previous config saved to /var/cache/conftool/dbconfig/20220624-124420-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30207 and previous config saved to /var/cache/conftool/dbconfig/20220624-124407-root.json
* 12:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 03s)
* 12:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30206 and previous config saved to /var/cache/conftool/dbconfig/20220624-123929-root.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30205 and previous config saved to /var/cache/conftool/dbconfig/20220624-123857-root.json
* 12:34 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 03s)
* 12:34 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30204 and previous config saved to /var/cache/conftool/dbconfig/20220624-122916-root.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30203 and previous config saved to /var/cache/conftool/dbconfig/20220624-122903-root.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30202 and previous config saved to /var/cache/conftool/dbconfig/20220624-122728-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30201 and previous config saved to /var/cache/conftool/dbconfig/20220624-122425-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30200 and previous config saved to /var/cache/conftool/dbconfig/20220624-122353-root.json
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30199 and previous config saved to /var/cache/conftool/dbconfig/20220624-122256-root.json
* 12:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 28s)
* 12:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30198 and previous config saved to /var/cache/conftool/dbconfig/20220624-121359-root.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30197 and previous config saved to /var/cache/conftool/dbconfig/20220624-121224-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30196 and previous config saved to /var/cache/conftool/dbconfig/20220624-120922-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30195 and previous config saved to /var/cache/conftool/dbconfig/20220624-120849-root.json
* 12:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@18182aa]: (no justification provided) (duration: 03m 47s)
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30194 and previous config saved to /var/cache/conftool/dbconfig/20220624-120632-root.json
* 12:04 bmansurov@deploy1002: Started deploy [airflow-dags/research@18182aa]: (no justification provided)
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30193 and previous config saved to /var/cache/conftool/dbconfig/20220624-120411-root.json
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30192 and previous config saved to /var/cache/conftool/dbconfig/20220624-115720-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30191 and previous config saved to /var/cache/conftool/dbconfig/20220624-115418-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30190 and previous config saved to /var/cache/conftool/dbconfig/20220624-115345-root.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30189 and previous config saved to /var/cache/conftool/dbconfig/20220624-114907-root.json
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30188 and previous config saved to /var/cache/conftool/dbconfig/20220624-114816-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30187 and previous config saved to /var/cache/conftool/dbconfig/20220624-114217-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30186 and previous config saved to /var/cache/conftool/dbconfig/20220624-113914-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30185 and previous config saved to /var/cache/conftool/dbconfig/20220624-113841-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30184 and previous config saved to /var/cache/conftool/dbconfig/20220624-113403-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30183 and previous config saved to /var/cache/conftool/dbconfig/20220624-113312-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30182 and previous config saved to /var/cache/conftool/dbconfig/20220624-113020-root.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30181 and previous config saved to /var/cache/conftool/dbconfig/20220624-112713-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30180 and previous config saved to /var/cache/conftool/dbconfig/20220624-111859-root.json
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30179 and previous config saved to /var/cache/conftool/dbconfig/20220624-111808-root.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30178 and previous config saved to /var/cache/conftool/dbconfig/20220624-111209-root.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30177 and previous config saved to /var/cache/conftool/dbconfig/20220624-110356-root.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30176 and previous config saved to /var/cache/conftool/dbconfig/20220624-110305-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30175 and previous config saved to /var/cache/conftool/dbconfig/20220624-105705-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30174 and previous config saved to /var/cache/conftool/dbconfig/20220624-104852-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30173 and previous config saved to /var/cache/conftool/dbconfig/20220624-104849-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30172 and previous config saved to /var/cache/conftool/dbconfig/20220624-104801-root.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30171 and previous config saved to /var/cache/conftool/dbconfig/20220624-104407-root.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30170 and previous config saved to /var/cache/conftool/dbconfig/20220624-104403-root.json
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30169 and previous config saved to /var/cache/conftool/dbconfig/20220624-103342-root.json
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30168 and previous config saved to /var/cache/conftool/dbconfig/20220624-103257-root.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30166 and previous config saved to /var/cache/conftool/dbconfig/20220624-102904-root.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30165 and previous config saved to /var/cache/conftool/dbconfig/20220624-102859-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30164 and previous config saved to /var/cache/conftool/dbconfig/20220624-102856-root.json
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30163 and previous config saved to /var/cache/conftool/dbconfig/20220624-101753-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30162 and previous config saved to /var/cache/conftool/dbconfig/20220624-101400-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30161 and previous config saved to /var/cache/conftool/dbconfig/20220624-101349-root.json
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30160 and previous config saved to /var/cache/conftool/dbconfig/20220624-100752-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30159 and previous config saved to /var/cache/conftool/dbconfig/20220624-095946-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30158 and previous config saved to /var/cache/conftool/dbconfig/20220624-095935-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30157 and previous config saved to /var/cache/conftool/dbconfig/20220624-095856-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30156 and previous config saved to /var/cache/conftool/dbconfig/20220624-095845-root.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30155 and previous config saved to /var/cache/conftool/dbconfig/20220624-094442-root.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30154 and previous config saved to /var/cache/conftool/dbconfig/20220624-094431-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30153 and previous config saved to /var/cache/conftool/dbconfig/20220624-094352-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30152 and previous config saved to /var/cache/conftool/dbconfig/20220624-094342-root.json
* 09:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:35 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 09:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30151 and previous config saved to /var/cache/conftool/dbconfig/20220624-092938-root.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30150 and previous config saved to /var/cache/conftool/dbconfig/20220624-092927-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30149 and previous config saved to /var/cache/conftool/dbconfig/20220624-092848-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30148 and previous config saved to /var/cache/conftool/dbconfig/20220624-092838-root.json
* 09:25 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:24 moritzm: installing publicsuffix updates from last buster point release
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30147 and previous config saved to /var/cache/conftool/dbconfig/20220624-091434-root.json
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30146 and previous config saved to /var/cache/conftool/dbconfig/20220624-091423-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30145 and previous config saved to /var/cache/conftool/dbconfig/20220624-091344-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30144 and previous config saved to /var/cache/conftool/dbconfig/20220624-091334-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30143 and previous config saved to /var/cache/conftool/dbconfig/20220624-091227-root.json
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137,db1138 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30142 and previous config saved to /var/cache/conftool/dbconfig/20220624-090810-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30141 and previous config saved to /var/cache/conftool/dbconfig/20220624-085930-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30140 and previous config saved to /var/cache/conftool/dbconfig/20220624-085919-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30139 and previous config saved to /var/cache/conftool/dbconfig/20220624-085904-root.json
* 08:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts webperf2002.codfw.wmnet
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30137 and previous config saved to /var/cache/conftool/dbconfig/20220624-085723-root.json
* 08:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts webperf2002.codfw.wmnet
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30136 and previous config saved to /var/cache/conftool/dbconfig/20220624-085217-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30135 and previous config saved to /var/cache/conftool/dbconfig/20220624-085210-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30134 and previous config saved to /var/cache/conftool/dbconfig/20220624-085129-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30133 and previous config saved to /var/cache/conftool/dbconfig/20220624-085003-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30132 and previous config saved to /var/cache/conftool/dbconfig/20220624-084426-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30131 and previous config saved to /var/cache/conftool/dbconfig/20220624-084415-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30130 and previous config saved to /var/cache/conftool/dbconfig/20220624-084401-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30129 and previous config saved to /var/cache/conftool/dbconfig/20220624-084219-root.json
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30128 and previous config saved to /var/cache/conftool/dbconfig/20220624-083806-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30127 and previous config saved to /var/cache/conftool/dbconfig/20220624-083713-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30126 and previous config saved to /var/cache/conftool/dbconfig/20220624-083706-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30125 and previous config saved to /var/cache/conftool/dbconfig/20220624-083625-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30124 and previous config saved to /var/cache/conftool/dbconfig/20220624-083459-root.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30123 and previous config saved to /var/cache/conftool/dbconfig/20220624-082857-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30115 and previous config saved to /var/cache/conftool/dbconfig/20220624-080705-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30114 and previous config saved to /var/cache/conftool/dbconfig/20220624-080658-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30113 and previous config saved to /var/cache/conftool/dbconfig/20220624-080618-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30112 and previous config saved to /var/cache/conftool/dbconfig/20220624-080451-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30111 and previous config saved to /var/cache/conftool/dbconfig/20220624-075849-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30110 and previous config saved to /var/cache/conftool/dbconfig/20220624-075707-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30109 and previous config saved to /var/cache/conftool/dbconfig/20220624-075201-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30108 and previous config saved to /var/cache/conftool/dbconfig/20220624-075154-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30107 and previous config saved to /var/cache/conftool/dbconfig/20220624-075114-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30106 and previous config saved to /var/cache/conftool/dbconfig/20220624-075102-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30105 and previous config saved to /var/cache/conftool/dbconfig/20220624-074947-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30104 and previous config saved to /var/cache/conftool/dbconfig/20220624-074345-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30103 and previous config saved to /var/cache/conftool/dbconfig/20220624-074204-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30102 and previous config saved to /var/cache/conftool/dbconfig/20220624-073657-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30101 and previous config saved to /var/cache/conftool/dbconfig/20220624-073651-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30100 and previous config saved to /var/cache/conftool/dbconfig/20220624-073610-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30099 and previous config saved to /var/cache/conftool/dbconfig/20220624-073558-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30098 and previous config saved to /var/cache/conftool/dbconfig/20220624-073543-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30097 and previous config saved to /var/cache/conftool/dbconfig/20220624-073444-root.json
* 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-cache[2001-2003].codfw.wmnet with reason: reboots
* 07:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-cache[2001-2003].codfw.wmnet with reason: reboots
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30096 and previous config saved to /var/cache/conftool/dbconfig/20220624-072841-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30095 and previous config saved to /var/cache/conftool/dbconfig/20220624-072240-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30094 and previous config saved to /var/cache/conftool/dbconfig/20220624-072153-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30093 and previous config saved to /var/cache/conftool/dbconfig/20220624-072147-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30092 and previous config saved to /var/cache/conftool/dbconfig/20220624-072106-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30091 and previous config saved to /var/cache/conftool/dbconfig/20220624-072054-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30090 and previous config saved to /var/cache/conftool/dbconfig/20220624-071940-root.json
* 07:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30089 and previous config saved to /var/cache/conftool/dbconfig/20220624-071551-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30088 and previous config saved to /var/cache/conftool/dbconfig/20220624-071439-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 es1025 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30087 and previous config saved to /var/cache/conftool/dbconfig/20220624-070700-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30086 and previous config saved to /var/cache/conftool/dbconfig/20220624-070601-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30085 and previous config saved to /var/cache/conftool/dbconfig/20220624-070555-root.json
* 07:02 marostegui: Reboot db1117 for kernel upgrade (expect haproxy irc alerts)
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30084 and previous config saved to /var/cache/conftool/dbconfig/20220624-070201-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30083 and previous config saved to /var/cache/conftool/dbconfig/20220624-070157-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30082 and previous config saved to /var/cache/conftool/dbconfig/20220624-070151-root.json
* 06:53 jynus: restarting bacula director @ backup1001
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30081 and previous config saved to /var/cache/conftool/dbconfig/20220624-065057-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30080 and previous config saved to /var/cache/conftool/dbconfig/20220624-065051-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30079 and previous config saved to /var/cache/conftool/dbconfig/20220624-064657-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30078 and previous config saved to /var/cache/conftool/dbconfig/20220624-064653-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30077 and previous config saved to /var/cache/conftool/dbconfig/20220624-064647-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30076 and previous config saved to /var/cache/conftool/dbconfig/20220624-063553-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30075 and previous config saved to /var/cache/conftool/dbconfig/20220624-063547-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30074 and previous config saved to /var/cache/conftool/dbconfig/20220624-063154-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30073 and previous config saved to /var/cache/conftool/dbconfig/20220624-063149-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30072 and previous config saved to /var/cache/conftool/dbconfig/20220624-063143-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30071 and previous config saved to /var/cache/conftool/dbconfig/20220624-062049-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30070 and previous config saved to /var/cache/conftool/dbconfig/20220624-062043-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30069 and previous config saved to /var/cache/conftool/dbconfig/20220624-061650-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30068 and previous config saved to /var/cache/conftool/dbconfig/20220624-061645-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30067 and previous config saved to /var/cache/conftool/dbconfig/20220624-061640-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30066 and previous config saved to /var/cache/conftool/dbconfig/20220624-060545-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30065 and previous config saved to /var/cache/conftool/dbconfig/20220624-060539-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30064 and previous config saved to /var/cache/conftool/dbconfig/20220624-060146-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30063 and previous config saved to /var/cache/conftool/dbconfig/20220624-060141-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30062 and previous config saved to /var/cache/conftool/dbconfig/20220624-060136-root.json
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30061 and previous config saved to /var/cache/conftool/dbconfig/20220624-055643-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30060 and previous config saved to /var/cache/conftool/dbconfig/20220624-055042-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30059 and previous config saved to /var/cache/conftool/dbconfig/20220624-055035-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30058 and previous config saved to /var/cache/conftool/dbconfig/20220624-054642-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30057 and previous config saved to /var/cache/conftool/dbconfig/20220624-054637-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30056 and previous config saved to /var/cache/conftool/dbconfig/20220624-054632-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170 after kernel reboots', diff saved to https://phabricator.wikimedia.org/P30055 and previous config saved to /var/cache/conftool/dbconfig/20220624-054259-root.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30054 and previous config saved to /var/cache/conftool/dbconfig/20220624-054139-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30053 and previous config saved to /var/cache/conftool/dbconfig/20220624-053652-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30052 and previous config saved to /var/cache/conftool/dbconfig/20220624-053538-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30051 and previous config saved to /var/cache/conftool/dbconfig/20220624-053531-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30050 and previous config saved to /var/cache/conftool/dbconfig/20220624-053138-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30049 and previous config saved to /var/cache/conftool/dbconfig/20220624-053134-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30048 and previous config saved to /var/cache/conftool/dbconfig/20220624-053128-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 db1169 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30047 and previous config saved to /var/cache/conftool/dbconfig/20220624-052758-root.json
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 db1174 db1175 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30046 and previous config saved to /var/cache/conftool/dbconfig/20220624-052137-root.json


== 2015-12-28 ==
== 2022-06-23 ==
* 21:45 gwicke: restbase: rolling restart to apply https://gerrit.wikimedia.org/r/261206
* 21:23 mutante: restbase-dev1006 has manually installed packages (wrk, maybe others)
* 21:26 mutante: tin & mira: started salt minions that were in status stop/waiting
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:25 logmsgbot: aaron@tin Synchronized private/PrivateSettings.php: (no message) (duration: 00m 30s)
* 21:22 brennen: end of utc late backport & config window
* 21:20 logmsgbot: aaron@tin Synchronized wmf-config/PrivateSettings.php: $wmfSwiftConfig convenience variable (duration: 00m 30s)
* 21:21 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808055{{!}}[cleanup] Drop non-existent feature flags]] (duration: 03m 33s)
* 20:59 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1050 & db1022 after emergency fix (duration: 00m 31s)
* 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:52 mutante: cygnus - starting salt-minion
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:35 logmsgbot: yurik@tin Synchronized php-1.27.0-wmf.9/extensions/Graph/modules/graph2.js: https://gerrit.wikimedia.org/r/#/c/261200/ (duration: 00m 31s)
* 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:22 ejegg: updated DjangoBannerStats from 8d4a9062aab80e5371faebadd72fbe4f19ac2fdd to a64fe0e373a978d3df0b7f1dd74ac4cc5c78d34e
* 21:13 thcipriani@deploy1002: Finished scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]] (duration: 17m 29s)
* 18:46 jynus: importing wikishared from x1-master into analytics-slave and setting up replication
* 21:01 inflatador: looking into wdqs1006 alert ^^
* 18:28 jynus: restarting and upgrading db1050, using the fact that it is depooled
* 20:56 thcipriani@deploy1002: Started scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]]
* 17:16 paravoid: disabled varnish TBF and force-ran puppet on all cp* hosts (I12ea52165e125aaf4ed779399f34cff16d5cd140)
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:38 jynus: applying production-side replication filters for wikimania2017wiki on labs
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:24 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 30s)
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:14 logmsgbot: krenair@tin Synchronized dblists: (no message) (duration: 00m 29s)
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:13 logmsgbot: krenair@tin rebuilt wikiversions.php and synchronized wikiversions files: (no message)
* 20:49 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808064{{!}}Enable DiscussionTools topicsubscription, autotopicsub on testwiki (T310808)]] (duration: 03m 18s)
* 16:12 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/wikimania2017wiki.png: (no message) (duration: 00m 31s)
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dse-k8s-ctrl1001.eqiad.wmnet
* 16:12 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/260521/ (duration: 00m 30s)
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 14:55 jynus: cloning db1050's mysql data to db1022
* 20:48 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 14:15 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Emergency depool of db1050 (duration: 00m 31s)
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Dec 28 02:30:47 UTC 2015 (duration 6m 58s)
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 55s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806847{{!}}ukwikibooks: Add NS102 (Рецепт) to wgContentNamespaces (T310940)]] (duration: 03m 41s)
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1001.eqiad.wmnet
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 mutante: cumin -b 15 -p 95 'mw1*' 'run-puppet-agent -q --failed-only'
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 mutante: cumin -b 15 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
* 20:09 mutante: cumin -b 15 -p 95 'parse*' 'run-puppet-agent -q --failed-only'
* 20:07 mutante: cumin -b 15 -p 95 'wtp*' 'run-puppet-agent -q --failed-only'
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:39 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:34 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:24 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:21 ejegg: fundraising python tools updated from {{Gerrit|40d376d4}} to {{Gerrit|acf89fb2}}
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:38 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.17 refs [[phab:T308070|T308070]]
* 18:01 brennen: train 1.39.0-wmf.17 ([[phab:T308070|T308070]]): no current blockers - rolling to all wikis
* 18:01 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:53 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:32 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:00 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:17 hashar: Upgrading CI Jenkins # [[phab:T311174|T311174]]
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:807902{{!}}Do not re-use "wikibase_config" for registering the language selector... (T307869)]] (duration: 03m 22s)
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30042 and previous config saved to /var/cache/conftool/dbconfig/20220623-150954-root.json
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30041 and previous config saved to /var/cache/conftool/dbconfig/20220623-150951-root.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30040 and previous config saved to /var/cache/conftool/dbconfig/20220623-150422-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30039 and previous config saved to /var/cache/conftool/dbconfig/20220623-145450-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30038 and previous config saved to /var/cache/conftool/dbconfig/20220623-145448-root.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30037 and previous config saved to /var/cache/conftool/dbconfig/20220623-144918-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30036 and previous config saved to /var/cache/conftool/dbconfig/20220623-143946-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30035 and previous config saved to /var/cache/conftool/dbconfig/20220623-143944-root.json
* 14:34 papaul: ongoing PDU maintenance in rack A3 codfw
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30034 and previous config saved to /var/cache/conftool/dbconfig/20220623-143414-root.json
* 14:31 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:30 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30033 and previous config saved to /var/cache/conftool/dbconfig/20220623-142443-root.json
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30032 and previous config saved to /var/cache/conftool/dbconfig/20220623-142440-root.json
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30031 and previous config saved to /var/cache/conftool/dbconfig/20220623-141910-root.json
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 taavi@deploy1002: Synchronized php-1.39.0-wmf.17/includes/skins/Skin.php: Backport: [[gerrit:807900{{!}}Skin: Change viewport based on feedback (T311119)]] (duration: 03m 29s)
* 14:10 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30030 and previous config saved to /var/cache/conftool/dbconfig/20220623-140939-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30029 and previous config saved to /var/cache/conftool/dbconfig/20220623-140936-root.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30028 and previous config saved to /var/cache/conftool/dbconfig/20220623-140406-root.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:02 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:00 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:00 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 13:58 moritzm: import jenkins 2.346.1 to thirdparty/ci [[phab:T311174|T311174]]
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30027 and previous config saved to /var/cache/conftool/dbconfig/20220623-135435-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30026 and previous config saved to /var/cache/conftool/dbconfig/20220623-135432-root.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30025 and previous config saved to /var/cache/conftool/dbconfig/20220623-134902-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30024 and previous config saved to /var/cache/conftool/dbconfig/20220623-133931-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30023 and previous config saved to /var/cache/conftool/dbconfig/20220623-133928-root.json
* 13:38 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (2/2) (duration: 03m 26s)
* 13:34 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (1/2) (duration: 03m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30022 and previous config saved to /var/cache/conftool/dbconfig/20220623-133358-root.json
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182 db1184 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30021 and previous config saved to /var/cache/conftool/dbconfig/20220623-132951-root.json
* 13:27 sukhe: disable puppet on A:durum or A:wikidough or A:centrallog or A:dns-rec: deploying [[phab:T310574|T310574]]
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30020 and previous config saved to /var/cache/conftool/dbconfig/20220623-132729-root.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30019 and previous config saved to /var/cache/conftool/dbconfig/20220623-132133-root.json
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30018 and previous config saved to /var/cache/conftool/dbconfig/20220623-132128-root.json
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807050{{!}}[ImageSuggestions] Enable extension on ptwiki, ruwiki & idwiki (T302711)]] (duration: 03m 44s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30017 and previous config saved to /var/cache/conftool/dbconfig/20220623-130629-root.json
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30016 and previous config saved to /var/cache/conftool/dbconfig/20220623-130624-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30015 and previous config saved to /var/cache/conftool/dbconfig/20220623-125553-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30014 and previous config saved to /var/cache/conftool/dbconfig/20220623-125547-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30013 and previous config saved to /var/cache/conftool/dbconfig/20220623-125125-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30012 and previous config saved to /var/cache/conftool/dbconfig/20220623-125120-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30011 and previous config saved to /var/cache/conftool/dbconfig/20220623-124049-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30010 and previous config saved to /var/cache/conftool/dbconfig/20220623-124043-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30009 and previous config saved to /var/cache/conftool/dbconfig/20220623-123621-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30008 and previous config saved to /var/cache/conftool/dbconfig/20220623-123616-root.json
* 12:26 moritzm: installing waitress security updates
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30007 and previous config saved to /var/cache/conftool/dbconfig/20220623-122545-root.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30006 and previous config saved to /var/cache/conftool/dbconfig/20220623-122539-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30005 and previous config saved to /var/cache/conftool/dbconfig/20220623-122118-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30004 and previous config saved to /var/cache/conftool/dbconfig/20220623-122112-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30003 and previous config saved to /var/cache/conftool/dbconfig/20220623-121041-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30002 and previous config saved to /var/cache/conftool/dbconfig/20220623-121035-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30001 and previous config saved to /var/cache/conftool/dbconfig/20220623-120614-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30000 and previous config saved to /var/cache/conftool/dbconfig/20220623-120608-root.json
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29999 and previous config saved to /var/cache/conftool/dbconfig/20220623-115537-root.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29998 and previous config saved to /var/cache/conftool/dbconfig/20220623-115532-root.json
* 11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29997 and previous config saved to /var/cache/conftool/dbconfig/20220623-115110-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29996 and previous config saved to /var/cache/conftool/dbconfig/20220623-115104-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128 db1129 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29995 and previous config saved to /var/cache/conftool/dbconfig/20220623-114159-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29994 and previous config saved to /var/cache/conftool/dbconfig/20220623-114033-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29993 and previous config saved to /var/cache/conftool/dbconfig/20220623-114028-root.json
* 11:32 kart_: Updated cxserver to 2022-06-23-052732-production ([[phab:T311196|T311196]])
* 11:31 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 11:31 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:30 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:29 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:28 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29992 and previous config saved to /var/cache/conftool/dbconfig/20220623-112529-root.json
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29991 and previous config saved to /var/cache/conftool/dbconfig/20220623-112524-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 es1024 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29990 and previous config saved to /var/cache/conftool/dbconfig/20220623-110804-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29989 and previous config saved to /var/cache/conftool/dbconfig/20220623-105333-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29988 and previous config saved to /var/cache/conftool/dbconfig/20220623-105326-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29987 and previous config saved to /var/cache/conftool/dbconfig/20220623-105320-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29986 and previous config saved to /var/cache/conftool/dbconfig/20220623-103829-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29985 and previous config saved to /var/cache/conftool/dbconfig/20220623-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29984 and previous config saved to /var/cache/conftool/dbconfig/20220623-103816-root.json
* 10:25 jayme: running restart-php7.2-fpm A:parsoid or A:mw or A:mw-api to disable opcache revalidation - [[phab:T266055|T266055]]
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29983 and previous config saved to /var/cache/conftool/dbconfig/20220623-102325-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29982 and previous config saved to /var/cache/conftool/dbconfig/20220623-102318-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29981 and previous config saved to /var/cache/conftool/dbconfig/20220623-102312-root.json
* 10:21 XioNoX: fix eqiad lvs switch port MTU
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29980 and previous config saved to /var/cache/conftool/dbconfig/20220623-100822-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29979 and previous config saved to /var/cache/conftool/dbconfig/20220623-100815-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29978 and previous config saved to /var/cache/conftool/dbconfig/20220623-100808-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29977 and previous config saved to /var/cache/conftool/dbconfig/20220623-095318-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29976 and previous config saved to /var/cache/conftool/dbconfig/20220623-095311-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29975 and previous config saved to /var/cache/conftool/dbconfig/20220623-095304-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29973 and previous config saved to /var/cache/conftool/dbconfig/20220623-093814-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29972 and previous config saved to /var/cache/conftool/dbconfig/20220623-093807-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29971 and previous config saved to /var/cache/conftool/dbconfig/20220623-093800-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29970 and previous config saved to /var/cache/conftool/dbconfig/20220623-092310-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29969 and previous config saved to /var/cache/conftool/dbconfig/20220623-092303-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29968 and previous config saved to /var/cache/conftool/dbconfig/20220623-092256-root.json
* 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 db1179 db1180 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29967 and previous config saved to /var/cache/conftool/dbconfig/20220623-090842-root.json
* 09:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:52 joal@deploy1002: Finished deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs (duration: 00m 08s)
* 08:52 joal@deploy1002: Started deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs
* 08:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 7 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 7 hosts with reason: Reboots
* 07:39 moritzm: installing firejail security updates
* 07:36 TheresNoTime: UTC morning deploys done
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:25 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806365{{!}}GrowthExperiments: Enable link recommendations frontend, round 4 (T304548)]] (duration: 03m 37s)
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: Reboots
* 00:35 brennen: end of phabricator maintenance window
* 00:13 brennen: phabricator deploy finished ([[phab:T311175|T311175]])
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance


== 2015-12-27 ==
== 2022-06-22 ==
* 03:05 YuviPanda: run nodetool clearsnapshot -- v3 and nodetool clearsnapshot -- v1 on maps-test2001
* 22:56 tzatziki: removing 1 file for legal compliance
* 02:45 YuviPanda: run drop keyspace v3; on csql on maps-test1001 for yurik
* 21:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
* 02:42 YuviPanda: run drop keyspace v1; on csql on maps-test1001 for yurik
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 to resolve Old GC Hell alert
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Dec 27 02:30:29 UTC 2015 (duration 6m 59s)
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad to resolve Old GC Hell alert
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 35s)
* 21:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:49 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44] (duration: 01m 18s)
* 20:48 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44]
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 20:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:27 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 20:24 cjming: end of UTC late backport window
* 20:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 20:19 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44] (duration: 07m 36s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807593{{!}}gawiki: Change category collation from `uppercase` to `uca-ga-u-kn` (T311136)]] (duration: 03m 39s)
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44]
* 20:11 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44] (duration: 00m 07s)
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44]
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44] (duration: 06m 16s)
* 20:03 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44]
* 20:03 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44] (duration: 30m 58s)
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 19:42 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection (duration: 02m 03s)
* 19:37 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection
* 19:32 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44]
* 19:31 aqu: Deploying analytics/refinery (weekly train)
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 19:14 herron: bounced apache on lists1001
* 19:06 hashar: Restarting CI Jenkins
* 16:46 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1009.eqiad.wmnet with OS bullseye
* 16:45 hashar: Restarting CI Jenkins
* 16:43 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
* 16:33 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:29 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:18 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:14 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 16:13 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 16:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 16:08 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 16:06 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 16:05 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 16:04 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 15:36 moritzm: upload jenkins 2.332.4 to apt.wikimedia.org [[phab:T311068|T311068]]
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
* 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 15:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 15:00 jayme: published docker-registry.discovery.wmnet/helm-state-metrics:0.1.0-1 - [[phab:T310714|T310714]]
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 14:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
* 14:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 Lucas_WMDE: UTC afternoon backport+config window done
* 14:09 lucaswerkmeister-wmde@deploy1002: Synchronized logos/manage.py: Config: [[gerrit:807486{{!}}logos: Update phpcs comment]] (should be a no-op but syncing just in case) (duration: 03m 19s)
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
* 14:01 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/%s\n' specieswiki<nowiki>{</nowiki>,-<nowiki>{</nowiki>1.5,2<nowiki>}</nowiki>x<nowiki>}</nowiki>.png {{!}} mwscript purgeList.php # [[phab:T310961|T310961]]
* 14:01 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (3/3) (duration: 03m 30s)
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (2/3) (duration: 03m 29s)
* 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:56 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
* 13:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
* 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (1/3) (duration: 03m 46s)
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
* 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (2/2) (duration: 03m 39s)
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (1/2) (duration: 03m 35s)
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:28 XioNoX: fix MTU on eqiad server facing switch ports
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:27 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:807255{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (2/3) (T304328)]] (duration: 03m 35s)
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807254{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (1/3) (T304328)]] (duration: 03m 35s)
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
* 13:10 XioNoX: fix MTU in drmrs
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:807211{{!}}[wmf-config]: Deploy GDI Survey Wave 2 - BETA (T311079)]] (duration: 03m 29s)
* 12:58 XioNoX: fix MTU on codfw switches access ports
* 12:57 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 12:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 12:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 12:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
* 12:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 12:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 12:23 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 12:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 12:18 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 12:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 12:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 12:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 12:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 11:46 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 11:41 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 11:11 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 20s)
* 11:10 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 11:09 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 11s)
* 11:08 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 11:07 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 02m 54s)
* 11:05 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 10:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
* 10:53 jayme: systemctl restart rsyslog on kubernetes2008
* 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
* 10:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
* 10:41 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
* 10:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
* 10:36 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
* 10:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
* 10:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
* 10:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
* 10:17 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
* 10:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
* 10:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 10:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
* 10:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 10:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2003.codfw.wmnet
* 10:04 moritzm: installing vim security updates
* 09:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 09:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
* 09:35 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 09:35 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 09:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 09:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
* 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 08:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
* 08:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29964 and previous config saved to /var/cache/conftool/dbconfig/20220622-084234-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29963 and previous config saved to /var/cache/conftool/dbconfig/20220622-084225-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29962 and previous config saved to /var/cache/conftool/dbconfig/20220622-084206-root.json
* 08:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29961 and previous config saved to /var/cache/conftool/dbconfig/20220622-082730-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29960 and previous config saved to /var/cache/conftool/dbconfig/20220622-082721-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29959 and previous config saved to /var/cache/conftool/dbconfig/20220622-082702-root.json
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 08:18 marostegui: Upgrade kernel and reboot on db[1111,1132,1143,1127].eqiad.wmnet
* 08:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 08:15 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.17 refs [[phab:T308070|T308070]] (duration: 03m 43s)
* 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29957 and previous config saved to /var/cache/conftool/dbconfig/20220622-081227-root.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29956 and previous config saved to /var/cache/conftool/dbconfig/20220622-081217-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29955 and previous config saved to /var/cache/conftool/dbconfig/20220622-081159-root.json
* 08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.17 refs [[phab:T308070|T308070]]
* 08:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
* 08:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
* 08:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
* 08:04 hashar: Updating operations-puppet-tests-buster-docker Jenkins job to use the latest Docker image (rebuild to catch up with latest defined gems). https://gerrit.wikimedia.org/r/c/integration/config/+/807478
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29954 and previous config saved to /var/cache/conftool/dbconfig/20220622-075721-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29953 and previous config saved to /var/cache/conftool/dbconfig/20220622-075713-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29952 and previous config saved to /var/cache/conftool/dbconfig/20220622-075655-root.json
* 07:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 07:53 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
* 07:50 marostegui: Upgrade kernel and reboot on db[2145-2150].codfw.wmnet
* 07:49 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29951 and previous config saved to /var/cache/conftool/dbconfig/20220622-074217-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29950 and previous config saved to /var/cache/conftool/dbconfig/20220622-074209-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29949 and previous config saved to /var/cache/conftool/dbconfig/20220622-074151-root.json
* 07:40 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 07:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 07:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29948 and previous config saved to /var/cache/conftool/dbconfig/20220622-072714-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29947 and previous config saved to /var/cache/conftool/dbconfig/20220622-072705-root.json
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29946 and previous config saved to /var/cache/conftool/dbconfig/20220622-072647-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29945 and previous config saved to /var/cache/conftool/dbconfig/20220622-071210-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29944 and previous config saved to /var/cache/conftool/dbconfig/20220622-071201-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29943 and previous config saved to /var/cache/conftool/dbconfig/20220622-071143-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 es1026 es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29942 and previous config saved to /var/cache/conftool/dbconfig/20220622-065507-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Switchover es1, es2 and es3 masters', diff saved to https://phabricator.wikimedia.org/P29941 and previous config saved to /var/cache/conftool/dbconfig/20220622-065208-marostegui.json
* 05:52 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 01:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:17 tstarling@deploy1002: Synchronized wmf-config/mc-labs.php: for completeness (duration: 03m 41s)
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:13 tstarling@deploy1002: Synchronized wmf-config/mc.php: g 807158 [[phab:T278392|T278392]] (duration: 03m 35s)
* 01:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2015-12-26 ==
== 2022-06-21 ==
* 19:12 paravoid: restarting varnish-frontend on cp3042
* 20:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b42e57d75ec6b0536493fa073805a0bcb066aef1}}: zhwikiquote: Disable local upload ([[phab:T311017|T311017]]) (duration: 03m 43s)
* 19:06 jynus: setting db1030 as the new master of db2028
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:40 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Emergency depool of db1022 (duration: 00m 30s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:28 jynus: disabling lag notifications for codfw (s6)
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:01 paravoid: cp3048: cleaned up /run/vmod_tbf/tbf.db/, kept a backup copy under ~faidon
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:55 paravoid: cp3048: service varnish-frontend stop (sending 429 to lots of people, T122453)
* 20:22 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (2/2) (duration: 03m 28s)
* 05:38 paravoid: rolling restart of hhvm jobrunners (T122069)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Dec 26 02:31:39 UTC 2015 (duration 6m 54s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 39s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (1/2) (duration: 03m 30s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f70e302e11756d9704acc86c45b3d7aabf31c4d}}: fawiktionary: Enable SandboxLink extension ([[phab:T308505|T308505]]) (duration: 03m 37s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:38 dancy@deploy1002: backport aborted:  (duration: 00m 10s)
* 19:38 dancy@deploy1002: Installation of scap version "4.9.5" completed for 558 hosts
* 19:38 dancy@deploy1002: Installing scap version "4.9.5" for 558 hosts
* 19:22 urandom: replicating Cassandra `system_auth` keyspace to codfw -- [[phab:T307641|T307641]]
* 18:56 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2` failed due to syntax error, patch here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/807200
* 18:48 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2`
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1001.wikimedia.org
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 17:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp1001.wikimedia.org
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2001.wikimedia.org
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host elastic1049.eqiad.wmnet
* 17:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 17:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp2001.wikimedia.org
* 17:14 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 17:09 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1049.eqiad.wmnet
* 17:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 17:01 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 16:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 16:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 16:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
* 15:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
* 15:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 15:55 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2048.codfw.wmnet
* 15:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
* 15:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
* 15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 15:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 15:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (2/2) (duration: 03m 28s)
* 15:37 klausman: restarting pybal on lvs2009
* 15:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
* 15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 15:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (1/2) (duration: 03m 51s)
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:30 klausman: Restarting pybal on lvs2010
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2002.codfw.wmnet
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2001.codfw.wmnet
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging-ctrl2002.codfw.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 15:16 klausman@cumin1001: conftool action : help; selector: name=ml-staging2001
* 15:15 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:06 moritzm: installing avahi security updates
* 15:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:01 papaul: PDU swap for rack a2 complete
* 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:24 papaul: ongoing maintenance on ps1-a2-codfw
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 13:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
* 13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
* 13:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
* 13:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
* 13:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:28 daniel@deploy1002: Synchronized rpc/: Config: [[gerrit:805775{{!}}rpc: Remove unused RunJobs.php (T175146 T243096)]] (duration: 03m 45s)
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
* 13:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 13:05 moritzm: installing Linux 5.10.120-1~bpo10+1 on buster hosts with backports kernel
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 13:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
* 12:56 moritzm: installing haproxy security updates on stretch
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 12:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 12:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
* 12:43 moritzm: installing python-bottle security updates
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 12:25 moritzm: reset logster-csp/logster-badpass-priv on mwlog1002, these were removed from Puppet
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:59 mbsantos: mbsantos@maps2009 imposm-removebackup-import ([[phab:T305845|T305845]])
* 11:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:43 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for testing', diff saved to https://phabricator.wikimedia.org/P29936 and previous config saved to /var/cache/conftool/dbconfig/20220621-114232-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for testing', diff saved to https://phabricator.wikimedia.org/P29935 and previous config saved to /var/cache/conftool/dbconfig/20220621-114216-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for testing', diff saved to https://phabricator.wikimedia.org/P29934 and previous config saved to /var/cache/conftool/dbconfig/20220621-114151-root.json
* 10:57 volans: deleting netbox getstats.GetDeviceStats job results - [[phab:T311048|T311048]]
* 10:51 kart_: Updated cxserver to 2022-06-21-035954-production ([[phab:T307970|T307970]])
* 10:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 10:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 10:47 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 10:47 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 10:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 10:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 10:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:31 urbanecm: 09:29:23 Synchronized wmf-config/throttle.php: {{Gerrit|7c9f6a561b2b4b5c5db063bad83bd23e9cbac347}}: Add a throttle rule for a Czech course ([[phab:T310885|T310885]]) (duration: 03m 34s) #manually logging in logmsgbot's absence
* 09:20 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 09:13 marostegui: dbmaint s8@codfw [[phab:T310011|T310011]]
* 08:29 marostegui: Reboot db1120 for kernel upgrade
* 08:14 moritzm: remove EOLed parsoid debs from releases.wikimedia.org [[phab:T309765|T309765]]
* 05:54 marostegui: Reboot db1132 and db1181 for kernel upgrade


== 2015-12-25 ==
== 2022-06-20 ==
* 15:50 jynus: testing new mariadb packages on db2070
* 07:14 SandraEbele: Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Reliability and SpecialEntityData metrics).
* 13:19 jynus: setting db2018's binlog_format as MIXED
* 07:14 SandraEbele: killed Oozie wikidata-articleplaceholder_metrics-coord, wikidata-reliability_metrics-coord, and wikidata-specialentitydata_metrics-coord jobs.
* 10:51 jynus: powercycle cp3010
* 09:17 jynus: powercycling cp4007 (unresponsive to ssh, ping, serial console)
* 02:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Dec 25 02:29:49 UTC 2015 (duration 6m 56s)
* 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 47s)


== 2015-12-24 ==
== 2022-06-19 ==
* 23:28 mutante: powercycled mw1114
* 10:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 23:27 mutante: i just reset dra on mw1114 because it said it was in use and i didnt see a log yet :;p
* 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 23:26 robh: mw1114 spammed all icinga errors, system is outputting endless scroll of login prompt, not halting for input (like another session or crash cart is sending it, or an error)
* 10:14 ayounsi@cumin1001: dbctl commit (dc=all): 'depool', diff saved to https://phabricator.wikimedia.org/P29910 and previous config saved to /var/cache/conftool/dbconfig/20220619-101436-ayounsi.json
* 17:31 gwicke: aqs: tweaked table properties for local_group_default_T_pageviews_per_article_flat: 2 months max DTCS window size, deflate compression
* 17:20 jynus: restarting and reconfiguring mysql at db2066
* 16:45 jynus: restart and reconfigure mysql at db2059
* 16:11 jynus: restart and mysql reconfiguration of db2052
* 15:02 jynus: restarting and reconfiguring mysql at db2045
* 14:11 paravoid: powercycling mw1012, OOM'ed/stuck
* 14:09 paravoid: rolling restart of hhvm jobrunners (T122069)
* 14:06 jynus: restart and reconfigure mysql at db2038
* 12:32 jynus: restart and reconfigure mysql at db2065
* 12:12 jynus: restart and reconfiguring mysql for db2058
* 11:50 jynus: restarting and reconfiguring mysql at db2051
* 11:28 jynus: restarting and reconfiguring mysql at db2044
* 10:33 jynus: restarting 's2' replication on dbstore200[12] after cloning
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Dec 24 02:30:40 UTC 2015 (duration 6m 52s)
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 03s)
* 01:05 awight: update payments from bae4d02afd8cfe1f8b8617c2f74bb36e420d281d to a7785baa7b40b442ecf0b60d47572502d0759780
* 00:38 gwicke: restbase1003: starting `nodetool cleanup`


== 2015-12-23 ==
== 2022-06-17 ==
* 23:31 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.9/extensions/Graph/modules/ve-graph: https://gerrit.wikimedia.org/r/#/c/260868/ (duration: 00m 31s)
* 22:05 AndyRussG: update payments-wiki revision {{Gerrit|10304f69}} -> {{Gerrit|ef53c82e}}
* 19:59 mutante: restbase1004 - puppet stopped and host key changed, what's up?
* 20:22 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P29908 and previous config saved to /var/cache/conftool/dbconfig/20220617-202240-jynus.json
* 19:42 mutante: ran puppet on mw2112
* 20:20 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P29907 and previous config saved to /var/cache/conftool/dbconfig/20220617-202038-jynus.json
* 19:41 mutante: logstash1002 - started logstash service
* 17:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1021.eqiad.wmnet with OS buster
* 19:22 logmsgbot: aaron@tin Synchronized wmf-config/CommonSettings.php: Remove unused $wgMaxSquidPurgeTitles setting (duration: 00m 30s)
* 17:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 18:55 ejegg: updated fundraising tools from ebed29c0eccf38c812b20a957b3487a15bfa9cbc to 1bc23cb4bfaf2a9d4d215aad79dd67d891b5d973
* 17:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 18:51 ejegg: updated fundraising dashboard from 59e51c4ff74c3c584daf6c5de3bb66daa764cd28 to af8a493ab9ac5431e0d294e5019ac4e426ac6e08
* 16:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1020.eqiad.wmnet with OS buster
* 18:38 jynus: restart and reconfigure mysql at db2037
* 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 18:17 mutante: bohrium - finish install, signing puppet certs
* 16:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1019.eqiad.wmnet with OS buster
* 17:11 jynus: cloning s2 databases from dbstore2001 to dbstore2002 (s2 replication disabled on both)
* 16:37 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 14:05 jynus: restart and reconfigure mysql at db2064
* 16:35 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 13:26 jynus: restart and reconfigure mysql at db2063
* 16:34 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 12:41 jynus: reenabling event scheduler on db1046 (eventlogging m4-master)
* 16:34 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 12:30 jynus: restart and reconfigure mysql at db2056
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 12:14 godog: upgrade cassandra on aqs1003
* 16:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 11:24 jynus: reloading and reconfiguring mysql on db2049
* 16:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:11 godog: roll-upgrade cassandra to 2.1.12 on aqs100[123]
* 16:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:55 jynus: rebooting and reconfiguring mysql on db2041
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 10:11 jynus: restarting and reconfiguring mysql at db2035
* 16:22 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 09:52 gwicke: rebuilding restbase1004
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 09:51 gwicke: wiped & started bootstrap on restbase1008
* 16:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 09:18 gwicke: nodetool removenode e2813bb9-f1f2-4d21-ac19-95a7a35b4513 in preparation for adding 1004 to the cluster without bootstrap
* 16:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Dec 23 02:30:25 UTC 2015 (duration 7m 1s)
* 16:06 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 18s)
* 16:06 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 00:40 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/260696 & https://gerrit.wikimedia.org/r/260699 (duration: 05m 28s)
* 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
* 00:37 mutante: mw1133 - powercycle
* 16:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 00:36 legoktm: manually fixed up stuck global rename of "RCJU-ArCJ" -> "Archives cantonales jurassiennes"
* 15:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 00:31 matt_flaschen: Ran UPDATE flow_workflow SET workflow_page_id = 41854369 WHERE workflow_wiki = 'enwiki' AND workflow_namespace = 5 AND workflow_title_text = 'Flow/Developer_test_page' AND workflow_page_id = 48099373; to work around DB inconsistency (T117812)
* 15:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 15:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 15:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 15:46 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS buster
* 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 15:39 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 15:39 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 15:32 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS buster
* 15:29 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 15:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 15:19 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 15:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 15:18 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 15:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:15 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS buster
* 15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 15:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 15:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 15:02 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 14:59 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 14:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 14:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 12:35 SandraEbele: deployed daily airflow dag for 3 Wikidata metrics.
* 11:54 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@18182aa]: (no justification provided) (duration: 00m 13s)
* 11:54 ebysans@deploy1002: Started deploy [airflow-dags/analytics@18182aa]: (no justification provided)
* 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2012.codfw.wmnet
* 11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2012.codfw.wmnet
* 11:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2011.codfw.wmnet
* 11:40 moritzm: upload cas 6.5.5+wmf11u1 to apt.wikimedia.org [[phab:T305518|T305518]]
* 11:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2011.codfw.wmnet
* 11:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2010.codfw.wmnet
* 11:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 11:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 11:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 11:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 11:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2010.codfw.wmnet
* 11:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1012.eqiad.wmnet
* 11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1012.eqiad.wmnet
* 11:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1010.eqiad.wmnet
* 11:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1010.eqiad.wmnet
* 10:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 10:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 10:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 10:34 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 10:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
* 09:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
* 09:56 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 09:56 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 09:55 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 09:55 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 09:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
* 09:44 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
* 09:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
* 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1004.eqiad.wmnet
* 09:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1004.eqiad.wmnet
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
* 09:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
* 09:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
* 09:24 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
* 09:11 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 09:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 09:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 08:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 08:51 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 08:47 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 08:39 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 08:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
* 08:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
* 08:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
* 08:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
* 07:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 07:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 02:51 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS bullseye
* 02:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:36 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 03m 43s)
* 02:02 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS bullseye
* 01:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS bullseye
* 01:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
* 00:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 00:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye


== 2015-12-22 ==
== 2022-06-16 ==
* 21:46 gwicke: restbase1004: tune2fs -m 0 /dev/mapper/restbase1004--vg-srv
* 23:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 21:45 gwicke: restbase1004: restarted bootstrap
* 23:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 21:22 gwicke: restbase1003: restarting cassandra to clear up disk space from old stream
* 23:38 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 21:11 gwicke: restbase1008: restarting cassandra to clear up disk space from old stream
* 23:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
* 18:36 robh: silver returned to normal service, wikitech.w.o certificate renewed.
* 22:59 mutante: new Wikipedia languages added to DNS:  blk = https://en.wikipedia.org/wiki/Pa%27O_language  {{!}}  pcm = https://en.wikipedia.org/wiki/Nigerian_Pidgin
* 18:26 robh: silver puppet staying stalled during toollabs issue (we don't want to rehup silver web service)
* 22:37 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:17 robh: puppet disabled on silver, going to update wikitech.wikimedia.org certificate
* 22:33 volans@cumin2002: START - Cookbook sre.dns.netbox
* 18:10 jynus: disabling event scheduling on db1046
* 21:18 thcipriani@deploy1002: Finished scap: noop test (duration: 04m 07s)
* 18:03 jynus: rolling schema change (ALTER TABLE ENGINE=TokuDB) on m4-master (db1046) log (eventlogging)
* 21:14 thcipriani@deploy1002: Started scap: noop test
* 16:44 godog: bounce cassandra on restbase1004, restart bootstrap
* 21:10 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:805433{{!}}CommonSettings: clean up and simplify some code]] (duration: 03m 42s)
* 16:42 mutante: powercycling crashed mw1144
* 21:06 thcipriani@deploy1002: Synchronized multiversion/MWRealm.php: Config: [[gerrit:806249{{!}}MWRealm.php: remove unused getRealmSpecificFilename() (T171115)]] (duration: 03m 35s)
* 16:41 jynus: converting dbstore2001 (delayed slave) into an actual delayed slave, adding redundancy to dbstore1002
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:40 godog: bounce cassandra on restbase1003
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:15 akosiaris: upgrade cassandra on maps-test2001
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:15 akosiaris: upgrade cassandra on maps-test2002
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:53 mutante: kafka1001,1002 - crit - eventlogging not running (?)
* 20:59 thcipriani@deploy1002: Finished scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]] (duration: 16m 57s)
* 15:52 mutante: restbase1003 - disk space, restbase1008 - disk space, restbase1004 - cassandra cql refused
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:23 akosiaris: upgrade cassandra on maps-test2003
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:06 jynus: restarting and reconfiguring mysql at dbstore2001
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:06 mutante: labtestcontrol2001 - puppet had not been running for a while, a bunch of changes have been applied incl. keys and passwords
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:04 mutante: enabling puppet on labtestcontrol2001
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:04 akosiaris: upgraded cassandra on maps-test2004
* 20:42 thcipriani@deploy1002: Started scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]]
* 11:54 apergos: salt packages with wmf packages precise running on ms-{bf}e* in esams; trusty running on analytics103* in eqiad; jessie running on restbase2* in codfw
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:43 godog: restart cassandra bootstrap on restbase1004
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:09 jynus: online resizing /srv/postgres on labsdb1006 +100GB
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:06 hashar: Restarting Jenkins
* 20:27 cjming@deploy1002: Synchronized phpcs.xml: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 27s)
* 09:54 apergos: precise and trusty salt packages with wmf patches deployed manually on dataset1001 and analytics1001, seem to work fine
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:42 jynus: restarting and reconfiguring mysql at db2036
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Dec 22 02:30:28 UTC 2015 (duration 6m 54s)
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 47s)
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:29 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.9/extensions/VisualEditor: https://gerrit.wikimedia.org/r/#/c/260492/ (duration: 00m 32s)
* 20:23 cjming@deploy1002: Synchronized wmf-config/: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 22s)
* 00:22 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.9/extensions/SyntaxHighlight_GeSHi/modules/ve-syntaxhighlight/ve.ui.MWSyntaxHighlightDialogTool.js: https://gerrit.wikimedia.org/r/#/c/260429/ (duration: 00m 30s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805179{{!}}Turn off TOC A/B test for pilot wikis (T309683)]] (duration: 03m 37s)
* 19:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner2001.codfw.wmnet
* 19:39 aokoth@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:23 aokoth@cumin1001: START - Cookbook sre.dns.netbox
* 19:03 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner2001.codfw.wmnet
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab-runner1001.eqiad.wmnet
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29904 and previous config saved to /var/cache/conftool/dbconfig/20220616-185520-marostegui.json
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:54 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
* 18:53 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:50 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:49 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:44 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to all wikis
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:42 brennen@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/CheckUser/src/Hooks.php: Backport: [[gerrit:806246{{!}}Only try to create User object if username is not null (T310747)]] (duration: 03m 23s)
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29903 and previous config saved to /var/cache/conftool/dbconfig/20220616-184015-marostegui.json
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29902 and previous config saved to /var/cache/conftool/dbconfig/20220616-182510-marostegui.json
* 18:13 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:11 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29901 and previous config saved to /var/cache/conftool/dbconfig/20220616-181005-marostegui.json
* 18:10 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 17:59 brennen: end of phabricator deploy
* 17:46 brennen: starting phabricator deploy, momentary downtime expected while services restart
* 17:42 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:42 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29900 and previous config saved to /var/cache/conftool/dbconfig/20220616-173738-marostegui.json
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29899 and previous config saved to /var/cache/conftool/dbconfig/20220616-173725-marostegui.json
* 17:31 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:31 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:27 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:26 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29898 and previous config saved to /var/cache/conftool/dbconfig/20220616-172220-marostegui.json
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29897 and previous config saved to /var/cache/conftool/dbconfig/20220616-170715-marostegui.json
* 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29896 and previous config saved to /var/cache/conftool/dbconfig/20220616-165210-marostegui.json
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29895 and previous config saved to /var/cache/conftool/dbconfig/20220616-161844-marostegui.json
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29894 and previous config saved to /var/cache/conftool/dbconfig/20220616-161835-marostegui.json
* 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29893 and previous config saved to /var/cache/conftool/dbconfig/20220616-160330-marostegui.json
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29892 and previous config saved to /var/cache/conftool/dbconfig/20220616-154825-marostegui.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29891 and previous config saved to /var/cache/conftool/dbconfig/20220616-153320-marostegui.json
* 15:31 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 15:30 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 15:29 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P29890 and previous config saved to /var/cache/conftool/dbconfig/20220616-151434-ladsgroup.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P29889 and previous config saved to /var/cache/conftool/dbconfig/20220616-145931-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29888 and previous config saved to /var/cache/conftool/dbconfig/20220616-145136-marostegui.json
* 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29887 and previous config saved to /var/cache/conftool/dbconfig/20220616-145128-marostegui.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P29886 and previous config saved to /var/cache/conftool/dbconfig/20220616-144427-ladsgroup.json
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29885 and previous config saved to /var/cache/conftool/dbconfig/20220616-143623-marostegui.json
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P29884 and previous config saved to /var/cache/conftool/dbconfig/20220616-142923-ladsgroup.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29883 and previous config saved to /var/cache/conftool/dbconfig/20220616-142118-marostegui.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29882 and previous config saved to /var/cache/conftool/dbconfig/20220616-140613-marostegui.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29881 and previous config saved to /var/cache/conftool/dbconfig/20220616-140453-root.json
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 volans@cumin1001: dbctl commit (dc=all): 'Doesn't have new wikiuser', diff saved to https://phabricator.wikimedia.org/P29880 and previous config saved to /var/cache/conftool/dbconfig/20220616-140107-volans.json
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29879 and previous config saved to /var/cache/conftool/dbconfig/20220616-134950-root.json
* 13:45 sukhe: upload bird2_2.0.7-4.1wm1 to apt.wm.o (buster) - [[phab:T310574|T310574]]
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29878 and previous config saved to /var/cache/conftool/dbconfig/20220616-133446-root.json
* 13:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1089.eqiad.wmnet
* 13:22 jayme@cumin1001: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 13:21 jayme@cumin1001: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29877 and previous config saved to /var/cache/conftool/dbconfig/20220616-131942-root.json
* 13:10 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 13:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29876 and previous config saved to /var/cache/conftool/dbconfig/20220616-130438-root.json
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 13:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29875 and previous config saved to /var/cache/conftool/dbconfig/20220616-123357-marostegui.json
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1008.eqiad.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 for schema change', diff saved to https://phabricator.wikimedia.org/P29874 and previous config saved to /var/cache/conftool/dbconfig/20220616-115924-root.json
* 11:53 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1008.eqiad.wmnet
* 11:53 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1007.eqiad.wmnet
* 11:45 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1007.eqiad.wmnet
* 11:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
* 11:38 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
* 11:35 godog: trim swift logs older than 25d from centrallog hosts - [[phab:T309171|T309171]]
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet
* 11:27 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet
* 11:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
* 11:19 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29873 and previous config saved to /var/cache/conftool/dbconfig/20220616-111632-marostegui.json
* 11:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 11:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
* 11:09 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 11:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 11:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29871 and previous config saved to /var/cache/conftool/dbconfig/20220616-110127-marostegui.json
* 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
* 10:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29870 and previous config saved to /var/cache/conftool/dbconfig/20220616-104622-marostegui.json
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
* 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
* 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
* 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json
* 10:28 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:28 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]?
* 10:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]?
* 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster
* 10:02 elukey: ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]
* 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:32 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster
* 09:21 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29868 and previous config saved to /var/cache/conftool/dbconfig/20220616-091131-marostegui.json
* 09:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:02 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6002.drmrs.wmnet
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 08:45 moritzm: failover ganeti master in drmrs/2 to ganeti6004
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805370{{!}}testwiki: Enable SectionTranslation for 11 Wikipedias (T309384 T310116)]] (duration: 03m 41s)
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 joal: Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure


== 2015-12-21 ==
== 2022-06-15 ==
* 22:41 cwd: updated paymentswiki from a1be1ad134d06464e98de180227554fceddc91d4 to bae4d02afd8cfe1f8b8617c2f74bb36e420d281d
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29867 and previous config saved to /var/cache/conftool/dbconfig/20220615-224845-marostegui.json
* 20:49 godog: restbase1004 bootstrap failed, restbase1007-a is down java.lang.RuntimeException: A node required to move the data consistently is down (/10.64.0.230).
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29866 and previous config saved to /var/cache/conftool/dbconfig/20220615-223339-marostegui.json
* 19:27 legoktm: running checkLocalUser.php --delete=1 for real this time on terbium
* 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS buster
* 19:22 godog: reimage restbase1004
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29865 and previous config saved to /var/cache/conftool/dbconfig/20220615-221834-marostegui.json
* 19:14 paravoid: powercycling mw1011
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS buster
* 19:11 paravoid: rolling restart of hhvm on the eqiad jobrunners
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 18:47 jynus: common-sync: Copying to mw1016.eqiad.wmnet from tin.eqiad.wmnet
* 22:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 18:35 ori: correction: previous log message was for mw1015, not mw1017
* 22:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 18:27 ori: mw1017: enabled jemalloc profiling, restarted hhvm, now running hhvm-collect-heaps
* 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 17:48 akosiaris: restarted hhvm on mw1012.eqiad.wmnet
* 22:12 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 16:57 thcipriani: timeout on sync-file to mw1016.eqiad.wmnet
* 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 16:56 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.9/extensions/Popups/Popups.hooks.php: SWAT: Use ExtensionRegistry to determine whether TextExtracts is installed [[gerrit:260346]] (duration: 02m 48s)
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29864 and previous config saved to /var/cache/conftool/dbconfig/20220615-220329-marostegui.json
* 16:34 jynus: sync-common to mw1085
* 22:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 16:26 jynus: powercycling mw1085.eqiad.wmnet
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 16:22 thcipriani: mw1085.eqiad.wmnet times out on SSH connection
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 16:19 godog: reboot restbase1007, load through the roof
* 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 16:18 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.9/extensions/CentralNotice/resources/subscribing/ext.centralNotice.geoIP.js: SWAT: Update CentralNotice [[gerrit:260316]] (duration: 03m 03s)
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29863 and previous config saved to /var/cache/conftool/dbconfig/20220615-213241-marostegui.json
* 16:08 godog: depool restbase1007
* 21:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 16:01 apergos: jessie packages for salt with local patches deployed on restbase1001, looks fine but just in case.
* 21:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 15:44 godog: adding new 1TB disk to restbase1007
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29862 and previous config saved to /var/cache/conftool/dbconfig/20220615-213233-marostegui.json
* 14:22 andrewbogott: disabling puppet on labnet1002 for dnsmasq tests
* 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29861 and previous config saved to /var/cache/conftool/dbconfig/20220615-211728-marostegui.json
* 14:07 MaxSem: me and yurik are nuking old maps data and reimporting planet
* 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29860 and previous config saved to /var/cache/conftool/dbconfig/20220615-210223-marostegui.json
* 13:46 jynus: extending online s2-master data disk by +100GB
* 20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29859 and previous config saved to /var/cache/conftool/dbconfig/20220615-204717-marostegui.json
* 13:15 akosiaris: disabled puppet on maps-test2001 and commented out osmupdater crontab entry until we fix the sync process
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:02 jynus: emergency restart of db1047's mysql
* 20:08 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804014{{!}}Remove unused setting wgQuickSurveysUseVue (T285890)]] (duration: 03m 38s)
* 09:54 jynus: reenabling semisync replication on s3
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:07 godog: stop cassandra on restbase1004, decommissioned
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Dec 21 02:29:51 UTC 2015 (duration 6m 47s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 45s)
* 19:50 hashar@deploy1002: Finished deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]] (duration: 00m 10s)
* 02:20 andrewbogott: disabling puppet on labnet1002 to mess with dnsmasq
* 19:50 hashar@deploy1002: Started deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]]
* 01:44 andrewbogott: disabled puppet on holmium and labservices1001 to control roll-out of https://gerrit.wikimedia.org/r/#/c/260037/
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29858 and previous config saved to /var/cache/conftool/dbconfig/20220615-194703-marostegui.json
* 19:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29857 and previous config saved to /var/cache/conftool/dbconfig/20220615-194655-marostegui.json
* 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29856 and previous config saved to /var/cache/conftool/dbconfig/20220615-193150-marostegui.json
* 19:31 hashar: wikibugs IRC bot has been restarted by valhallasw \o/ # [[phab:T310734|T310734]]
* 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29855 and previous config saved to /var/cache/conftool/dbconfig/20220615-191645-marostegui.json
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29854 and previous config saved to /var/cache/conftool/dbconfig/20220615-190140-marostegui.json
* 18:42 hashar: wikibugs (IRC bot for Phabricator/Gerrit) is no longer working and needs a restart [[phab:T310734|T310734]]
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29853 and previous config saved to /var/cache/conftool/dbconfig/20220615-182140-marostegui.json
* 18:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 18:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]] (duration: 03m 43s)
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1015.eqiad.wmnet with OS buster
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:52 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1014.eqiad.wmnet with OS buster
* 17:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 17:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29851 and previous config saved to /var/cache/conftool/dbconfig/20220615-172738-marostegui.json
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29849 and previous config saved to /var/cache/conftool/dbconfig/20220615-171233-marostegui.json
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29848 and previous config saved to /var/cache/conftool/dbconfig/20220615-165727-marostegui.json
* 16:54 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to group0
* 16:44 jynus: restarting replication for m3 on db1117, not db2078
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29847 and previous config saved to /var/cache/conftool/dbconfig/20220615-164222-marostegui.json
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:29 brennen: phabricator upgrade finished
* 16:27 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Id8cdb8aef70f6672}} (duration: 03m 41s)
* 16:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host backup1009.eqiad.wmnet
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host backup1009.eqiad.wmnet
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29845 and previous config saved to /var/cache/conftool/dbconfig/20220615-160838-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29844 and previous config saved to /var/cache/conftool/dbconfig/20220615-160830-marostegui.json
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS buster
* 15:56 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:53 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P29843 and previous config saved to /var/cache/conftool/dbconfig/20220615-155325-marostegui.json
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 15:40 mutante: phabricator upgrade in progress
* 15:39 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:39 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20220615-153820-marostegui.json
* 15:35 brennen: starting phabricator deploy, momentary downtime expected while Apache restarts and migrations run
* 15:34 jynus: stopping replication for m3 on db1117, db2078
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29841 and previous config saved to /var/cache/conftool/dbconfig/20220615-152315-marostegui.json
* 15:20 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ms-be1059.eqiad.wmnet with OS bullseye
* 15:20 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenace
* 15:20 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenace
* 15:06 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:05 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:05 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:03 mutante: phabricator maintenance about to start
* 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 15:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 14:59 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 14:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:52 jbond@cumin1001: END (ERROR) - Cookbook sre.pdus.uptime (exit_code=97)
* 14:51 jbond@cumin1001: START - Cookbook sre.pdus.uptime
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29840 and previous config saved to /var/cache/conftool/dbconfig/20220615-145028-marostegui.json
* 14:50 urandom: ALTER-ing replication for codfw (Cassandra) expansion -- [[phab:T307641|T307641]]
* 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29839 and previous config saved to /var/cache/conftool/dbconfig/20220615-145020-marostegui.json
* 14:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:49 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:46 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29838 and previous config saved to /var/cache/conftool/dbconfig/20220615-143515-marostegui.json
* 14:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:30 hnowlan@deploy1002: Synchronized private/PrivateSettings.php: [[phab:T308670|T308670]] credentials to access the similar-users service (duration: 03m 32s)
* 14:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:22 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:21 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29836 and previous config saved to /var/cache/conftool/dbconfig/20220615-142010-marostegui.json
* 14:19 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 14:16 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS buster
* 14:10 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jnuche@deploy1002: Installation of scap version "4.9.4" completed for 558 hosts
* 14:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 14:08 jnuche@deploy1002: Installing scap version "4.9.4" for 558 hosts
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29834 and previous config saved to /var/cache/conftool/dbconfig/20220615-140505-marostegui.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:01 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:58 awight: EU afternoon backport window complete.
* 13:57 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/Translate/src/PageTranslation/DeleteTranslatableBundleSpecialPage.php: Backport: [[gerrit:805749|Fix deletion of translation pages outside of NS_MAIN namespace (T310440)]] (duration: 00m 32s)
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29833 and previous config saved to /var/cache/conftool/dbconfig/20220615-135508-root.json
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29832 and previous config saved to /var/cache/conftool/dbconfig/20220615-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29831 and previous config saved to /var/cache/conftool/dbconfig/20220615-135458-root.json
* 13:54 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:53 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:49 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29830 and previous config saved to /var/cache/conftool/dbconfig/20220615-134004-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29829 and previous config saved to /var/cache/conftool/dbconfig/20220615-133958-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29828 and previous config saved to /var/cache/conftool/dbconfig/20220615-133954-root.json
* 13:38 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:805745|Restore internal mechanism to use either back or close button (T310602)]] (duration: 00m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29827 and previous config saved to /var/cache/conftool/dbconfig/20220615-133334-marostegui.json
* 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29826 and previous config saved to /var/cache/conftool/dbconfig/20220615-133326-marostegui.json
* 13:31 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 01m 08s)
* 13:30 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 02m 06s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29825 and previous config saved to /var/cache/conftool/dbconfig/20220615-132500-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29824 and previous config saved to /var/cache/conftool/dbconfig/20220615-132454-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29823 and previous config saved to /var/cache/conftool/dbconfig/20220615-132450-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29822 and previous config saved to /var/cache/conftool/dbconfig/20220615-131820-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29821 and previous config saved to /var/cache/conftool/dbconfig/20220615-130956-root.json
* 13:09 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29820 and previous config saved to /var/cache/conftool/dbconfig/20220615-130951-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29819 and previous config saved to /var/cache/conftool/dbconfig/20220615-130946-root.json
* 13:08 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:04 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 43s)
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29818 and previous config saved to /var/cache/conftool/dbconfig/20220615-130315-marostegui.json
* 13:02 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 12:56 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 58s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:55 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 05s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29817 and previous config saved to /var/cache/conftool/dbconfig/20220615-125452-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29816 and previous config saved to /var/cache/conftool/dbconfig/20220615-125447-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29815 and previous config saved to /var/cache/conftool/dbconfig/20220615-125442-root.json
* 12:51 jbond@deploy1002: Finished deploy [netbox/deploy@7bbf659]: log (duration: 03m 12s)
* 12:48 jbond@deploy1002: Started deploy [netbox/deploy@7bbf659]: log
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29813 and previous config saved to /var/cache/conftool/dbconfig/20220615-124810-marostegui.json
* 12:42 moritzm: failover ganeti master in eqsin to ganeti5001
* 12:42 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:42 volans@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29812 and previous config saved to /var/cache/conftool/dbconfig/20220615-123949-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29811 and previous config saved to /var/cache/conftool/dbconfig/20220615-123943-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29810 and previous config saved to /var/cache/conftool/dbconfig/20220615-123938-root.json
* 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 12:25 kart_: Updated cxserver to 2022-06-15-074244-production ([[phab:T309266|T309266]], [[phab:T310116|T310116]], [[phab:T309384|T309384]], [[phab:T306963|T306963]])
* 12:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:23 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 es1033 es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29808 and previous config saved to /var/cache/conftool/dbconfig/20220615-122123-root.json
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:19 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29807 and previous config saved to /var/cache/conftool/dbconfig/20220615-121620-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29806 and previous config saved to /var/cache/conftool/dbconfig/20220615-121440-marostegui.json
* 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 12:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29805 and previous config saved to /var/cache/conftool/dbconfig/20220615-115935-marostegui.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29804 and previous config saved to /var/cache/conftool/dbconfig/20220615-115452-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29803 and previous config saved to /var/cache/conftool/dbconfig/20220615-115135-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29802 and previous config saved to /var/cache/conftool/dbconfig/20220615-115127-root.json
* 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29801 and previous config saved to /var/cache/conftool/dbconfig/20220615-114950-marostegui.json
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29800 and previous config saved to /var/cache/conftool/dbconfig/20220615-114430-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29799 and previous config saved to /var/cache/conftool/dbconfig/20220615-113948-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29798 and previous config saved to /var/cache/conftool/dbconfig/20220615-113631-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29797 and previous config saved to /var/cache/conftool/dbconfig/20220615-113623-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29796 and previous config saved to /var/cache/conftool/dbconfig/20220615-113445-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29795 and previous config saved to /var/cache/conftool/dbconfig/20220615-112924-marostegui.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29794 and previous config saved to /var/cache/conftool/dbconfig/20220615-112444-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29793 and previous config saved to /var/cache/conftool/dbconfig/20220615-112127-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29792 and previous config saved to /var/cache/conftool/dbconfig/20220615-112119-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29791 and previous config saved to /var/cache/conftool/dbconfig/20220615-111940-marostegui.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29790 and previous config saved to /var/cache/conftool/dbconfig/20220615-110940-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29789 and previous config saved to /var/cache/conftool/dbconfig/20220615-110623-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29788 and previous config saved to /var/cache/conftool/dbconfig/20220615-110616-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29787 and previous config saved to /var/cache/conftool/dbconfig/20220615-110435-marostegui.json
* 10:55 marostegui: dbmaint es3@eqiad [[phab:T310485|T310485]]
* 10:55 marostegui: dbmaint es2@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui: dbmaint es1@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29786 and previous config saved to /var/cache/conftool/dbconfig/20220615-105437-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29784 and previous config saved to /var/cache/conftool/dbconfig/20220615-105119-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29783 and previous config saved to /var/cache/conftool/dbconfig/20220615-105112-root.json
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29782 and previous config saved to /var/cache/conftool/dbconfig/20220615-103933-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29781 and previous config saved to /var/cache/conftool/dbconfig/20220615-103615-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29780 and previous config saved to /var/cache/conftool/dbconfig/20220615-103608-root.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29779 and previous config saved to /var/cache/conftool/dbconfig/20220615-103101-marostegui.json
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29778 and previous config saved to /var/cache/conftool/dbconfig/20220615-103048-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 es1030 es1028 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29777 and previous config saved to /var/cache/conftool/dbconfig/20220615-102929-root.json
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29776 and previous config saved to /var/cache/conftool/dbconfig/20220615-101543-marostegui.json
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29775 and previous config saved to /var/cache/conftool/dbconfig/20220615-100235-marostegui.json
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29774 and previous config saved to /var/cache/conftool/dbconfig/20220615-100037-marostegui.json
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4001.ulsfo.wmnet
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29773 and previous config saved to /var/cache/conftool/dbconfig/20220615-094532-marostegui.json
* 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29772 and previous config saved to /var/cache/conftool/dbconfig/20220615-092706-marostegui.json
* 09:20 marostegui: Reboot sanitarium hosts (db1154, db1155) wiki replicas will have lag
* 09:14 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1059.eqiad.wmnet with OS bullseye
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29771 and previous config saved to /var/cache/conftool/dbconfig/20220615-091257-marostegui.json
* 09:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29770 and previous config saved to /var/cache/conftool/dbconfig/20220615-091249-marostegui.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29769 and previous config saved to /var/cache/conftool/dbconfig/20220615-091201-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29768 and previous config saved to /var/cache/conftool/dbconfig/20220615-085744-marostegui.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29767 and previous config saved to /var/cache/conftool/dbconfig/20220615-085656-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29766 and previous config saved to /var/cache/conftool/dbconfig/20220615-084239-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29765 and previous config saved to /var/cache/conftool/dbconfig/20220615-084151-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29764 and previous config saved to /var/cache/conftool/dbconfig/20220615-084046-marostegui.json
* 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29763 and previous config saved to /var/cache/conftool/dbconfig/20220615-083554-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29762 and previous config saved to /var/cache/conftool/dbconfig/20220615-082734-marostegui.json
* 08:23 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:22 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29761 and previous config saved to /var/cache/conftool/dbconfig/20220615-082050-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29760 and previous config saved to /var/cache/conftool/dbconfig/20220615-081744-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29759 and previous config saved to /var/cache/conftool/dbconfig/20220615-080546-root.json
* 08:03 XioNoX: re-enable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29758 and previous config saved to /var/cache/conftool/dbconfig/20220615-080240-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29757 and previous config saved to /var/cache/conftool/dbconfig/20220615-075042-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29756 and previous config saved to /var/cache/conftool/dbconfig/20220615-075024-marostegui.json
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29755 and previous config saved to /var/cache/conftool/dbconfig/20220615-074736-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29754 and previous config saved to /var/cache/conftool/dbconfig/20220615-073538-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29753 and previous config saved to /var/cache/conftool/dbconfig/20220615-073232-root.json
* 07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29752 and previous config saved to /var/cache/conftool/dbconfig/20220615-072352-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P29751 and previous config saved to /var/cache/conftool/dbconfig/20220615-072034-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29750 and previous config saved to /var/cache/conftool/dbconfig/20220615-071728-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29749 and previous config saved to /var/cache/conftool/dbconfig/20220615-070847-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29748 and previous config saved to /var/cache/conftool/dbconfig/20220615-065342-marostegui.json
* 06:52 XioNoX: disable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29747 and previous config saved to /var/cache/conftool/dbconfig/20220615-063837-marostegui.json
* 06:02 marostegui: Reboot db[2071-2078] [[phab:T310485|T310485]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29746 and previous config saved to /var/cache/conftool/dbconfig/20220615-060153-marostegui.json
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29745 and previous config saved to /var/cache/conftool/dbconfig/20220615-054252-marostegui.json
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1173.eqiad.wmnet with OS bullseye
* 05:17 marostegui: dbmaint es5@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es4@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es3@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es2@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es1@codfw [[phab:T310485|T310485]]
* 05:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:03 marostegui: Reboot dbproxy1016 and dbproxy1021 [[phab:T310484|T310484]]
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 tstarling@deploy1002: Synchronized php-1.39.0-wmf.16/includes/cache/MessageCache.php: (no justification provided) (duration: 03m 36s)
* 02:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:17 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/includes/cache/MessageCache.php: [[phab:T310532|T310532]] (duration: 03m 29s)
* 02:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2015-12-20 ==
* 23:24 Reedy: Katie and Jeff paged about bellatrix
* 18:46 andrewbogott: graceful restart of zuul as per https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Restart
* 18:31 andrewbogott: restarting stuck Jenkins
* 17:47 logmsgbot: reedy@tin Purged l10n cache for 1.27.0-wmf.6
* 17:11 godog: depool mw1228, reported ro fs
* 15:53 logmsgbot: reedy@tin Synchronized README: noop (duration: 00m 32s)
* 15:50 Reedy: reedy@tin Purged l10n cache for 1.27.0-wmf.6 (hanging due to mw1228 issue)
* 15:42 Reedy: mw1228 reporting readonly fs
* 15:41 logmsgbot: reedy@tin Purged l10n cache for 1.27.0-wmf.7
* 09:00 godog: powercycle ms-be2019, xfs lockup
* 02:28 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Dec 20 02:28:49 UTC 2015 (duration 6m 54s)
* 02:21 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 08m 59s)

== 2022-06-14 ==
* 23:52 mutante: gitlab-runner1001/1002 - clean revert not possible, icinga alerting about failed buildkitd service, manually deleting systemd unit and trying to clean up [[phab:T308271|T308271]]
* 23:49 mutante: gitlab-runner1002 - systemctl restart docker; run-puppet-agent ; systemctl start buildkitd  - fails though [[phab:T308271|T308271]]
* 23:39 mutante: gitlab-runner1001 - systemctl start buildkitd
* 23:32 mutante: gitlab-runner1001 - restarting docker
* 23:08 mutante: disabling puppet in gitlab-runners (via cumin /disable-puppet) before deploying gerrit:791655 to provide gitlab-runners with buildkit and new docker network - [[phab:T308271|T308271]]
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:15 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|e3fe6c04c95717f0f914bbfa366f5f827f392b6b}}: phpcs: fix more SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 39s)
* 22:05 urbanecm@deploy1002: Synchronized w/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 18s)
* 22:02 urbanecm@deploy1002: Synchronized src/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 32s)
* 22:00 mutante: wtp1026 - manually running '/usr/bin/sudo -u root -- /usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807'
* 21:58 urbanecm@deploy1002: Synchronized rpc/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 31s)
* 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:54 urbanecm@deploy1002: Synchronized multiversion/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 29s)
* 21:54 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:51 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 38s)
* 21:49 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
* 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
* 21:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
* 21:38 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 21:32 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 21:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 21:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 21:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 21:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 21:10 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 21:03 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 urbanecm@deploy1002: Synchronized docroot/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 41s)
* 20:37 urbanecm@deploy1002: Synchronized w/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 15s)
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 urbanecm@deploy1002: Synchronized multiversion/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 28s)
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 20:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 20:31 urbanecm@deploy1002: Synchronized wmf-config/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 38s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1021.eqiad.wmnet with OS buster
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS buster
* 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1018.eqiad.wmnet with OS buster
* 20:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1017.eqiad.wmnet with OS buster
* 19:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 19:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 19:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 19:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 19:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 19:32 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 19:32 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 19:16 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
* 19:10 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
* 18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 18:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 18:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:15 ayounsi@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=imagescaler-ro,name=codfw
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
* 17:57 brennen@deploy1002: Pruned MediaWiki: 1.39.0-wmf.14 (duration: 01m 53s)
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.16 (duration: 32m 52s)
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:25 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
* 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.16
* 17:13 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): train is blocked - will sync to testwikis and hold there for resolution of [[phab:T310532|T310532]]
* 16:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2053.codfw.wmnet with OS bullseye
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
* 16:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
* 16:12 jnuche@deploy1002: Installation of scap version "4.9.2" completed for 557 hosts
* 16:11 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 16:05 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 15:58 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 15:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2053.codfw.wmnet with OS bullseye
* 15:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host elastic2053.codfw.wmnet
* 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host elastic2053.codfw.wmnet
* 15:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
* 14:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:53 moritzm: failover ganeti master in ulsfo to ganeti4003
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|596058b5e4d906d40e620fe5b01f37c484f5a8c1}}: Add new throttle rule + remove expired one ([[phab:T310625|T310625]]) (duration: 03m 38s)
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:33 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
* 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4003.ulsfo.wmnet
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4003.ulsfo.wmnet
* 14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2012.codfw.wmnet with OS buster
* 14:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2010.codfw.wmnet with OS buster
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2009.codfw.wmnet with OS buster
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 14:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2011.codfw.wmnet with OS buster
* 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2008.codfw.wmnet with OS buster
* 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
* 14:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2007.codfw.wmnet with OS buster
* 14:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2006.codfw.wmnet with OS buster
* 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
* 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 14:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2005.codfw.wmnet with OS buster
* 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29741 and previous config saved to /var/cache/conftool/dbconfig/20220614-132654-marostegui.json
* 13:13 urbanecm: UTC afternoon B&C window done
* 13:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1692de09bf04c724cf416679405d4b6485550d40}}: Disable DiscussionTools visualenhancements feature in production (duration: 03m 25s)
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29740 and previous config saved to /var/cache/conftool/dbconfig/20220614-131149-marostegui.json
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7f2dc7296f0c25d00e45651c50c3e45733cc63b3}}: Make new topic tool available as opt-out almost everywhere (phrase 4; [[phab:T310392|T310392]]) (duration: 03m 45s)
* 13:06 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29739 and previous config saved to /var/cache/conftool/dbconfig/20220614-125644-marostegui.json
* 12:56 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS buster
* 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS buster
* 12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS buster
* 12:42 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS buster
* 12:41 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS buster
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29738 and previous config saved to /var/cache/conftool/dbconfig/20220614-124139-marostegui.json
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS buster
* 12:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2004.codfw.wmnet with OS buster
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS buster
* 12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS buster
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29737 and previous config saved to /var/cache/conftool/dbconfig/20220614-120323-marostegui.json
* 12:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29735 and previous config saved to /var/cache/conftool/dbconfig/20220614-115020-marostegui.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P29734 and previous config saved to /var/cache/conftool/dbconfig/20220614-113515-marostegui.json
* 11:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1173.eqiad.wmnet with OS bullseye
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29732 and previous config saved to /var/cache/conftool/dbconfig/20220614-110945-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29731 and previous config saved to /var/cache/conftool/dbconfig/20220614-110504-marostegui.json
* 11:02 moritzm: rebalancing ganeti cluster in esams [[phab:T308238|T308238]]
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
* 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29730 and previous config saved to /var/cache/conftool/dbconfig/20220614-105441-root.json
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 10:44 joal@deploy1002: Finished deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency (duration: 00m 09s)
* 10:44 joal@deploy1002: Started deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29729 and previous config saved to /var/cache/conftool/dbconfig/20220614-104021-marostegui.json
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29728 and previous config saved to /var/cache/conftool/dbconfig/20220614-103937-root.json
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:30 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
* 10:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29727 and previous config saved to /var/cache/conftool/dbconfig/20220614-102433-root.json
* 10:22 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:22 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
* 10:19 marostegui: dbmaint s6@eqiad [[phab:T60674|T60674]]
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29726 and previous config saved to /var/cache/conftool/dbconfig/20220614-101755-marostegui.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29725 and previous config saved to /var/cache/conftool/dbconfig/20220614-100930-root.json
* 10:06 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS buster
* 10:03 moritzm: rename Ganeti group row_A in test cluster to row_A-test
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P29724 and previous config saved to /var/cache/conftool/dbconfig/20220614-100250-marostegui.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P29723 and previous config saved to /var/cache/conftool/dbconfig/20220614-094745-marostegui.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29722 and previous config saved to /var/cache/conftool/dbconfig/20220614-093240-marostegui.json
* 09:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1058.eqiad.wmnet with OS bullseye
* 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:23 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29721 and previous config saved to /var/cache/conftool/dbconfig/20220614-092330-marostegui.json
* 09:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29720 and previous config saved to /var/cache/conftool/dbconfig/20220614-092322-marostegui.json
* 09:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 09:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 09:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1058.eqiad.wmnet with reason: host reimage
* 09:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
* 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 09:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1058.eqiad.wmnet with reason: host reimage
* 09:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 09:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 09:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
* 09:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
* 09:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 09:08 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29719 and previous config saved to /var/cache/conftool/dbconfig/20220614-090817-marostegui.json
* 09:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
* 09:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 09:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 09:01 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
* 09:00 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1058.eqiad.wmnet with OS bullseye
* 09:00 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 08:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 08:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
* 08:58 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet
* 08:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
* 08:56 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1001.eqiad.wmnet
* 08:56 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
* 08:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2003.codfw.wmnet with OS buster
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29718 and previous config saved to /var/cache/conftool/dbconfig/20220614-085312-marostegui.json
* 08:53 joal@deploy1002: Finished deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63] (duration: 07m 27s)
* 08:51 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:49 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
* 08:48 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 08:48 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:47 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
* 08:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
* 08:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 08:45 joal@deploy1002: Started deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63]
* 08:45 joal@deploy1002: Finished deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63] (duration: 00m 08s)
* 08:44 joal@deploy1002: Started deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63]
* 08:44 joal@deploy1002: Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63] (duration: 04m 45s)
* 08:39 joal@deploy1002: Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63]
* 08:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
* 08:38 godog: reboot centrallog2002 - [[phab:T310483|T310483]]
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29717 and previous config saved to /var/cache/conftool/dbconfig/20220614-083807-marostegui.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29716 and previous config saved to /var/cache/conftool/dbconfig/20220614-082855-marostegui.json
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29715 and previous config saved to /var/cache/conftool/dbconfig/20220614-082847-marostegui.json
* 08:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
* 08:20 marostegui: dbmaint s6@eqiad [[phab:T298560|T298560]]
* 08:18 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
* 08:16 marostegui: dbmaint s6@eqiad [[phab:T309311|T309311]]
* 08:16 joal@deploy1002: Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63] (duration: 31m 09s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29714 and previous config saved to /var/cache/conftool/dbconfig/20220614-081342-marostegui.json
* 08:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS buster
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29713 and previous config saved to /var/cache/conftool/dbconfig/20220614-075837-marostegui.json
* 07:45 joal@deploy1002: Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29712 and previous config saved to /var/cache/conftool/dbconfig/20220614-074331-marostegui.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29711 and previous config saved to /var/cache/conftool/dbconfig/20220614-073322-marostegui.json
* 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 taavi: UTC morning deploys done
* 07:24 marostegui: dbmaint s6@eqiad [[phab:T298563|T298563]]
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804806{{!}}Enable Realtime Preview on cawiki, viwiki, and fawiki (T303961)]] (duration: 03m 20s)
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:16 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:802685{{!}}Update $wgVectorMaxWidthOptions to include action=edit (T307725)]] (duration: 03m 36s)
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:03 marostegui: dbmaint s6@eqiad [[phab:T300381|T300381]]
* 07:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for schema change', diff saved to https://phabricator.wikimedia.org/P29710 and previous config saved to /var/cache/conftool/dbconfig/20220614-065322-root.json
* 06:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:28 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212129|T212129]] (duration: 03m 31s)
* 06:27 marostegui: Reboot dbproxy1012 and dbproxy1015 [[phab:T310484|T310484]]
* 06:24 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/AbuseFilter/includes/ServiceWiring.php: [[phab:T212129|T212129]] (duration: 03m 33s)
* 06:20 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/AbuseFilter/extension.json: [[phab:T212129|T212129]] (duration: 03m 32s)
* 06:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29709 and previous config saved to /var/cache/conftool/dbconfig/20220614-060608-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29707 and previous config saved to /var/cache/conftool/dbconfig/20220614-060155-root.json
* 06:01 marostegui: Starting s6 eqiad failover from db1173 to db1131 - [[phab:T300471|T300471]]
* 05:11 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212129|T212129]] Switch wgMainStash to db-mainstash (duration: 03m 38s)
* 05:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29706 and previous config saved to /var/cache/conftool/dbconfig/20220614-045224-root.json
* 04:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T300471|T300471]]
* 04:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T300471|T300471]]
* 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29705 and previous config saved to /var/cache/conftool/dbconfig/20220614-024047-ladsgroup.json
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29704 and previous config saved to /var/cache/conftool/dbconfig/20220614-022542-ladsgroup.json
* 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29703 and previous config saved to /var/cache/conftool/dbconfig/20220614-021037-ladsgroup.json
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29702 and previous config saved to /var/cache/conftool/dbconfig/20220614-015532-ladsgroup.json
* 00:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29701 and previous config saved to /var/cache/conftool/dbconfig/20220614-003608-marostegui.json
* 00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29700 and previous config saved to /var/cache/conftool/dbconfig/20220614-002103-marostegui.json
* 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29699 and previous config saved to /var/cache/conftool/dbconfig/20220614-000558-marostegui.json


== 2015-12-19 ==
* 21:55 _joe_: restarted zotero on sca1001, various OOM messages
* 20:48 gwicke: restbase1004: `systemctl mask cassandra` in preparation for the decommission finishing
* 19:49 akosiaris: killed gmond on db2036. it was clearly misbehaving and running since Jan 02. db2036 was not listed on the ganglia web interface. killing the orphaned process and restarting seems to have fixed it
* 18:54 akosiaris: scheduled maintenance of s3 slave lag on db2036, db2043, db2050, db2057 (all of db2018's family that pages) to effectively silence pages while debugging. Check is flapping since 15:00 UTC today
* 15:14 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/259611/ - noop for prod, other than making icinga stop complaining (duration: 00m 31s)
* 10:07 hashar: CI jobs for MediaWiki were broken because of cssjanus dependency. Should be fixed once mw/core https://gerrit.wikimedia.org/r/#/c/260169/ lands
* 02:28 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Dec 19 02:28:56 UTC 2015 (duration 6m 53s)
* 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 08m 53s)
* 01:01 gwicke: entire restbase cluster: removed 5% root reserve from data partition with tune2fs -m 0 /dev/mapper/restbase$NODE--vg-{srv,var}
* 00:49 gwicke: restbase1008: removed 5% root reserve from data partition with tune2fs -m 0 /dev/mapper/restbase1008--vg-srv


== 2022-06-13 ==
* 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29698 and previous config saved to /var/cache/conftool/dbconfig/20220613-235053-marostegui.json
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:45 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T134809|T134809]] g 801836 remove variable wmgDbconfigFromEtcd (duration: 03m 26s)
* 23:35 tstarling@deploy1002: Synchronized wmf-config/etcd.php: [[phab:T134809|T134809]] g 799685 codfw master DBs (duration: 03m 36s)
* 23:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:30 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T134809|T134809]] g 799685 codfw master DBs (duration: 03m 30s)
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29697 and previous config saved to /var/cache/conftool/dbconfig/20220613-232537-marostegui.json
* 23:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 23:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29696 and previous config saved to /var/cache/conftool/dbconfig/20220613-232529-marostegui.json
* 23:16 mutante: gitlab-runner2001 - systemctl reset-failed to clear alert about failed ifup for ens14 which is actually up. race condition caused by reboot
* 23:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29695 and previous config saved to /var/cache/conftool/dbconfig/20220613-231024-marostegui.json
* 22:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29694 and previous config saved to /var/cache/conftool/dbconfig/20220613-225519-marostegui.json
* 22:55 AndyRussG: payments-wiki upgraded from {{Gerrit|8c6208c2}} to {{Gerrit|10304f69}}
* 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29693 and previous config saved to /var/cache/conftool/dbconfig/20220613-224014-marostegui.json
* 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29692 and previous config saved to /var/cache/conftool/dbconfig/20220613-221522-marostegui.json
* 22:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner[2001-2004].codfw.wmnet with reason: maintenance reboot
* 22:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner[2001-2004].codfw.wmnet with reason: maintenance reboot
* 21:56 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner[1001-1004].eqiad.wmnet with reason: maintenance reboot
* 21:56 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner[1001-1004].eqiad.wmnet with reason: maintenance reboot
* 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 21:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29691 and previous config saved to /var/cache/conftool/dbconfig/20220613-215118-marostegui.json
* 21:48 mutante: gitlab-runner* - sequentially pausing, rebooting, resuming one by one
* 21:44 mutante: gitlab-runner1001 - pause from accepting jobs - rebooting
* 21:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29690 and previous config saved to /var/cache/conftool/dbconfig/20220613-213613-marostegui.json
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29689 and previous config saved to /var/cache/conftool/dbconfig/20220613-212108-marostegui.json
* 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29688 and previous config saved to /var/cache/conftool/dbconfig/20220613-210603-marostegui.json
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:29 cjming: end of UTC late backport window
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:805206{{!}}Disable TOC A/B test for beta cluster (T309683)]] (duration: 03m 29s)
* 20:22 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800857{{!}}ugwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 30s)
* 20:19 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
* 20:18 cjming@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ug.svg: Config: [[gerrit:800857{{!}}ugwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 36s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29687 and previous config saved to /var/cache/conftool/dbconfig/20220613-201420-marostegui.json
* 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29686 and previous config saved to /var/cache/conftool/dbconfig/20220613-201407-marostegui.json
* 20:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800856{{!}}crhwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 27s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:08 cjming@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-crh.svg: Config: [[gerrit:800856{{!}}crhwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 16s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P29685 and previous config saved to /var/cache/conftool/dbconfig/20220613-195902-marostegui.json
* 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P29684 and previous config saved to /var/cache/conftool/dbconfig/20220613-194356-marostegui.json
* 19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29683 and previous config saved to /var/cache/conftool/dbconfig/20220613-192851-marostegui.json
* 19:12 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
* 19:12 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
* 19:11 mutante: etherpad - minimal downtime - rebooting etherpad1003
* 19:07 mutante: gerrit2002 - rebooting
* 19:04 mutante: gitlab2003 - rebooting
* 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29682 and previous config saved to /var/cache/conftool/dbconfig/20220613-190314-marostegui.json
* 19:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 19:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 19:01 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
* 18:55 mutante: gitlab2002 - rebooting
* 18:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29681 and previous config saved to /var/cache/conftool/dbconfig/20220613-184015-marostegui.json
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P29680 and previous config saved to /var/cache/conftool/dbconfig/20220613-182510-marostegui.json
* 18:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P29679 and previous config saved to /var/cache/conftool/dbconfig/20220613-181005-marostegui.json
* 17:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1146.eqiad.wmnet with OS buster
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29678 and previous config saved to /var/cache/conftool/dbconfig/20220613-175500-marostegui.json
* 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1145.eqiad.wmnet with OS buster
* 17:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1143.eqiad.wmnet with OS buster
* 17:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
* 17:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
* 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
* 17:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1148.eqiad.wmnet with OS buster
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 17:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2002.codfw.wmnet with OS buster
* 17:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
* 17:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1147.eqiad.wmnet with OS buster
* 17:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
* 17:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
* 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29677 and previous config saved to /var/cache/conftool/dbconfig/20220613-171438-marostegui.json
* 17:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 17:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29676 and previous config saved to /var/cache/conftool/dbconfig/20220613-171430-marostegui.json
* 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
* 17:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3001.esams.wmnet with OS bullseye
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
* 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1144.eqiad.wmnet with OS buster
* 17:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS buster
* 17:03 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
* 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P29675 and previous config saved to /var/cache/conftool/dbconfig/20220613-165925-marostegui.json
* 16:58 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
* 16:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS buster
* 16:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1146.eqiad.wmnet with OS buster
* 16:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3001.esams.wmnet with reason: host reimage
* 16:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
* 16:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 16:53 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1145.eqiad.wmnet with OS buster
* 16:50 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3001.esams.wmnet with reason: host reimage
* 16:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 16:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
* 16:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
* 16:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
* 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P29674 and previous config saved to /var/cache/conftool/dbconfig/20220613-164419-marostegui.json
* 16:40 dancy@deploy1002: prep aborted:  (duration: 01m 40s)
* 16:38 dancy@deploy1002: prep aborted:  (duration: 06m 12s)
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
* 16:32 marostegui: dbmaint x2@eqiad upgrade and reboot all x2 db hosts [[phab:T310485|T310485]]
* 16:32 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti3001.esams.wmnet with OS bullseye
* 16:32 dancy@deploy1002: prep aborted:  (duration: 00m 26s)
* 16:31 marostegui: Reboot all codfw parsercache hosts [[phab:T310485|T310485]]
* 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29673 and previous config saved to /var/cache/conftool/dbconfig/20220613-162914-marostegui.json
* 16:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2002.codfw.wmnet with OS buster
* 16:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2001.codfw.wmnet with OS buster
* 16:10 robh: ganeti3001 rebooting and reimaging for firmware updates via [[phab:T308238|T308238]]
* 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:805173{{!}} Bumping portals to master (T128546)]] (duration: 03m 27s)
* 15:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:47 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:805173{{!}} Bumping portals to master (T128546)]] (duration: 03m 35s)
* 15:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
* 15:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2001.codfw.wmnet with OS buster
* 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29672 and previous config saved to /var/cache/conftool/dbconfig/20220613-152900-marostegui.json
* 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29671 and previous config saved to /var/cache/conftool/dbconfig/20220613-152852-marostegui.json
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P29670 and previous config saved to /var/cache/conftool/dbconfig/20220613-151347-marostegui.json
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 15:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
* 15:00 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P29669 and previous config saved to /var/cache/conftool/dbconfig/20220613-145842-marostegui.json
* 14:58 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:54 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29668 and previous config saved to /var/cache/conftool/dbconfig/20220613-144337-marostegui.json
* 14:42 marostegui: Failover m1 and m2 to a different proxy [[phab:T310484|T310484]]
* 14:38 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:34 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29667 and previous config saved to /var/cache/conftool/dbconfig/20220613-141802-marostegui.json
* 14:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29666 and previous config saved to /var/cache/conftool/dbconfig/20220613-141754-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P29665 and previous config saved to /var/cache/conftool/dbconfig/20220613-140249-marostegui.json
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:00 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:00 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 13:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dumpsdata1007.eqiad.wmnet
* 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1057.eqiad.wmnet with OS bullseye
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P29663 and previous config saved to /var/cache/conftool/dbconfig/20220613-134744-marostegui.json
* 13:45 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
* 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 13:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29662 and previous config saved to /var/cache/conftool/dbconfig/20220613-133239-marostegui.json
* 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dse-k8s-worker[1001-1004].eqiad.wmnet with reason: reboots
* 13:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on dse-k8s-worker[1001-1004].eqiad.wmnet with reason: reboots
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
* 13:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
* 13:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
* 13:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 13:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1057.eqiad.wmnet with reason: host reimage
* 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
* 13:24 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
* 13:22 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/GrowthExperiments/modules/ext.growthExperiments.DataStore/NewcomerTasksStore.js: {{Gerrit|67a5352b0bf9f6aa160cc93a42ca22a02aad883a}}: NewcomerTasksStore: update quality gate config when the task queue is set ([[phab:T309768|T309768]]) (duration: 03m 41s)
* 13:22 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1057.eqiad.wmnet with reason: host reimage
* 13:21 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
* 13:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
* 13:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
* 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: reboots
* 13:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
* 13:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: reboots
* 13:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29660 and previous config saved to /var/cache/conftool/dbconfig/20220613-130512-marostegui.json
* 13:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 13:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29659 and previous config saved to /var/cache/conftool/dbconfig/20220613-130504-marostegui.json
* 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
* 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After upgrading kernel', diff saved to https://phabricator.wikimedia.org/P29658 and previous config saved to /var/cache/conftool/dbconfig/20220613-125419-root.json
* 12:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1057.eqiad.wmnet with OS bullseye
* 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1002.wikimedia.org
* 12:51 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P29657 and previous config saved to /var/cache/conftool/dbconfig/20220613-124959-marostegui.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After upgrading kernel', diff saved to https://phabricator.wikimedia.org/P29655 and previous config saved to /var/cache/conftool/dbconfig/20220613-123915-root.json
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P29654 and previous config saved to /var/cache/conftool/dbconfig/20220613-123454-marostegui.json
* 12:33 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
* 12:29 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After upgrading kernel', diff saved to https://phabricator.wikimedia.org/P29653 and previous config saved to /var/cache/conftool/dbconfig/20220613-122411-root.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29652 and previous config saved to /var/cache/conftool/dbconfig/20220613-121949-marostegui.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After upgrading kernel', diff saved to https://phabricator.wikimedia.org/P29651 and previous config saved to /var/cache/conftool/dbconfig/20220613-120907-root.json
* 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti3001.esams.wmnet with reason: Remove from cluster for firmware update and eventual reimage
* 12:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti3001.esams.wmnet with reason: Remove from cluster for firmware update and eventual reimage
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After upgrading kernel', diff saved to https://phabricator.wikimedia.org/P29650 and previous config saved to /var/cache/conftool/dbconfig/20220613-115404-root.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29649 and previous config saved to /var/cache/conftool/dbconfig/20220613-115238-marostegui.json
* 11:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 11:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After upgrading kernel', diff saved to https://phabricator.wikimedia.org/P29648 and previous config saved to /var/cache/conftool/dbconfig/20220613-113900-root.json
* 11:36 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netbox-dev2002.codfw.wmnet
* 11:35 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp2002.wikimedia.org
* 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29647 and previous config saved to /var/cache/conftool/dbconfig/20220613-113004-marostegui.json
* 11:28 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
* 11:27 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
* 11:27 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
* 11:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
* 11:25 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
* 11:25 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
* 11:24 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=netbox,name=codfw
* 11:24 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=netbox
* 11:24 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host netbox1002.eqiad.wmnet
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: After upgrading kernel', diff saved to https://phabricator.wikimedia.org/P29646 and previous config saved to /var/cache/conftool/dbconfig/20220613-112356-root.json
* 11:19 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox1002.eqiad.wmnet
* 11:19 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2002.codfw.wmnet
* 11:19 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=netbox,name=eqiad
* 11:19 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=netbox,name=eqiad
* 11:18 marostegui: Reboot x2 hosts for kernel upgrade [[phab:T310485|T310485]]
* 11:18 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=netbox,name=codfw
* 11:18 marostegui: Reboot db1131 for kernel upgrade [[phab:T310485|T310485]]
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29645 and previous config saved to /var/cache/conftool/dbconfig/20220613-111621-root.json
* 11:15 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox2002.codfw.wmnet
* 11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P29644 and previous config saved to /var/cache/conftool/dbconfig/20220613-111459-marostegui.json
* 11:14 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
* 11:12 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
* 11:12 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2002.codfw.wmnet
* 11:11 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
* 11:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 11:08 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 11:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
* 11:07 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki2002.codfw.wmnet
* 11:04 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
* 11:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dsharpe out of all services on: 1219 hosts
* 11:00 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dsharpe out of all services on: 1219 hosts
* 11:00 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P29643 and previous config saved to /var/cache/conftool/dbconfig/20220613-105954-marostegui.json
* 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dsharpe out of all services on: 609 hosts
* 10:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dsharpe out of all services on: 609 hosts
* 10:52 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:52 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:52 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:51 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:51 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:50 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:50 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:50 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29642 and previous config saved to /var/cache/conftool/dbconfig/20220613-104449-marostegui.json
* 10:38 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:38 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:37 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:37 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29641 and previous config saved to /var/cache/conftool/dbconfig/20220613-101537-marostegui.json
* 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:13 moritzm: installing 5.10.120 kernel updates on bullseye hosts
* 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 09:12 moritzm: drain ganeti3001 for firmware update/reimage [[phab:T308238|T308238]]
* 09:07 moritzm: installing ntfs-3g security updates
* 07:54 moritzm: failover ganeti master in esams to ganeti3003 [[phab:T308238|T308238]]
* 07:18 joal: Manually rerun webrequest_text load for hour 2022-06-12T08:00
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29640 and previous config saved to /var/cache/conftool/dbconfig/20220613-064109-root.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29639 and previous config saved to /var/cache/conftool/dbconfig/20220613-062605-root.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29638 and previous config saved to /var/cache/conftool/dbconfig/20220613-061101-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29637 and previous config saved to /var/cache/conftool/dbconfig/20220613-055557-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P29636 and previous config saved to /var/cache/conftool/dbconfig/20220613-054623-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P29635 and previous config saved to /var/cache/conftool/dbconfig/20220613-053118-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29634 and previous config saved to /var/cache/conftool/dbconfig/20220613-051613-marostegui.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29633 and previous config saved to /var/cache/conftool/dbconfig/20220613-051407-marostegui.json
* 05:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 05:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 05:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 04:59 kart_: Updated cxserver to 2022-06-08-124326-production + nodejs > node command update ([[phab:T306995|T306995]], [[phab:T309169|T309169]])
* 04:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 04:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 04:54 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 04:54 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 04:50 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 04:50 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 04:44 dzahn@cumin2002: conftool action : set/pooled=inactive; selector: dc=codfw,name=thumbor2004.codfw.wmnet
* 04:32 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=thumbor2004.codfw.wmnet
* 04:29 mutante: thumbor2004 - attempted powercycle via DRAC console
* 04:25 mutante: thumbor2006 - host down - attempting powercycle via DRAC console
* 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29629 and previous config saved to /var/cache/conftool/dbconfig/20220613-021511-ladsgroup.json
* 02:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 02:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance


== 2015-12-18 ==
== 2022-06-12 ==
* 22:57 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.9/resources/src/mediawiki/mediawiki.searchSuggest.js: allow override of suggestion type reported in event logging (duration: 00m 29s)
* 18:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1002.wikimedia.org with OS bullseye
* 22:56 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.9/extensions/CirrusSearch/resources/ext.cirrus.suggest.js: override suggestion type reported in event logging (duration: 00m 30s)
* 18:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1002.wikimedia.org with reason: host reimage
* 22:50 logmsgbot: aaron@tin Synchronized php-1.27.0-wmf.9/includes/jobqueue/aggregator/JobQueueAggregatorRedis.php: 2c942ba1782c42ee68622278a5e0a77e9027945d (duration: 00m 30s)
* 18:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1002.wikimedia.org with reason: host reimage
* 22:30 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.9/extensions/CirrusSearch/resources/ext.cirrus.suggest.js: override suggestion type reported in event logging (duration: 00m 30s)
* 18:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 22:20 logmsgbot: aaron@tin Synchronized php-1.27.0-wmf.9/includes/jobqueue/aggregator/JobQueueAggregator.php: 2c942ba1782c42ee68622278a5e0a77e9027945d (duration: 00m 31s)
* 14:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1002.wikimedia.org with OS bullseye
* 19:26 logmsgbot: aaron@tin Synchronized wmf-config/jobqueue-eqiad.php: Adjust queue "maxPartitionsTry" and timeouts (duration: 00m 30s)
* 14:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1002.wikimedia.org with reason: host reimage
* 18:49 mutante: disregard that, apache config only is enough
* 14:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1002.wikimedia.org with reason: host reimage
* 18:47 mutante: gerrit will restart in a moment and be right back
* 14:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 18:44 ori: ditto
* 13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Revision table maint
* 18:43 Krinkle: Created account "Krinkle" on collabwiki
* 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Revision table maint
* 16:28 twentyafterfour: restarted apache on iridium to deploy redirect script changes
* 06:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 16:20 jynus: restarting and reconfiguring mysql on db1047
* 06:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:57 godog: stop compactions on restbase1008
* 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29628 and previous config saved to /var/cache/conftool/dbconfig/20220612-064640-ladsgroup.json
* 14:55 jynus: SET GLOBAL query_cache_type = 0; on db1025
* 06:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29627 and previous config saved to /var/cache/conftool/dbconfig/20220612-063135-ladsgroup.json
* 14:54 hashar: gallium: restarted apache2 , was deadlocked/unresponsive somehow
* 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29626 and previous config saved to /var/cache/conftool/dbconfig/20220612-061630-ladsgroup.json
* 14:44 godog: update privatesettings with swift codfw configuration
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29625 and previous config saved to /var/cache/conftool/dbconfig/20220612-060125-ladsgroup.json
* 14:43 godog: set temp-url-key for mw:media account in swift codfw
* 04:29 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1002.wikimedia.org with OS bullseye
* 12:19 paravoid: upgrading tor on radium, rebooting for kernel upgrade
* 03:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 12:18 _joe_: disabled puppet on all lvs hosts for a potentially harmful change (should be a noop)
* 03:26 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host clouddumps1002.wikimedia.org with OS bullseye
* 11:47 _joe_: restarted hhvm on mw1107, stuck at startup
* 03:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 11:40 hashar: logstash: reorganized list of dashboards per sections  https://logstash.wikimedia.org/#/dashboard/elasticsearch/default
* 03:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1002.wikimedia.org with OS bullseye
* 09:43 akosiaris: rebooting planet1001, memory exhaustion, OOM showed up
* 03:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 09:20 hashar: Killed Zuul entirely, the queues were full / deadlocked. Patches need to be retriggered
* 03:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1002.wikimedia.org with OS bullseye
* 06:47 gwicke: restbase1004: nodetool stop -- COMPACTION to avoid running out of disk space
* 03:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 03:07 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.9/includes/api/ApiStashEdit.php: ab32f4e740: Make ApiStashEdit use statsd metrics (duration: 00m 49s)
* 02:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1002.wikimedia.org with OS bullseye
* 02:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Dec 18 02:29:10 UTC 2015 (duration 6m 55s)
* 02:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 08m 45s)
* 02:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 01:52 ori: re-enabled puppet on rdb* / mc*
* 02:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1002.wikimedia.org with OS bullseye
* 01:25 ori: in preparation for Iaefb2d191e, disabling puppet on mc* and rdb*
* 01:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
* 01:21 logmsgbot: krinkle@tin Synchronized docroot and w: (no message) (duration: 00m 32s)
* 01:46 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
* 00:53 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/Flow: Revert Nuke-Flow integration, doesn't work (duration: 00m 32s)
* 01:43 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
* 00:42 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/Flow: SWAT: Nuke support for Flow, part 3 (duration: 00m 32s)
* 01:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 00:34 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add completion suggester to BetaFeatures whitelist (duration: 00m 30s)
* 01:22 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host clouddumps1001.wikimedia.org with OS bullseye
* 00:26 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: grumble grumble touch InitialiseSettings grumble (duration: 00m 30s)
* 01:17 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 00:25 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/Flow: SWAT: Nuke support for Flow, part 2 (duration: 00m 32s)
* 01:16 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host clouddumps1001.wikimedia.org with OS bullseye
* 00:23 logmsgbot: catrope@tin Synchronized wmf-config/CirrusSearch-production.php: SWAT: enable completion suggester beta on all wikis except wikidata (duration: 00m 30s)
* 00:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 00:23 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: enable completion suggester beta on all wikis except wikidata (duration: 00m 29s)
* 00:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 00:20 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/Nuke/: SWAT: Nuke support in Flow, part 1 (duration: 00m 30s)
* 00:18 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/resources/src/mediawiki.messagePoster/mediawiki.messagePoster.factory.js: SWAT: fix error in messagePoster (duration: 00m 29s)
* 00:17 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/MobileFrontend: SWAT: Schema:MobileWebSectionUsage: always log the isTestA field (duration: 00m 31s)
* 00:08 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: SWAT: cleanup (duration: 00m 30s)
* 00:00 ori: restarted mathoid on sca1001


== 2015-12-17 ==
== 2022-06-11 ==
* 23:42 mobrovac: mathoid deploying 8d2295
* 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:47 hashar: ssh to tin is back https://gerrit.wikimedia.org/r/#/c/259876/
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:39 hashar: Only tin lost SSH user keys  apparently due to https://gerrit.wikimedia.org/r/#/c/253465/  overriding the admin::groups to simply "eventlogging-admins"
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:34 hashar: Ssh User keys are gone on deployment servers ( tin / mira )
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:18 eileen1: update CiviCRM from b307d744def9289a7f86cb02bc6e1a00225e474d to cb5e20c29d7376920c45eb5c343e6ee464217833
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:10 eileen1: Updating civicrm from b307d744def9289a7f86cb02bc6e1a00225e474d to cb5e20c29d7376920c45eb5c343e6ee464217833
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:33 hashar: gallium: upgrading Zuul from 2.1.0-60-g1cc37f7-wmf2precise1 .. 2.1.0-60-g1cc37f7-wmf4precise1 .  Should be noop, only change zuul-cloner which is not used there
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:29 mutante: add zuul_2.1.0-60-g1cc37f7-wmf4jessie1 to jessie-wikimedia repo
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:28 mutante: add zuul_2.1.0-60-g1cc37f7-wmf4trusty1 to trusty-wikimedia repo
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:22 mutante: add zuul_2.1.0-60-g1cc37f7-wmf4precise1 to precise-wikimedia APT
* 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:42 mobrovac: mathoid deploying a2187a6
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:07 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.9
* 20:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:05 godog: disable puppet on graphite2001, brief testing cluster aggregations
* 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Revision table maint
* 19:01 thcipriani: starting update of all wikis to 1.27.0-wmf.9
* 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Revision table maint
* 18:11 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool x1-slave (db1031), increase db1041 load to 100% (duration: 00m 30s)
* 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1154.eqiad.wmnet with reason: Revision table maint
* 18:06 gwicke: running `nodetool cleanup` on restbase1001 and restbase1005
* 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1154.eqiad.wmnet with reason: Revision table maint
* 18:04 robh: calcium is supposed to be down, reclaiming to spares, ignore any irc alerts (it's in maint mode in icinga)
* 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29621 and previous config saved to /var/cache/conftool/dbconfig/20220611-033721-ladsgroup.json
* 17:55 gwicke: running `nodetool cleanup` on restbase1002
* 03:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 17:18 jynus: setting mysql db1031 as db2009's master
* 03:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 17:05 jynus: restarting and reconfiguring mysql at db2009
* 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29620 and previous config saved to /var/cache/conftool/dbconfig/20220611-033713-ladsgroup.json
* 16:49 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.8/extensions/Math: SWAT: Make math usable without RESTbase [[gerrit:259734]] (duration: 00m 30s)
* 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29619 and previous config saved to /var/cache/conftool/dbconfig/20220611-032208-ladsgroup.json
* 16:41 jynus: problems with corruption on x1-slave for cebwiki. Fixed them. Will leave db1031 depooled for a while to check they are gone.
* 03:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29618 and previous config saved to /var/cache/conftool/dbconfig/20220611-030703-ladsgroup.json
* 16:39 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.9/extensions/ContentTranslation/includes/Translation.php: SWAT: Fix Undefined index: targetRevisionId in ContentTranslation [[gerrit:259649]] (duration: 00m 29s)
* 02:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29617 and previous config saved to /var/cache/conftool/dbconfig/20220611-025158-ladsgroup.json
* 16:23 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.9/extensions/WikimediaEvents/WikimediaEventsHooks.php: SWAT: Actually define tags for cross-wiki upload A/B test [[gerrit:259729]] (duration: 00m 31s)
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:17 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable cross-wiki upload A/B test in additional languages [[gerrit:259665]] (duration: 00m 30s)
* 01:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:11 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.9/extensions/CirrusSearch/includes/CirrusSearch.php: SWAT: Fix array-to-string conversion [[gerrit:259633]] (duration: 00m 30s)
* 01:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:57 moritzm: stopping opendj LDAP servers on nembus/neptunium (read-only since about days now due to migration to openldap)
* 01:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:56 akosiaris: depool sca1001, playing with cxserver config
* 15:24 jynus: reinstall, reboot and reconfigure mysql at db1031
* 15:03 moritzm: installing git security updates
* 14:23 _joe_: restarting HHVM on the first jobrunners
* 14:11 jynus: soft-rebooting mw1004, responsive to ping, but not to salt, ssh
* 14:06 godog: nodetool stop -- CLEANUP on restbase1002
* 13:59 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1031 (x1-slave) for maintenance (duration: 07m 30s)
* 13:11 moritzm: starting to restart hhvm on application servers (to effect security updates for libxml2, openssl and others)
* 13:01 akosiaris: moved old cruft /srv/deployment/cxserver/deploy/src/config.js out of the way
* 12:51 akosiaris: repooled sca1002 for cxserver
* 12:51 akosiaris: restarted cxserver on sca1002
* 12:33 akosiaris: depooling sca1002 for cxserver
* 12:33 akosiaris: repooling sca1001 for cxserver
* 12:24 kart_: Updated cxserver on sca1002
* 12:08 akosiaris: disable puppet, stop salt-minion on sca1002
* 12:08 akosiaris: depool sca1001 from cxserver service.
* 11:56 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1041 with low weight (duration: 00m 37s)
* 11:33 jynus: performing schema change on officewiki
* 11:31 jynus: performing schema change on x1-master flowdb
* 11:20 jynus: performing schema change on x1-master wikishared.cx_translations
* 10:47 jynus: performing schema change on s7-master metawiki.oauth_registered_consumer
* 10:30 godog: nodetool stop -- CLEANUP restbase1004
* 02:53 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Dec 17 02:53:11 UTC 2015 (duration 7m 12s)
* 02:46 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 08m 22s)
* 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.8) (duration: 10m 32s)
* 00:28 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.8/includes/api/ApiStashEdit.php: I552cf6b0420: Upgrade some ApiStashEdit logging calls to info() (duration: 00m 30s)
* 00:05 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Password policy for sysadmin group (duration: 00m 29s)


== 2015-12-16 ==
== 2022-06-10 ==
* 23:37 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.8/includes/api/ApiStashEdit.php: local hack some extra debug logging into ApiStashEdit (take 2) (duration: 00m 30s)
* 22:04 mutante: mirror1001 - monitored nginx - package was in state "rc" and apache is running instead. systemctl reset-failed cleared alerts
* 23:32 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.8/includes/api/ApiStashEdit.php: local hack some extra debug logging into ApiStashEdit (duration: 00m 30s)
* 22:03 mutante: mirror1001 - nginx service failed since > 1 month and unhandled alert - site is up though
* 23:27 logmsgbot: yurik@tin Synchronized php-1.27.0-wmf.9/extensions/Graph/: Deployed Graph ext to master - protocol issue (duration: 00m 32s)
* 22:00 mutante: miscweb1002 - systemctl start logrotate