
Server Admin Log: Difference between revisions

imported>Labslogbot
(es1.6 upgrade: upgrade elastic1023 (manybubbles))
imported>Stashbot
(dancy@deploy1002: backport aborted: (duration: 00m 12s))
== 2015-07-16 ==
* 01:22 manybubbles: es1.6 upgrade: upgrade elastic1023
== 2022-06-24 ==
* 19:35 dancy@deploy1002: backport aborted:  (duration: 00m 12s)
* 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:31 sukhe: finished running homer * commit "adding sukhe" CR: {{Gerrit|807145}}
* 15:18 dancy@deploy1002: Finished deploy [integration/docroot@ea9b8fa]: (no justification provided) (duration: 00m 08s)
* 15:17 dancy@deploy1002: Started deploy [integration/docroot@ea9b8fa]: (no justification provided)
* 15:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 04s)
* 14:53 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:53 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 02m 37s)
* 14:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:39 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:39 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30242 and previous config saved to /var/cache/conftool/dbconfig/20220624-143544-root.json
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30241 and previous config saved to /var/cache/conftool/dbconfig/20220624-143537-root.json
* 14:31 sukhe: running homer * commit "adding sukhe" CR: 807145
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30240 and previous config saved to /var/cache/conftool/dbconfig/20220624-142303-root.json
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30239 and previous config saved to /var/cache/conftool/dbconfig/20220624-142040-root.json
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30238 and previous config saved to /var/cache/conftool/dbconfig/20220624-142033-root.json
* 14:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 14:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:11 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:10 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:09 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:08 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30237 and previous config saved to /var/cache/conftool/dbconfig/20220624-140759-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30236 and previous config saved to /var/cache/conftool/dbconfig/20220624-140536-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30235 and previous config saved to /var/cache/conftool/dbconfig/20220624-140529-root.json
* 14:03 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:03 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 14:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 14:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30234 and previous config saved to /var/cache/conftool/dbconfig/20220624-135940-root.json
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30233 and previous config saved to /var/cache/conftool/dbconfig/20220624-135255-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30232 and previous config saved to /var/cache/conftool/dbconfig/20220624-135032-root.json
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30231 and previous config saved to /var/cache/conftool/dbconfig/20220624-135025-root.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30230 and previous config saved to /var/cache/conftool/dbconfig/20220624-134436-root.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30229 and previous config saved to /var/cache/conftool/dbconfig/20220624-134423-root.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30228 and previous config saved to /var/cache/conftool/dbconfig/20220624-133751-root.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30227 and previous config saved to /var/cache/conftool/dbconfig/20220624-133528-root.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30226 and previous config saved to /var/cache/conftool/dbconfig/20220624-133521-root.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30225 and previous config saved to /var/cache/conftool/dbconfig/20220624-132932-root.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30224 and previous config saved to /var/cache/conftool/dbconfig/20220624-132919-root.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30223 and previous config saved to /var/cache/conftool/dbconfig/20220624-132247-root.json
* 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS buster
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30222 and previous config saved to /var/cache/conftool/dbconfig/20220624-132024-root.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30221 and previous config saved to /var/cache/conftool/dbconfig/20220624-132017-root.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30220 and previous config saved to /var/cache/conftool/dbconfig/20220624-131428-root.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30219 and previous config saved to /var/cache/conftool/dbconfig/20220624-131415-root.json
* 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:11 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30218 and previous config saved to /var/cache/conftool/dbconfig/20220624-130937-root.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30217 and previous config saved to /var/cache/conftool/dbconfig/20220624-130743-root.json
* 13:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
* 13:05 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:05 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30216 and previous config saved to /var/cache/conftool/dbconfig/20220624-130519-root.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30215 and previous config saved to /var/cache/conftool/dbconfig/20220624-130514-root.json
* 13:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 13:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30214 and previous config saved to /var/cache/conftool/dbconfig/20220624-130055-root.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30213 and previous config saved to /var/cache/conftool/dbconfig/20220624-125924-root.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30212 and previous config saved to /var/cache/conftool/dbconfig/20220624-125911-root.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30211 and previous config saved to /var/cache/conftool/dbconfig/20220624-125834-root.json
* 12:58 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
* 12:58 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30210 and previous config saved to /var/cache/conftool/dbconfig/20220624-125433-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30209 and previous config saved to /var/cache/conftool/dbconfig/20220624-125401-root.json
* 12:54 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 12:53 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:53 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:46 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
* 12:46 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30208 and previous config saved to /var/cache/conftool/dbconfig/20220624-124420-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30207 and previous config saved to /var/cache/conftool/dbconfig/20220624-124407-root.json
* 12:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 03s)
* 12:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30206 and previous config saved to /var/cache/conftool/dbconfig/20220624-123929-root.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30205 and previous config saved to /var/cache/conftool/dbconfig/20220624-123857-root.json
* 12:34 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 03s)
* 12:34 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30204 and previous config saved to /var/cache/conftool/dbconfig/20220624-122916-root.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30203 and previous config saved to /var/cache/conftool/dbconfig/20220624-122903-root.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30202 and previous config saved to /var/cache/conftool/dbconfig/20220624-122728-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30201 and previous config saved to /var/cache/conftool/dbconfig/20220624-122425-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30200 and previous config saved to /var/cache/conftool/dbconfig/20220624-122353-root.json
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30199 and previous config saved to /var/cache/conftool/dbconfig/20220624-122256-root.json
* 12:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 28s)
* 12:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30198 and previous config saved to /var/cache/conftool/dbconfig/20220624-121359-root.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30197 and previous config saved to /var/cache/conftool/dbconfig/20220624-121224-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30196 and previous config saved to /var/cache/conftool/dbconfig/20220624-120922-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30195 and previous config saved to /var/cache/conftool/dbconfig/20220624-120849-root.json
* 12:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@18182aa]: (no justification provided) (duration: 03m 47s)
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30194 and previous config saved to /var/cache/conftool/dbconfig/20220624-120632-root.json
* 12:04 bmansurov@deploy1002: Started deploy [airflow-dags/research@18182aa]: (no justification provided)
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30193 and previous config saved to /var/cache/conftool/dbconfig/20220624-120411-root.json
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30192 and previous config saved to /var/cache/conftool/dbconfig/20220624-115720-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30191 and previous config saved to /var/cache/conftool/dbconfig/20220624-115418-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30190 and previous config saved to /var/cache/conftool/dbconfig/20220624-115345-root.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30189 and previous config saved to /var/cache/conftool/dbconfig/20220624-114907-root.json
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30188 and previous config saved to /var/cache/conftool/dbconfig/20220624-114816-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30187 and previous config saved to /var/cache/conftool/dbconfig/20220624-114217-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30186 and previous config saved to /var/cache/conftool/dbconfig/20220624-113914-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30185 and previous config saved to /var/cache/conftool/dbconfig/20220624-113841-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30184 and previous config saved to /var/cache/conftool/dbconfig/20220624-113403-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30183 and previous config saved to /var/cache/conftool/dbconfig/20220624-113312-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30182 and previous config saved to /var/cache/conftool/dbconfig/20220624-113020-root.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30181 and previous config saved to /var/cache/conftool/dbconfig/20220624-112713-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30180 and previous config saved to /var/cache/conftool/dbconfig/20220624-111859-root.json
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30179 and previous config saved to /var/cache/conftool/dbconfig/20220624-111808-root.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30178 and previous config saved to /var/cache/conftool/dbconfig/20220624-111209-root.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30177 and previous config saved to /var/cache/conftool/dbconfig/20220624-110356-root.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30176 and previous config saved to /var/cache/conftool/dbconfig/20220624-110305-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30175 and previous config saved to /var/cache/conftool/dbconfig/20220624-105705-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30174 and previous config saved to /var/cache/conftool/dbconfig/20220624-104852-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30173 and previous config saved to /var/cache/conftool/dbconfig/20220624-104849-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30172 and previous config saved to /var/cache/conftool/dbconfig/20220624-104801-root.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30171 and previous config saved to /var/cache/conftool/dbconfig/20220624-104407-root.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30170 and previous config saved to /var/cache/conftool/dbconfig/20220624-104403-root.json
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30169 and previous config saved to /var/cache/conftool/dbconfig/20220624-103342-root.json
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30168 and previous config saved to /var/cache/conftool/dbconfig/20220624-103257-root.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30166 and previous config saved to /var/cache/conftool/dbconfig/20220624-102904-root.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30165 and previous config saved to /var/cache/conftool/dbconfig/20220624-102859-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30164 and previous config saved to /var/cache/conftool/dbconfig/20220624-102856-root.json
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30163 and previous config saved to /var/cache/conftool/dbconfig/20220624-101753-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30162 and previous config saved to /var/cache/conftool/dbconfig/20220624-101400-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30161 and previous config saved to /var/cache/conftool/dbconfig/20220624-101349-root.json
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30160 and previous config saved to /var/cache/conftool/dbconfig/20220624-100752-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30159 and previous config saved to /var/cache/conftool/dbconfig/20220624-095946-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30158 and previous config saved to /var/cache/conftool/dbconfig/20220624-095935-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30157 and previous config saved to /var/cache/conftool/dbconfig/20220624-095856-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30156 and previous config saved to /var/cache/conftool/dbconfig/20220624-095845-root.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30155 and previous config saved to /var/cache/conftool/dbconfig/20220624-094442-root.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30154 and previous config saved to /var/cache/conftool/dbconfig/20220624-094431-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30153 and previous config saved to /var/cache/conftool/dbconfig/20220624-094352-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30152 and previous config saved to /var/cache/conftool/dbconfig/20220624-094342-root.json
* 09:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:35 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 09:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30151 and previous config saved to /var/cache/conftool/dbconfig/20220624-092938-root.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30150 and previous config saved to /var/cache/conftool/dbconfig/20220624-092927-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30149 and previous config saved to /var/cache/conftool/dbconfig/20220624-092848-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30148 and previous config saved to /var/cache/conftool/dbconfig/20220624-092838-root.json
* 09:25 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:24 moritzm: installing publicsuffix updates from last buster point release
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30147 and previous config saved to /var/cache/conftool/dbconfig/20220624-091434-root.json
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30146 and previous config saved to /var/cache/conftool/dbconfig/20220624-091423-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30145 and previous config saved to /var/cache/conftool/dbconfig/20220624-091344-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30144 and previous config saved to /var/cache/conftool/dbconfig/20220624-091334-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30143 and previous config saved to /var/cache/conftool/dbconfig/20220624-091227-root.json
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137,db1138 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30142 and previous config saved to /var/cache/conftool/dbconfig/20220624-090810-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30141 and previous config saved to /var/cache/conftool/dbconfig/20220624-085930-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30140 and previous config saved to /var/cache/conftool/dbconfig/20220624-085919-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30139 and previous config saved to /var/cache/conftool/dbconfig/20220624-085904-root.json
* 08:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts webperf2002.codfw.wmnet
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30137 and previous config saved to /var/cache/conftool/dbconfig/20220624-085723-root.json
* 08:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts webperf2002.codfw.wmnet
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30136 and previous config saved to /var/cache/conftool/dbconfig/20220624-085217-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30135 and previous config saved to /var/cache/conftool/dbconfig/20220624-085210-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30134 and previous config saved to /var/cache/conftool/dbconfig/20220624-085129-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30133 and previous config saved to /var/cache/conftool/dbconfig/20220624-085003-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30132 and previous config saved to /var/cache/conftool/dbconfig/20220624-084426-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30131 and previous config saved to /var/cache/conftool/dbconfig/20220624-084415-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30130 and previous config saved to /var/cache/conftool/dbconfig/20220624-084401-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30129 and previous config saved to /var/cache/conftool/dbconfig/20220624-084219-root.json
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30128 and previous config saved to /var/cache/conftool/dbconfig/20220624-083806-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30127 and previous config saved to /var/cache/conftool/dbconfig/20220624-083713-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30126 and previous config saved to /var/cache/conftool/dbconfig/20220624-083706-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30125 and previous config saved to /var/cache/conftool/dbconfig/20220624-083625-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30124 and previous config saved to /var/cache/conftool/dbconfig/20220624-083459-root.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30123 and previous config saved to /var/cache/conftool/dbconfig/20220624-082857-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30115 and previous config saved to /var/cache/conftool/dbconfig/20220624-080705-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30114 and previous config saved to /var/cache/conftool/dbconfig/20220624-080658-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30113 and previous config saved to /var/cache/conftool/dbconfig/20220624-080618-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30112 and previous config saved to /var/cache/conftool/dbconfig/20220624-080451-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30111 and previous config saved to /var/cache/conftool/dbconfig/20220624-075849-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30110 and previous config saved to /var/cache/conftool/dbconfig/20220624-075707-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30109 and previous config saved to /var/cache/conftool/dbconfig/20220624-075201-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30108 and previous config saved to /var/cache/conftool/dbconfig/20220624-075154-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30107 and previous config saved to /var/cache/conftool/dbconfig/20220624-075114-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30106 and previous config saved to /var/cache/conftool/dbconfig/20220624-075102-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30105 and previous config saved to /var/cache/conftool/dbconfig/20220624-074947-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30104 and previous config saved to /var/cache/conftool/dbconfig/20220624-074345-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30103 and previous config saved to /var/cache/conftool/dbconfig/20220624-074204-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30102 and previous config saved to /var/cache/conftool/dbconfig/20220624-073657-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30101 and previous config saved to /var/cache/conftool/dbconfig/20220624-073651-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30100 and previous config saved to /var/cache/conftool/dbconfig/20220624-073610-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30099 and previous config saved to /var/cache/conftool/dbconfig/20220624-073558-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30098 and previous config saved to /var/cache/conftool/dbconfig/20220624-073543-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30097 and previous config saved to /var/cache/conftool/dbconfig/20220624-073444-root.json
* 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-cache[2001-2003].codfw.wmnet with reason: reboots
* 07:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-cache[2001-2003].codfw.wmnet with reason: reboots
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30096 and previous config saved to /var/cache/conftool/dbconfig/20220624-072841-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30095 and previous config saved to /var/cache/conftool/dbconfig/20220624-072240-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30094 and previous config saved to /var/cache/conftool/dbconfig/20220624-072153-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30093 and previous config saved to /var/cache/conftool/dbconfig/20220624-072147-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30092 and previous config saved to /var/cache/conftool/dbconfig/20220624-072106-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30091 and previous config saved to /var/cache/conftool/dbconfig/20220624-072054-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30090 and previous config saved to /var/cache/conftool/dbconfig/20220624-071940-root.json
* 07:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30089 and previous config saved to /var/cache/conftool/dbconfig/20220624-071551-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30088 and previous config saved to /var/cache/conftool/dbconfig/20220624-071439-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 es1025 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30087 and previous config saved to /var/cache/conftool/dbconfig/20220624-070700-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30086 and previous config saved to /var/cache/conftool/dbconfig/20220624-070601-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30085 and previous config saved to /var/cache/conftool/dbconfig/20220624-070555-root.json
* 07:02 marostegui: Reboot db1117 for kernel upgrade (expect haproxy irc alerts)
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30084 and previous config saved to /var/cache/conftool/dbconfig/20220624-070201-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30083 and previous config saved to /var/cache/conftool/dbconfig/20220624-070157-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30082 and previous config saved to /var/cache/conftool/dbconfig/20220624-070151-root.json
* 06:53 jynus: restarting bacula director @ backup1001
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30081 and previous config saved to /var/cache/conftool/dbconfig/20220624-065057-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30080 and previous config saved to /var/cache/conftool/dbconfig/20220624-065051-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30079 and previous config saved to /var/cache/conftool/dbconfig/20220624-064657-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30078 and previous config saved to /var/cache/conftool/dbconfig/20220624-064653-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30077 and previous config saved to /var/cache/conftool/dbconfig/20220624-064647-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30076 and previous config saved to /var/cache/conftool/dbconfig/20220624-063553-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30075 and previous config saved to /var/cache/conftool/dbconfig/20220624-063547-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30074 and previous config saved to /var/cache/conftool/dbconfig/20220624-063154-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30073 and previous config saved to /var/cache/conftool/dbconfig/20220624-063149-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30072 and previous config saved to /var/cache/conftool/dbconfig/20220624-063143-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30071 and previous config saved to /var/cache/conftool/dbconfig/20220624-062049-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30070 and previous config saved to /var/cache/conftool/dbconfig/20220624-062043-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30069 and previous config saved to /var/cache/conftool/dbconfig/20220624-061650-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30068 and previous config saved to /var/cache/conftool/dbconfig/20220624-061645-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30067 and previous config saved to /var/cache/conftool/dbconfig/20220624-061640-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30066 and previous config saved to /var/cache/conftool/dbconfig/20220624-060545-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30065 and previous config saved to /var/cache/conftool/dbconfig/20220624-060539-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30064 and previous config saved to /var/cache/conftool/dbconfig/20220624-060146-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30063 and previous config saved to /var/cache/conftool/dbconfig/20220624-060141-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30062 and previous config saved to /var/cache/conftool/dbconfig/20220624-060136-root.json
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30061 and previous config saved to /var/cache/conftool/dbconfig/20220624-055643-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30060 and previous config saved to /var/cache/conftool/dbconfig/20220624-055042-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30059 and previous config saved to /var/cache/conftool/dbconfig/20220624-055035-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30058 and previous config saved to /var/cache/conftool/dbconfig/20220624-054642-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30057 and previous config saved to /var/cache/conftool/dbconfig/20220624-054637-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30056 and previous config saved to /var/cache/conftool/dbconfig/20220624-054632-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170 after kernel reboots', diff saved to https://phabricator.wikimedia.org/P30055 and previous config saved to /var/cache/conftool/dbconfig/20220624-054259-root.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30054 and previous config saved to /var/cache/conftool/dbconfig/20220624-054139-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30053 and previous config saved to /var/cache/conftool/dbconfig/20220624-053652-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30052 and previous config saved to /var/cache/conftool/dbconfig/20220624-053538-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30051 and previous config saved to /var/cache/conftool/dbconfig/20220624-053531-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30050 and previous config saved to /var/cache/conftool/dbconfig/20220624-053138-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30049 and previous config saved to /var/cache/conftool/dbconfig/20220624-053134-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30048 and previous config saved to /var/cache/conftool/dbconfig/20220624-053128-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 db1169 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30047 and previous config saved to /var/cache/conftool/dbconfig/20220624-052758-root.json
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 db1174 db1175 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30046 and previous config saved to /var/cache/conftool/dbconfig/20220624-052137-root.json


== 2015-07-15 ==
== 2022-06-23 ==
* 23:36 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221885/ (duration: 00m 13s)
* 21:23 mutante: restbase-dev1006 has manually installed packages (wrk, maybe others)
* 23:22 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209840/ (duration: 00m 12s)
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/194075/ (duration: 00m 12s)
* 21:22 brennen: end of utc late backport & config window
* 23:10 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224799/ (duration: 00m 13s)
* 21:21 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808055{{!}}[cleanup] Drop non-existent feature flags]] (duration: 03m 33s)
* 23:09 logmsgbot: krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 13s)
* 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 12s)
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:23 csteipp: deploy patch for T105305 to wmf13/14
* 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223843/ (duration: 00m 12s)
* 21:13 thcipriani@deploy1002: Finished scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]] (duration: 17m 29s)
* 21:59 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222584/ (duration: 00m 13s)
* 21:01 inflatador: looking into wdqs1006 alert ^^
* 21:54 manybubbles: es1.6 upgrade: upgrade elastic1022
* 20:56 thcipriani@deploy1002: Started scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]]
* 21:37 manybubbles: es1.6 upgrade: upgrade elastic1021
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:09 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Really Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef this time (duration: 01m 32s)
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 bblack: restarted salt-master service on palladium
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 bblack: globally cleaning up dangling symlinks left in /etc/certs from before Id7d2447 via salted 'find /etc/ssl/certs -type l -xtype l|xargs rm'
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef (revert Count API module instantiations and Hook runs) (duration: 01m 48s)
* 20:49 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808064{{!}}Enable DiscussionTools topicsubscription, autotopicsub on testwiki (T310808)]] (duration: 03m 18s)
* 20:20 manybubbles: es1.6 upgrade: upgrade elastic1020
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dse-k8s-ctrl1001.eqiad.wmnet
* 20:18 RoanKattouw: Running FlowCreateMentionTemplate.php on all Flow wikis
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:06 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf14
* 20:48 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 19:50 ejegg: updated civicrm from e29cc5f20b5069afcaff794e628596c1f70d69a3 to 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224408/ (duration: 00m 12s)
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 13s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:00 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 12s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:57 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 20:43 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806847{{!}}ukwikibooks: Add NS102 (Рецепт) to wgContentNamespaces (T310940)]] (duration: 03m 41s)
* 18:40 ejegg: updated civicrm from f4219bc8eca5e4db633da07b6ac9e2505cfbae16 to e29cc5f20b5069afcaff794e628596c1f70d69a3
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 18:39 logmsgbot: krenair Synchronized wmf-config/throttle.php: throttle labswiki account creations from hackathon at 500 (duration: 00m 12s)
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 18:39 logmsgbot: twentyafterfour Finished scap: group0 to 1.26wmf14 (duration: 32m 34s)
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 18:21 manybubbles: es1.6 upgrade: upgrading elastic1019
* 20:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:20 Jeff_Green: authdns-update shifting to service-oriented hostnames for fundraising cluster
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:06 logmsgbot: twentyafterfour Started scap: group0 to 1.26wmf14
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:55 ejegg: updated civicrm from 6560cefa8d7e68e35e30b310d6691ab57798a4c9 to f4219bc8eca5e4db633da07b6ac9e2505cfbae16
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:34 Jeff_Green: authdns-update to remove boron.wm.o
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:22 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: partially revert https://gerrit.wikimedia.org/r/#/c/224420/1/wmf-config/CommonSettings.php - doesn't quite work (duration: 00m 13s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:17 Jeff_Green: authdns-update to remove aluminium, also lanthanum by preexisting commit
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:45 andrewbogott: rebooting labvirt1005
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:43 mutante: accepting unaccepted salt keys for ganeti VMs, planet, bromine, krypton
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:39 mutante: krypton - signing puppet cert, initial run
* 20:30 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 16:26 andrewbogott: woo, first try!
* 20:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1001.eqiad.wmnet
* 16:23 andrewbogott: trying to kill labvirt1005 via repeated instance suspend/resume
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:04 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:03 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224808/ (duration: 00m 12s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222581/ (duration: 00m 11s)
* 20:15 mutante: cumin -b 15 -p 95 'mw1*' 'run-puppet-agent -q --failed-only'
* 15:35 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 11s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:29 logmsgbot: krenair Synchronized docroot/noc/createTxtFileSymlinks.sh: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:27 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:20 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 11s)
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:33 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 20:11 mutante: cumin -b 15 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
* 14:22 legoktm: sync failed on mw1090.eqiad.wmnet, read only filesystem
* 20:09 mutante: cumin -b 15 -p 95 'parse*' 'run-puppet-agent -q --failed-only'
* 14:20 logmsgbot: legoktm Synchronized php-1.26wmf13/extensions/CentralAuth/includes/CentralAuthPlugin.php: Add log entry for $wgCentralAuthStrict failures if SULMigration is enabled (duration: 00m 13s)
* 20:07 mutante: cumin -b 15 -p 95 'wtp*' 'run-puppet-agent -q --failed-only'
* 13:55 dcausse: es1.6 upgrade: upgrade elastic1018
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 springle: entry below not mw1216 fault, but r/o filesystem error on mw1090
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:15 springle: sync-common on mw1216 after sync-file from tin failed non-zero exit status 12
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1022 T105879 (duration: 00m 12s)
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:43 dcausse: es1.6 upgrade: upgrade elastic1017
* 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:27 dcausse: es1.6 upgrade: upgrade elastic1016
* 19:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 06:31 dcausse: es1.6 upgrade: upgrade elastic1015
* 19:39 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 05:40 dcausse: es1.6 upgrade: upgrade elastic1014
* 19:34 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 05:10 springle: db1030 busy removing table partitioning
* 19:24 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 04:28 manybubbles: es1.6 upgrade: lowered the shard transfer settings back to our normal rate. going to bed.
* 19:21 ejegg: fundraising python tools updated from {{Gerrit|40d376d4}} to {{Gerrit|acf89fb2}}
* 04:12 manybubbles: es1.6 upgrade: upgrade elastic1013
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 03:49 springle: upgrade db1030 trusty
* 18:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 03:29 manybubbles: es1.6 upgrade: upgrade elastic1012
* 18:38 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 03:14 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-15 03:14:21+00:00
* 18:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 03:10 logmsgbot: reedy Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 13m 32s)
* 18:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 03:03 manybubbles: es1.6 upgrade: raised limits on shard migration rate - should speed up the restart. we should lower it before we do restarts during europe's morning
* 18:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 02:10 Reedy: Running LU manually to see what's wrong with it
* 18:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 15 02:07:48 UTC 2015 (duration 7m 47s)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-15 02:02:55+00:00
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.17 refs [[phab:T308070|T308070]]
* 18:01 brennen: train 1.39.0-wmf.17 ([[phab:T308070|T308070]]): no current blockers - rolling to all wikis
* 18:01 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:53 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:32 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:00 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:17 hashar: Upgrading CI Jenkins # [[phab:T311174|T311174]]
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:807902{{!}}Do not re-use "wikibase_config" for registering the language selector... (T307869)]] (duration: 03m 22s)
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30042 and previous config saved to /var/cache/conftool/dbconfig/20220623-150954-root.json
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30041 and previous config saved to /var/cache/conftool/dbconfig/20220623-150951-root.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30040 and previous config saved to /var/cache/conftool/dbconfig/20220623-150422-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30039 and previous config saved to /var/cache/conftool/dbconfig/20220623-145450-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30038 and previous config saved to /var/cache/conftool/dbconfig/20220623-145448-root.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30037 and previous config saved to /var/cache/conftool/dbconfig/20220623-144918-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30036 and previous config saved to /var/cache/conftool/dbconfig/20220623-143946-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30035 and previous config saved to /var/cache/conftool/dbconfig/20220623-143944-root.json
* 14:34 papaul: ongoing PDU maintenance in rack A3 codfw
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30034 and previous config saved to /var/cache/conftool/dbconfig/20220623-143414-root.json
* 14:31 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:30 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30033 and previous config saved to /var/cache/conftool/dbconfig/20220623-142443-root.json
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30032 and previous config saved to /var/cache/conftool/dbconfig/20220623-142440-root.json
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30031 and previous config saved to /var/cache/conftool/dbconfig/20220623-141910-root.json
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 taavi@deploy1002: Synchronized php-1.39.0-wmf.17/includes/skins/Skin.php: Backport: [[gerrit:807900{{!}}Skin: Change viewport based on feedback (T311119)]] (duration: 03m 29s)
* 14:10 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30030 and previous config saved to /var/cache/conftool/dbconfig/20220623-140939-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30029 and previous config saved to /var/cache/conftool/dbconfig/20220623-140936-root.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30028 and previous config saved to /var/cache/conftool/dbconfig/20220623-140406-root.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:02 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:00 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:00 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 13:58 moritzm: import jenkins 2.346.1 to thirdparty/ci [[phab:T311174|T311174]]
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30027 and previous config saved to /var/cache/conftool/dbconfig/20220623-135435-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30026 and previous config saved to /var/cache/conftool/dbconfig/20220623-135432-root.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30025 and previous config saved to /var/cache/conftool/dbconfig/20220623-134902-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30024 and previous config saved to /var/cache/conftool/dbconfig/20220623-133931-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30023 and previous config saved to /var/cache/conftool/dbconfig/20220623-133928-root.json
* 13:38 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (2/2) (duration: 03m 26s)
* 13:34 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (1/2) (duration: 03m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30022 and previous config saved to /var/cache/conftool/dbconfig/20220623-133358-root.json
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182 db1184 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30021 and previous config saved to /var/cache/conftool/dbconfig/20220623-132951-root.json
* 13:27 sukhe: disable puppet on A:durum or A:wikidough or A:centrallog or A:dns-rec: deploying [[phab:T310574|T310574]]
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30020 and previous config saved to /var/cache/conftool/dbconfig/20220623-132729-root.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30019 and previous config saved to /var/cache/conftool/dbconfig/20220623-132133-root.json
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30018 and previous config saved to /var/cache/conftool/dbconfig/20220623-132128-root.json
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807050{{!}}[ImageSuggestions] Enable extension on ptwiki, ruwiki & idwiki (T302711)]] (duration: 03m 44s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30017 and previous config saved to /var/cache/conftool/dbconfig/20220623-130629-root.json
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30016 and previous config saved to /var/cache/conftool/dbconfig/20220623-130624-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30015 and previous config saved to /var/cache/conftool/dbconfig/20220623-125553-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30014 and previous config saved to /var/cache/conftool/dbconfig/20220623-125547-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30013 and previous config saved to /var/cache/conftool/dbconfig/20220623-125125-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30012 and previous config saved to /var/cache/conftool/dbconfig/20220623-125120-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30011 and previous config saved to /var/cache/conftool/dbconfig/20220623-124049-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30010 and previous config saved to /var/cache/conftool/dbconfig/20220623-124043-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30009 and previous config saved to /var/cache/conftool/dbconfig/20220623-123621-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30008 and previous config saved to /var/cache/conftool/dbconfig/20220623-123616-root.json
* 12:26 moritzm: installing waitress security updates
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30007 and previous config saved to /var/cache/conftool/dbconfig/20220623-122545-root.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30006 and previous config saved to /var/cache/conftool/dbconfig/20220623-122539-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30005 and previous config saved to /var/cache/conftool/dbconfig/20220623-122118-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30004 and previous config saved to /var/cache/conftool/dbconfig/20220623-122112-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30003 and previous config saved to /var/cache/conftool/dbconfig/20220623-121041-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30002 and previous config saved to /var/cache/conftool/dbconfig/20220623-121035-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30001 and previous config saved to /var/cache/conftool/dbconfig/20220623-120614-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30000 and previous config saved to /var/cache/conftool/dbconfig/20220623-120608-root.json
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29999 and previous config saved to /var/cache/conftool/dbconfig/20220623-115537-root.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29998 and previous config saved to /var/cache/conftool/dbconfig/20220623-115532-root.json
* 11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29997 and previous config saved to /var/cache/conftool/dbconfig/20220623-115110-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29996 and previous config saved to /var/cache/conftool/dbconfig/20220623-115104-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128 db1129 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29995 and previous config saved to /var/cache/conftool/dbconfig/20220623-114159-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29994 and previous config saved to /var/cache/conftool/dbconfig/20220623-114033-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29993 and previous config saved to /var/cache/conftool/dbconfig/20220623-114028-root.json
* 11:32 kart_: Updated cxserver to 2022-06-23-052732-production ([[phab:T311196|T311196]])
* 11:31 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 11:31 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:30 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:29 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:28 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29992 and previous config saved to /var/cache/conftool/dbconfig/20220623-112529-root.json
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29991 and previous config saved to /var/cache/conftool/dbconfig/20220623-112524-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 es1024 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29990 and previous config saved to /var/cache/conftool/dbconfig/20220623-110804-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29989 and previous config saved to /var/cache/conftool/dbconfig/20220623-105333-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29988 and previous config saved to /var/cache/conftool/dbconfig/20220623-105326-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29987 and previous config saved to /var/cache/conftool/dbconfig/20220623-105320-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29986 and previous config saved to /var/cache/conftool/dbconfig/20220623-103829-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29985 and previous config saved to /var/cache/conftool/dbconfig/20220623-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29984 and previous config saved to /var/cache/conftool/dbconfig/20220623-103816-root.json
* 10:25 jayme: running restart-php7.2-fpm A:parsoid or A:mw or A:mw-api to disable opcache revalidation - [[phab:T266055|T266055]]
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29983 and previous config saved to /var/cache/conftool/dbconfig/20220623-102325-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29982 and previous config saved to /var/cache/conftool/dbconfig/20220623-102318-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29981 and previous config saved to /var/cache/conftool/dbconfig/20220623-102312-root.json
* 10:21 XioNoX: fix eqiad lvs switch port MTU
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29980 and previous config saved to /var/cache/conftool/dbconfig/20220623-100822-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29979 and previous config saved to /var/cache/conftool/dbconfig/20220623-100815-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29978 and previous config saved to /var/cache/conftool/dbconfig/20220623-100808-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29977 and previous config saved to /var/cache/conftool/dbconfig/20220623-095318-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29976 and previous config saved to /var/cache/conftool/dbconfig/20220623-095311-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29975 and previous config saved to /var/cache/conftool/dbconfig/20220623-095304-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29973 and previous config saved to /var/cache/conftool/dbconfig/20220623-093814-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29972 and previous config saved to /var/cache/conftool/dbconfig/20220623-093807-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29971 and previous config saved to /var/cache/conftool/dbconfig/20220623-093800-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29970 and previous config saved to /var/cache/conftool/dbconfig/20220623-092310-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29969 and previous config saved to /var/cache/conftool/dbconfig/20220623-092303-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29968 and previous config saved to /var/cache/conftool/dbconfig/20220623-092256-root.json
* 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 db1179 db1180 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29967 and previous config saved to /var/cache/conftool/dbconfig/20220623-090842-root.json
* 09:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:52 joal@deploy1002: Finished deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs (duration: 00m 08s)
* 08:52 joal@deploy1002: Started deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs
* 08:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 7 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 7 hosts with reason: Reboots
* 07:39 moritzm: installing firejail security updates
* 07:36 TheresNoTime: UTC morning deploys done
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:25 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806365{{!}}GrowthExperiments: Enable link recommendations frontend, round 4 (T304548)]] (duration: 03m 37s)
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: Reboots
* 00:35 brennen: end of phabricator maintenance window
* 00:13 brennen: phabricator deploy finished ([[phab:T311175|T311175]])
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
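
The dbctl entries above record hosts being repooled in stages after a kernel upgrade (for example db1178 at 2%, 5% and 10%, roughly 15 minutes apart). As a minimal sketch, assuming the entry layout seen in this log stays stable, the staged steps can be pulled out per host with a regular expression. The names `REPOOL_RE` and `repool_steps` are hypothetical helpers for illustration, not part of dbctl or conftool.

```python
import re

# Matches the "(re)pooling" dbctl commit entries as they appear in this log.
# The pattern is an assumption based on the lines above, not a dbctl format spec.
REPOOL_RE = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) (?P<who>\S+): dbctl commit \(dc=all\): "
    r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%: (?P<reason>[^']*)'"
)


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} sorted by time of day."""
    steps = {}
    for line in lines:
        m = REPOOL_RE.match(line)
        if m:
            steps.setdefault(m["host"], []).append((m["time"], int(m["pct"])))
    # The log is newest-first, so sort each host's steps chronologically.
    for host in steps:
        steps[host].sort()
    return steps


# Entries copied from the log above, plus one unrelated line that is ignored.
sample = [
    "* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29976 and previous config saved to /var/cache/conftool/dbconfig/20220623-095311-root.json",
    "* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29972 and previous config saved to /var/cache/conftool/dbconfig/20220623-093807-root.json",
    "* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29969 and previous config saved to /var/cache/conftool/dbconfig/20220623-092303-root.json",
    "* 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply",
]

if __name__ == "__main__":
    for host, host_steps in repool_steps(sample).items():
        print(host, host_steps)
```

Sorting on the `HH:MM` string only works within a single day's section; entries spanning several dates would need the section heading's date as well.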


== 2015-07-14 ==
== 2022-06-22 ==
* 23:46 manybubbles: es1.6 upgrade: upgraded elastic1011
* 22:56 tzatziki: removing 1 file for legal compliance
* 23:22 bblack: updating nginx to 1.9.3-1+wmf1 on cp*
* 21:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
* 23:17 bblack: reprepro: nginx for jessie-wikimedia/main bumped to 1.9.3-1+wmf1
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 to resolve Old GC Hell alert
* 22:22 ejegg: updated civicrm from 04efc7d5c7bbb068f907125f2184692aee676123 to 6560cefa8d7e68e35e30b310d6691ab57798a4c9
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad to resolve Old GC Hell alert
* 21:29 Reedy: mw1090 fs is ro
* 21:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1006.eqiad.wmnet with OS bullseye
* 21:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Fix testwiki
* 20:49 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44] (duration: 01m 18s)
* 21:05 _joe|AFK: depooling mw1090, ext4 errors in syslog, filesystem mounted read-only
* 20:48 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44]
* 21:01 logmsgbot: twentyafterfour Synchronized wmf-config/CommonSettings.php: revert LCStoreStaticArray (duration: 00m 12s)
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 20:59 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf14 and rebuild localization cache (duration: 72m 45s)
* 20:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:42 bblack: undoing LCStoreStaticArray because appservers look unhealthy, using ori's command: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"'
* 20:27 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 19:46 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf14 and rebuild localization cache
* 20:24 cjming: end of UTC late backport window
* 19:23 manybubbles: es1.6 step iforget: upgrade elasticsearch on elastic1010
* 20:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 17:41 mutante: terbium: /usr/local/bin/foreachwiki extensions/Echo/maintenance/processEchoEmailBatch.php
* 20:19 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44] (duration: 07m 36s)
* 17:10 dcausse: es1.6 step 10: upgrade elastic1009
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:23 mutante: bromine - apt-get upgrade
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:08 logmsgbot: manybubbles Synchronized php-1.26wmf13/extensions/UniversalLanguageSelector/: SWAT add some hooks to extension.json (duration: 00m 13s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:34 gwicke: started RESTBase revision thin-out script for html and data-parsoid on wikimedia domains
* 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807593{{!}}gawiki: Change category collation from `uppercase` to `uca-ga-u-kn` (T311136)]] (duration: 03m 39s)
* 14:01 dcausse: es1.6 step 9: upgrade elastic1008
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS bullseye
* 12:48 _joe_: reimaging mw1155
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44]
* 12:17 ori: Logging a message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log.
* 20:11 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44] (duration: 00m 07s)
* 11:28 dcausse: es1.6 step 8: upgrade elastic1007
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44]
* 11:25 _joe_: repooling mw1154 with HHVM
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:12 _joe_: stopped poolcounter on mw1154
* 20:10 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44] (duration: 06m 16s)
* 10:06 _joe_: reimaging mw1154
* 20:03 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44]
* 07:49 dcausse: es1.6 step 7: upgrade elastic1006
* 20:03 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44] (duration: 30m 58s)
* 07:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 07:09:10 UTC 2015 (duration 9m 9s)
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 06:48 dcausse: es1.6 step 6: upgrade elastic1005
* 19:42 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 06:41 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I9c9bf0f4: Use LCStoreStaticArray unconditionally (duration: 03m 02s)
* 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection (duration: 02m 03s)
* 05:26 ori: Cleaned up now-unused hhbc files from /run/hhvm/cache on job runners
* 19:37 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection
* 04:58 ori: Enabling LCStoreStaticArray in production. May be reverted by running: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"' on palladium.
* 19:32 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44]
* 04:48 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Follow-up for Ieb62ee050e: allow LCStoreStaticArray in server mode (duration: 00m 13s)
* 19:31 aqu: Deploying analytics/refinery (weekly train)
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-14 02:35:21+00:00
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 07m 27s)
* 19:14 herron: bounced apache on lists1001
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 02:07:32 UTC 2015 (duration 7m 30s)
* 19:06 hashar: Restarting CI Jenkins
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-14 02:02:33+00:00
* 16:46 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1009.eqiad.wmnet with OS bullseye
* 01:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 16:45 hashar: Restarting CI Jenkins
* 16:43 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
* 16:33 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:29 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:18 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:14 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 16:13 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 16:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 16:08 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 16:06 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 16:05 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 16:04 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 15:36 moritzm: upload jenkins 2.332.4 to apt.wikimedia.org [[phab:T311068|T311068]]
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
* 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 15:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 15:00 jayme: published docker-registry.discovery.wmnet/helm-state-metrics:0.1.0-1 - [[phab:T310714|T310714]]
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 14:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
* 14:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 Lucas_WMDE: UTC afternoon backport+config window done
* 14:09 lucaswerkmeister-wmde@deploy1002: Synchronized logos/manage.py: Config: [[gerrit:807486{{!}}logos: Update phpcs comment]] (should be a no-op but syncing just in case) (duration: 03m 19s)
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
* 14:01 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/%s\n' specieswiki<nowiki>{</nowiki>,-<nowiki>{</nowiki>1.5,2<nowiki>}</nowiki>x<nowiki>}</nowiki>.png {{!}} mwscript purgeList.php # [[phab:T310961|T310961]]
* 14:01 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (3/3) (duration: 03m 30s)
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (2/3) (duration: 03m 29s)
* 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:56 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
* 13:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
* 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (1/3) (duration: 03m 46s)
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
* 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (2/2) (duration: 03m 39s)
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (1/2) (duration: 03m 35s)
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:28 XioNoX: fix MTU on eqiad server facing switch ports
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:27 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:807255{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (2/3) (T304328)]] (duration: 03m 35s)
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807254{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (1/3) (T304328)]] (duration: 03m 35s)
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
* 13:10 XioNoX: fix MTU in drmrs
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:807211{{!}}[wmf-config]: Deploy GDI Survey Wave 2 - BETA (T311079)]] (duration: 03m 29s)
* 12:58 XioNoX: fix MTU on codfw switches access ports
* 12:57 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 12:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 12:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 12:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
* 12:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 12:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 12:23 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 12:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 12:18 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 12:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 12:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 12:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 12:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 11:46 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 11:41 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 11:11 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 20s)
* 11:10 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 11:09 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 11s)
* 11:08 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 11:07 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 02m 54s)
* 11:05 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 10:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
* 10:53 jayme: systemctl restart rsyslog on kubernetes2008
* 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
* 10:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
* 10:41 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
* 10:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
* 10:36 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
* 10:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
* 10:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
* 10:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
* 10:17 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
* 10:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
* 10:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 10:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
* 10:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 10:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2003.codfw.wmnet
* 10:04 moritzm: installing vim security updates
* 09:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 09:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
* 09:35 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 09:35 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 09:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 09:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
* 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 08:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
* 08:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29964 and previous config saved to /var/cache/conftool/dbconfig/20220622-084234-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29963 and previous config saved to /var/cache/conftool/dbconfig/20220622-084225-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29962 and previous config saved to /var/cache/conftool/dbconfig/20220622-084206-root.json
* 08:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29961 and previous config saved to /var/cache/conftool/dbconfig/20220622-082730-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29960 and previous config saved to /var/cache/conftool/dbconfig/20220622-082721-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29959 and previous config saved to /var/cache/conftool/dbconfig/20220622-082702-root.json
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 08:18 marostegui: Upgrade kernel and reboot on db[1111,1132,1143,1127].eqiad.wmnet
* 08:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 08:15 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]] (duration: 03m 43s)
* 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29957 and previous config saved to /var/cache/conftool/dbconfig/20220622-081227-root.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29956 and previous config saved to /var/cache/conftool/dbconfig/20220622-081217-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29955 and previous config saved to /var/cache/conftool/dbconfig/20220622-081159-root.json
* 08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]]
* 08:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
* 08:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
* 08:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
* 08:04 hashar: Updating operations-puppet-tests-buster-docker Jenkins job to use the latest Docker image (rebuild to catch up with latest defined gems). https://gerrit.wikimedia.org/r/c/integration/config/+/807478
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29954 and previous config saved to /var/cache/conftool/dbconfig/20220622-075721-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29953 and previous config saved to /var/cache/conftool/dbconfig/20220622-075713-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29952 and previous config saved to /var/cache/conftool/dbconfig/20220622-075655-root.json
* 07:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 07:53 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
* 07:50 marostegui: Upgrade kernel and reboot on db[2145-2150].codfw.wmnet
* 07:49 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29951 and previous config saved to /var/cache/conftool/dbconfig/20220622-074217-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29950 and previous config saved to /var/cache/conftool/dbconfig/20220622-074209-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29949 and previous config saved to /var/cache/conftool/dbconfig/20220622-074151-root.json
* 07:40 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 07:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 07:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29948 and previous config saved to /var/cache/conftool/dbconfig/20220622-072714-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29947 and previous config saved to /var/cache/conftool/dbconfig/20220622-072705-root.json
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29946 and previous config saved to /var/cache/conftool/dbconfig/20220622-072647-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29945 and previous config saved to /var/cache/conftool/dbconfig/20220622-071210-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29944 and previous config saved to /var/cache/conftool/dbconfig/20220622-071201-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29943 and previous config saved to /var/cache/conftool/dbconfig/20220622-071143-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 es1026 es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29942 and previous config saved to /var/cache/conftool/dbconfig/20220622-065507-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Switchover es1, es2 and es3 masters', diff saved to https://phabricator.wikimedia.org/P29941 and previous config saved to /var/cache/conftool/dbconfig/20220622-065208-marostegui.json
* 05:52 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 01:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:17 tstarling@deploy1002: Synchronized wmf-config/mc-labs.php: for completeness (duration: 03m 41s)
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:13 tstarling@deploy1002: Synchronized wmf-config/mc.php: g 807158 [[phab:T278392|T278392]] (duration: 03m 35s)
* 01:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2015-07-13 ==
== 2022-06-21 ==
* 23:22 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/VisualEditor: SWAT (duration: 00m 11s)
* 20:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b42e57d75ec6b0536493fa073805a0bcb066aef1}}: zhwikiquote: Disable local upload ([[phab:T311017|T311017]]) (duration: 03m 43s)
* 23:11 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Add title to Parsoid exception logging (duration: 00m 12s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:45 logmsgbot: legoktm Synchronized wmf-config: Revert "Set $wgCentralAuthStrict = true;" (duration: 00m 13s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:41 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 13s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:41 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:16 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/User.php: Add 'AuthPluginStrict' log to identify users who are unable to authenticate (duration: 00m 13s)
* 20:22 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (2/2) (duration: 03m 28s)
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 12s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/Hooks.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 13s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:13 ejegg: updated payments from ec34ebf61e5962f66b807abdcb519ff323d41e8e to 4ca95d55a9745c05ccfbb16ee6f23a6f75328824
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:00 manybubbles: es1.6 step 4: upgrade elastic1003
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:54 ori: Debugging metric issue on graphite1001, brief stats drop possible
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (1/2) (duration: 03m 30s)
* 21:32 legoktm: renaming ~3k users who were originally missed for SULF
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/Hooks.php: (no message) (duration: 00m 12s)
* 20:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f70e302e11756d9704acc86c45b3d7aabf31c4d}}: fawiktionary: Enable SandboxLink extension ([[phab:T308505|T308505]]) (duration: 03m 37s)
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: (no message) (duration: 00m 13s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:42 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: f9c89d2814: Revert "Revert Count API module instantiations and Hook runs" (duration: 00m 13s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:30 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Ieb62ee05: Temporary hack to facilitate migration of l10n cache implementations (duration: 00m 11s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:42 hoo: Updated Wikidata's property suggester with data from today's json dump
* 19:38 dancy@deploy1002: backport aborted: (duration: 00m 10s)
* 19:24 manybubbles_: es1.6 step 3: upgrade elastic1002
* 19:38 dancy@deploy1002: Installation of scap version "4.9.5" completed for 558 hosts
* 19:08 legoktm: running populateContentModel.php --table=page on all small wikis
* 19:38 dancy@deploy1002: Installing scap version "4.9.5" for 558 hosts
* 19:01 andrewbogott: two of two
* 19:22 urandom: replicating Cassandra `system_auth` keyspace to codfw -- [[phab:T307641|T307641]]
* 19:01 mutante: morebots - are you 1.7.11 ?
* 18:56 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2` failed due to syntax error, patch here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/807200
* 19:01 andrewbogott: one of two
* 18:48 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2`
* 18:52 legoktm: running populateContentModel.php --table=page on testwiki
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1001.wikimedia.org
* 18:29 manybubbles_: es1.6 step 2: shut down extra instance of elasticsearch on elastic1021
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:39 andrewbogott: this is the second test log of three
* 17:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 17:39 andrewbogott: this is the first test log of three
* 17:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp1001.wikimedia.org
* 17:36 mutante: included adminbot_1.7.11 in APT repo
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2001.wikimedia.org
* 16:31 andrewbogott: wikidata-dev updated local puppet and rebooting property-suggester
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 17:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host elastic1049.eqiad.wmnet
* 16:07 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 17:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:11 manybubbles_: all done SWATing.
* 17:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp2001.wikimedia.org
* 15:09 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable footer contact link on ukwiki (duration: 00m 11s)
* 17:14 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 14:55 manybubbles_: after upgrading elasticsearch, its init script no longer shuts down the old version of elasticsearch, so you have to kill it manually. That means the upgrade instructions will be "special" this time around. Hopefully this is a one-time thing.
* 17:09 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1049.eqiad.wmnet
* 14:45 manybubbles_: es1.6 step 1: upgrade elasticsearch on elastic1001 - starting
* 17:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 14:45 manybubbles_: es1.6 step 0: successfully synced new versions of plugins
* 17:01 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 14:30 manybubbles_: es1.6 step 0: sync new versions of plugins
* 16:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
* 14:30 manybubbles_: starting the elasticsearch 1.6.0 upgrade
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 13:13 bblack: updating nginx/bind on cp*
* 16:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 13:07 bblack: updating openssl on cp*
* 16:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
* 13:02 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/Cite/extension.json: https://gerrit.wikimedia.org/r/#/c/224407/ - unbreak VE mobile, https://phabricator.wikimedia.org/T105686 (duration: 00m 12s)
* 15:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
* 10:58 mobrovac: restbase deploying 6dec79d
* 15:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 10:22 logmsgbot: ori Synchronized php-1.26wmf13/maintenance/rebuildLocalisationCache.php: 117f60a171: rebuildLocalisationCache: don't limit memory usage (duration: 00m 12s)
* 15:55 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2048.codfw.wmnet
* 08:52 godog: bounce graphite-web on graphite1001
* 15:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
* 08:51 godog: bounce carbon daemons on graphite1001
* 15:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
* 08:50 godog: upgrade graphite to 0.9.13 on graphite1001 and bounce one instance of carbon/cache
* 15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 07:29 logmsgbot: ori Synchronized php-1.26wmf13/includes/cache/LCStoreStaticArray.php: I3f63594a4: Fix variable name (follows Ib2c5856d) (duration: 00m 11s)
* 15:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 06:25 logmsgbot: LocalisationUpdate failed: git pull of core failed
* 15:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (2/2) (duration: 03m 28s)
* 06:24 ori: Experimenting with altering the localisation cache implementation for testwiki, operations/mediawiki-config on tin will have a local hack for a little bit
* 15:37 klausman: restarting pybal on lvs2009
* 05:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 05:07:32 UTC 2015 (duration 7m 31s)
* 15:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 02:25:58 UTC 2015 (duration 25m 57s)
* 15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:23:43+00:00
* 15:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (1/2) (duration: 03m 51s)
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 16s)
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:10:25+00:00
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:10 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:47 springle: restarted labsdb1002 mysqld while troubleshooting replication
* 15:30 klausman: Restarting pybal on lvs2010
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2002.codfw.wmnet
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2001.codfw.wmnet
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging-ctrl2002.codfw.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 15:16 klausman@cumin1001: conftool action : help; selector: name=ml-staging2001
* 15:15 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:06 moritzm: installing avahi security updates
* 15:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:01 papaul: PDU swap for rack a2 complete
* 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:24 papaul: ongoing maintenance on ps1-a2-codfw
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 13:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
* 13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
* 13:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
* 13:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
* 13:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:28 daniel@deploy1002: Synchronized rpc/: Config: [[gerrit:805775{{!}}rpc: Remove unused RunJobs.php (T175146 T243096)]] (duration: 03m 45s)
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
* 13:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 13:05 moritzm: installing Linux 5.10.120-1~bpo10+1 on buster hosts with backports kernel
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 13:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
* 12:56 moritzm: installing haproxy security updates on stretch
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 12:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 12:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
* 12:43 moritzm: installing python-bottle security updates
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 12:25 moritzm: reset logster-csp/logster-badpass-priv on mwlog1002, these were removed from Puppet
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:59 mbsantos: mbsantos@maps2009 imposm-removebackup-import ([[phab:T305845|T305845]])
* 11:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:43 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for testing', diff saved to https://phabricator.wikimedia.org/P29936 and previous config saved to /var/cache/conftool/dbconfig/20220621-114232-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for testing', diff saved to https://phabricator.wikimedia.org/P29935 and previous config saved to /var/cache/conftool/dbconfig/20220621-114216-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for testing', diff saved to https://phabricator.wikimedia.org/P29934 and previous config saved to /var/cache/conftool/dbconfig/20220621-114151-root.json
* 10:57 volans: deleting netbox getstats.GetDeviceStats job results - [[phab:T311048|T311048]]
* 10:51 kart_: Updated cxserver to 2022-06-21-035954-production ([[phab:T307970|T307970]])
* 10:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 10:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 10:47 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 10:47 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 10:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 10:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 10:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:31 urbanecm: 09:29:23 Synchronized wmf-config/throttle.php: {{Gerrit|7c9f6a561b2b4b5c5db063bad83bd23e9cbac347}}: Add a throttle rule for a Czech course ([[phab:T310885|T310885]]) (duration: 03m 34s) #manually logging in logmsgbot's absence
* 09:20 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 09:13 marostegui: dbmaint s8@codfw [[phab:T310011|T310011]]
* 08:29 marostegui: Reboot db1120 for kernel upgrade
* 08:14 moritzm: remove EOLed parsoid debs from releases.wikimedia.org [[phab:T309765|T309765]]
* 05:54 marostegui: Reboot db1132 and db1181 for kernel upgrade


== 2015-07-12 ==
== 2022-06-20 ==
* 14:59 bblack: upgraded most packages on sodium
* 07:14 SandraEbele: Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Reliability and SpecialEntityData metrics).
* 14:48 bblack: upgraded apache2 to 2.2.22-1ubuntu1.9 on: antimony argon caesium fluorine helium iodine logstash1001 logstash1003 magnesium neon netmon1001 rhodium stat1001 ytterbium
* 07:14 SandraEbele: killed Oozie wikidata-articleplaceholder_metrics-coord, wikidata-reliability_metrics-coord, and wikidata-specialentitydata_metrics-coord jobs.
* 04:49 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 04:49:08 UTC 2015 (duration 49m 7s)
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:26:52+00:00
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 02:25:33 UTC 2015 (duration 25m 32s)
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 12s)
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:10:00+00:00
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)


== 2015-07-11 ==
== 2022-06-19 ==
* 19:48 jynus: stopping labsdb1002 after table corruption has been detected
* 10:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 19:37 urandom: from restbase1002, starting revision culling process (node thin_out_key_rev_value_data.js `hostname -i` local_group_wikimedia_T_parsoid_html 2>&1 | tee >(gzip -c > local_group_wikimedia_T_parsoid_html.log.`date +%s`.gz))
* 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 19:33 urandom: restbase: setting gc_grace_seconds to 604800 (1 week) on local_group_wikipedia_T_parsoid_html.data
* 10:14 ayounsi@cumin1001: dbctl commit (dc=all): 'depool', diff saved to https://phabricator.wikimedia.org/P29910 and previous config saved to /var/cache/conftool/dbconfig/20220619-101436-ayounsi.json
* 04:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 04:55:56 UTC 2015 (duration 55m 55s)
* 04:21 bd808: Logstash cluster upgrade complete! Kibana working again
* 04:21 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1006
* 04:12 bd808: rebooting logstash1006
* 04:06 bd808: logstash1005 fully recovered all shards
* 03:21 logmsgbot: mattflaschen Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Bump Flow to encode page name when sending to Parsoid (duration: 00m 13s)
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:28:18+00:00
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 07s)
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 02:25:19 UTC 2015 (duration 25m 18s)
* 02:09 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:09:45+00:00
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 35s)
* 00:46 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1005; replicas recovering now
* 00:34 bd808: rebooting logstash1005
* 00:30 bd808: logstash1004 fully recovered all shards


== 2015-07-10 ==
== 2022-06-17 ==
* 22:51 mutante: tendril: very short maintenance downtime
* 22:05 AndyRussG: update payments-wiki revision {{Gerrit|10304f69}} -> {{Gerrit|ef53c82e}}
* 20:10 bd808: `service elasticsearch start` not starting on logstash1004; investigating
* 20:22 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P29908 and previous config saved to /var/cache/conftool/dbconfig/20220617-202240-jynus.json
* 20:07 bd808: ran apt-get upgrade on logstash1004
* 20:20 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P29907 and previous config saved to /var/cache/conftool/dbconfig/20220617-202038-jynus.json
* 19:52 mutante: adminbot - built and imported 1.7.10 into APT repo
* 17:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1021.eqiad.wmnet with OS buster
* 19:43 bd808: rebooting logstash1004
* 17:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 19:40 bd808: Kibana seems to be broken by mixed 1.6.0/1.3.9 cluster
* 17:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 19:32 bd808: kibana not seeing indices after upgrading elasticsearch to 1.6.0; investigating
* 16:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1020.eqiad.wmnet with OS buster
* 19:26 bd808: Upgraded logstash1003 to elasticsearch 1.6.0
* 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 19:22 bd808: Upgraded logstash1002 to elasticsearch 1.6.0
* 16:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1019.eqiad.wmnet with OS buster
* 19:19 bd808: Upgraded logstash1001 to elasticsearch 1.6.0
* 16:37 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 19:10 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.TableNode.js: https://gerrit.wikimedia.org/r/#/c/224122/ (duration: 00m 12s)
* 16:35 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:11 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 120'
* 16:34 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 18:00 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 90'
* 16:34 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 17:49 gwicke: rolling restart of the cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/224114/
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 17:32 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: prevent race condition on writing settings (duration: 00m 13s)
* 16:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 17:26 moritzm: installed python security updates on mc*
* 16:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 17:25 Coren: rebooting labstore2001 (experiments with the new raid setup caused the mapper table to fill)
* 16:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 16:35 mobrovac: restbase deploying hotfix for T105509
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 15:29 mobrovac: restbase restarted restbase on restbase1004
* 16:22 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 15:25 godog: bounce cassandra on restbase1004
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 13:43 godog: bounce cassandra on restbase1004
* 16:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 13:37 _joe_: temporarily repooled mw1031
* 16:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 12:40 godog: bounce cassandra on restbase1004
* 16:06 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 07:43 godog: reimage ms-be2013 T105213
* 16:06 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 04:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 10 04:36:49 UTC 2015 (duration 36m 48s)
* 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
* 04:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037; repool db1030 (revert below) (duration: 00m 12s)
* 16:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 04:28 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 15:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 03:14 mutante: re-enabling puppet on tools-exec-1213, working around adminbot package install fail
* 15:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 02:59 elee: please log this with the year
* 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 02:53 andrewbogott: testing the log by logging a test
* 15:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 01:50 gwicke: bounced cassandra on restbase1004
* 15:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 01:38 jgage: cassandra restarted on restbase1004
* 15:46 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS buster
* 00:39 urandom: starting restbase1004
* 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 00:35 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/modules/ve-mw/ui/inspectors/ve.ui.MWLinkAnnotationInspector.js: https://gerrit.wikimedia.org/r/#/c/223983/ (duration: 00m 12s)
* 15:39 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 00:15 hoo: Updated WikibaseQualityConstraints data on wikidata (wikidatawiki.wbqc_constraints)
* 15:39 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 15:32 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS buster
* 15:29 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 15:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 15:19 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 15:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 15:18 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 15:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:15 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS buster
* 15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 15:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 15:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 15:02 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 14:59 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 14:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 14:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 12:35 SandraEbele: deployed daily airflow dag for 3 Wikidata metrics.
* 11:54 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@18182aa]: (no justification provided) (duration: 00m 13s)
* 11:54 ebysans@deploy1002: Started deploy [airflow-dags/analytics@18182aa]: (no justification provided)
* 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2012.codfw.wmnet
* 11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2012.codfw.wmnet
* 11:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2011.codfw.wmnet
* 11:40 moritzm: upload cas 6.5.5+wmf11u1 to apt.wikimedia.org [[phab:T305518|T305518]]
* 11:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2011.codfw.wmnet
* 11:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2010.codfw.wmnet
* 11:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 11:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 11:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 11:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 11:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2010.codfw.wmnet
* 11:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1012.eqiad.wmnet
* 11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1012.eqiad.wmnet
* 11:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1010.eqiad.wmnet
* 11:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1010.eqiad.wmnet
* 10:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 10:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 10:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 10:34 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 10:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
* 09:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
* 09:56 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 09:56 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 09:55 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 09:55 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 09:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
* 09:44 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
* 09:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
* 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1004.eqiad.wmnet
* 09:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1004.eqiad.wmnet
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
* 09:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
* 09:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
* 09:24 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
* 09:11 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 09:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 09:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 08:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 08:51 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 08:47 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 08:39 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 08:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
* 08:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
* 08:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
* 08:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
* 07:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 07:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 02:51 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS bullseye
* 02:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:36 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 03m 43s)
* 02:02 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS bullseye
* 01:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS bullseye
* 01:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
* 00:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 00:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye


== 2015-07-09 ==
== 2022-06-16 ==
* 23:41 legoktm: deployed patch for T105413
* 23:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 23:07 gwicke: bounced cassandra on restbase1004
* 23:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 23:02 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: TitleBlacklist: Don't block account auto-creation (duration: 00m 13s)
* 23:38 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 22:09 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-eqiad.php: I don't think we want to keep poolcounter running on an imagescaler (duration: 00m 12s)
* 23:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
* 21:30 logmsgbot: tgr Synchronized php-1.26wmf13/extensions/OAuth/api/MWOAuthAPI.setup.php: no canonical redirects for requests with OAuth headers (duration: 00m 12s)
* 22:59 mutante: new Wikipedia languages added to DNS: blk = https://en.wikipedia.org/wiki/Pa%27O_language  {{!}}  pcm = https://en.wikipedia.org/wiki/Nigerian_Pidgin
* 21:05 tgr: backporting https://gerrit.wikimedia.org/r/#/c/223952/ - fixes OAuth which is broken for 1.26wmf13
* 22:37 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:47 gwicke: temporarily disabled puppet on cassandra nodes while tweaking settings
* 22:33 volans@cumin2002: START - Cookbook sre.dns.netbox
* 19:53 legoktm: manually fixing global merge of Yuvipanda->YuviPanda (T104686)
* 21:18 thcipriani@deploy1002: Finished scap: noop test (duration: 04m 07s)
* 19:04 gwicke: bounced cassandra on restbase1004
* 21:14 thcipriani@deploy1002: Started scap: noop test
* 18:29 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf13
* 21:10 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:805433{{!}}CommonSettings: clean up and simplify some code]] (duration: 03m 42s)
* 17:54 gwicke: bounced restbase on restbase1005
* 21:06 thcipriani@deploy1002: Synchronized multiversion/MWRealm.php: Config: [[gerrit:806249{{!}}MWRealm.php: remove unused getRealmSpecificFilename() (T171115)]] (duration: 03m 35s)
* 17:32 ori: installed poolcounter on mw1154
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:31 logmsgbot: ori Synchronized wmf-config/PoolCounterSettings-eqiad.php: (no message) (duration: 00m 12s)
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:22 cmjohnson1: shutting down helium for a few minutes to move within the same row
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:53 gwicke: bounced cassandra on restbase1004
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:48 godog: reboot ms-be2013 T105213
* 20:59 thcipriani@deploy1002: Finished scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]] (duration: 16m 57s)
* 16:38 gwicke: bounced cassandra on restbase1006
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:07 _joe_: repooling mw1152
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:57 godog: restart cassandra on restbase1002
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:34 gwicke: bounced cassandra on restbase1004
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:24 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223739/ (duration: 00m 12s)
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:23 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223737/ (duration: 00m 12s)
* 20:42 thcipriani@deploy1002: Started scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]]
* 15:23 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223742/ (duration: 00m 12s)
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:09 gwicke: bounced cassandra on restbase1004
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:44 gwicke: re-enabled compaction throttling (60mb/s) on cassandra nodes
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:44 bblack: reprepro: jessie-wikimedia/backports openssl pkg, 1.0.2c-1 => 1.0.2d-1~wmf1
* 20:27 cjming@deploy1002: Synchronized phpcs.xml: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 27s)
* 14:29 _joe_: reimaging mw1152 for wiping any leftover local hacks. Depooling, scheduling downtime
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:28 moritzm: installed python-django security updates on labmon, netmon and californium
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:24 godog: really upgrade python-django on graphite2001
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:48 mobrovac: restbase cassandra rolling restart to apply https://gerrit.wikimedia.org/r/223774
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:02 godog: upgrade python-django on graphite1001 and graphite2001 following http://www.ubuntu.com/usn/usn-2671-1/
* 20:23 cjming@deploy1002: Synchronized wmf-config/: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 22s)
* 11:34 godog: restart cassandra on restbase1001
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:22 logmsgbot: krinkle Synchronized php-1.26wmf13/resources/src/mediawiki/mediawiki.util.js: T105265 (duration: 00m 11s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:21 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/GlobalFunctions.php: T105265 (duration: 00m 12s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:09 mobrovac: restbase deploying https://gerrit.wikimedia.org/r/#/c/223297/ which bumps the back-end module version ( https://github.com/wikimedia/restbase-mod-table-cassandra/pull/117 )
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:53 mobrovac: restbase started thinner 15 days for wikimedia group
* 20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805179{{!}}Turn off TOC A/B test for pilot wikis (T309683)]] (duration: 03m 37s)
* 10:37 mark: Shutdown AMS-IX route server BGP sessions on cr1-esams
* 19:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner2001.codfw.wmnet
* 07:48 logmsgbot: oblivian Synchronized php-1.26wmf13/thumb.php: Re-add fix for thumb.php 404s on HHVM (duration: 00m 13s)
* 19:39 aokoth@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 06:27 twentyafterfour: restarted apache2 on iridium to fix phab exception
* 19:23 aokoth@cumin1001: START - Cookbook sre.dns.netbox
* 06:15 springle: db1037 is repartitioning tables; it will lag intermittently for a day
* 19:03 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner2001.codfw.wmnet
* 06:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 06:05:30 UTC 2015 (duration 5m 29s)
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab-runner1001.eqiad.wmnet
* 05:23 gwicke: dynamically limited cassandra compaction throughput to 80mb/s; please review https://gerrit.wikimedia.org/r/#/c/223722/ to make this permanent
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 03:01:13+00:00
* 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:58 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 05m 29s)
* 18:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 02:42 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:42:56+00:00
* 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29904 and previous config saved to /var/cache/conftool/dbconfig/20220616-185520-marostegui.json
* 02:40 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 02:40:16 UTC 2015 (duration 40m 15s)
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:36 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 10m 32s)
* 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 twentyafterfour: restarted phd
* 18:54 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 02:28 twentyafterfour: moved phd log to free disk space on iridium
* 18:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
* 02:24 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 02:24:00+00:00
* 18:53 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:17 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:17:02+00:00
* 18:50 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 47s)
* 18:49 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 02:00 springle: pkg upgrade and restart db1037
* 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:49 gwicke: switched remaining cassandra nodes to JDK8
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 11s)
* 18:44 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to all wikis
* 01:07 mutante: uranium - deleted apache logs older than 90 days
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:45 RoanKattouw: Running populateContentModel.php --wiki=cawiki --table=revision --ns=5
* 18:42 brennen@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/CheckUser/src/Hooks.php: Backport: [[gerrit:806246{{!}}Only try to create User object if username is not null (T310747)]] (duration: 03m 23s)
* 00:20 RoanKattouw: Ran populateContentModel.php --table=revision for odd-numbered namespaces on officewiki for T105245
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29903 and previous config saved to /var/cache/conftool/dbconfig/20220616-184015-marostegui.json
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29902 and previous config saved to /var/cache/conftool/dbconfig/20220616-182510-marostegui.json
* 18:13 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:11 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29901 and previous config saved to /var/cache/conftool/dbconfig/20220616-181005-marostegui.json
* 18:10 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 17:59 brennen: end of phabricator deploy
* 17:46 brennen: starting phabricator deploy, momentary downtime expected while services restart
* 17:42 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:42 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29900 and previous config saved to /var/cache/conftool/dbconfig/20220616-173738-marostegui.json
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29899 and previous config saved to /var/cache/conftool/dbconfig/20220616-173725-marostegui.json
* 17:31 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:31 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:27 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:26 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29898 and previous config saved to /var/cache/conftool/dbconfig/20220616-172220-marostegui.json
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29897 and previous config saved to /var/cache/conftool/dbconfig/20220616-170715-marostegui.json
* 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29896 and previous config saved to /var/cache/conftool/dbconfig/20220616-165210-marostegui.json
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29895 and previous config saved to /var/cache/conftool/dbconfig/20220616-161844-marostegui.json
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29894 and previous config saved to /var/cache/conftool/dbconfig/20220616-161835-marostegui.json
* 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29893 and previous config saved to /var/cache/conftool/dbconfig/20220616-160330-marostegui.json
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29892 and previous config saved to /var/cache/conftool/dbconfig/20220616-154825-marostegui.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29891 and previous config saved to /var/cache/conftool/dbconfig/20220616-153320-marostegui.json
* 15:31 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 15:30 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 15:29 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P29890 and previous config saved to /var/cache/conftool/dbconfig/20220616-151434-ladsgroup.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P29889 and previous config saved to /var/cache/conftool/dbconfig/20220616-145931-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29888 and previous config saved to /var/cache/conftool/dbconfig/20220616-145136-marostegui.json
* 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29887 and previous config saved to /var/cache/conftool/dbconfig/20220616-145128-marostegui.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P29886 and previous config saved to /var/cache/conftool/dbconfig/20220616-144427-ladsgroup.json
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29885 and previous config saved to /var/cache/conftool/dbconfig/20220616-143623-marostegui.json
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P29884 and previous config saved to /var/cache/conftool/dbconfig/20220616-142923-ladsgroup.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29883 and previous config saved to /var/cache/conftool/dbconfig/20220616-142118-marostegui.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29882 and previous config saved to /var/cache/conftool/dbconfig/20220616-140613-marostegui.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29881 and previous config saved to /var/cache/conftool/dbconfig/20220616-140453-root.json
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 volans@cumin1001: dbctl commit (dc=all): 'Doesn't have new wikiuser', diff saved to https://phabricator.wikimedia.org/P29880 and previous config saved to /var/cache/conftool/dbconfig/20220616-140107-volans.json
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29879 and previous config saved to /var/cache/conftool/dbconfig/20220616-134950-root.json
* 13:45 sukhe: upload bird2_2.0.7-4.1wm1 to apt.wm.o (buster) - [[phab:T310574|T310574]]
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29878 and previous config saved to /var/cache/conftool/dbconfig/20220616-133446-root.json
* 13:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1089.eqiad.wmnet
* 13:22 jayme@cumin1001: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 13:21 jayme@cumin1001: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29877 and previous config saved to /var/cache/conftool/dbconfig/20220616-131942-root.json
* 13:10 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 13:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29876 and previous config saved to /var/cache/conftool/dbconfig/20220616-130438-root.json
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 13:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29875 and previous config saved to /var/cache/conftool/dbconfig/20220616-123357-marostegui.json
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1008.eqiad.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 for schema change', diff saved to https://phabricator.wikimedia.org/P29874 and previous config saved to /var/cache/conftool/dbconfig/20220616-115924-root.json
* 11:53 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1008.eqiad.wmnet
* 11:53 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1007.eqiad.wmnet
* 11:45 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1007.eqiad.wmnet
* 11:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
* 11:38 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
* 11:35 godog: trim swift logs older than 25d from centrallog hosts - [[phab:T309171|T309171]]
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet
* 11:27 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet
* 11:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
* 11:19 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29873 and previous config saved to /var/cache/conftool/dbconfig/20220616-111632-marostegui.json
* 11:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 11:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
* 11:09 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 11:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 11:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29871 and previous config saved to /var/cache/conftool/dbconfig/20220616-110127-marostegui.json
* 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
* 10:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29870 and previous config saved to /var/cache/conftool/dbconfig/20220616-104622-marostegui.json
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
* 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
* 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
* 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json
* 10:28 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:28 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]?
* 10:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]?
* 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster
* 10:02 elukey: ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]
* 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:32 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster
* 09:21 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29868 and previous config saved to /var/cache/conftool/dbconfig/20220616-091131-marostegui.json
* 09:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:02 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6002.drmrs.wmnet
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 08:45 moritzm: failover ganeti master in drmrs/2 to ganeti6004
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805370{{!}}testwiki: Enable SectionTranslation for 11 Wikipedias (T309384 T310116)]] (duration: 03m 41s)
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 joal: Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure


== July 8 ==
== 2022-06-15 ==
* 23:07 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow: SWAT (duration: 00m 14s)
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29867 and previous config saved to /var/cache/conftool/dbconfig/20220615-224845-marostegui.json
* 23:06 bd808: Restarted logstash on logstash1001; no hhvm input seen for last hour
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29866 and previous config saved to /var/cache/conftool/dbconfig/20220615-223339-marostegui.json
* 22:56 gwicke: finished rolling restart of cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/223495/
* 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS buster
* 22:45 mutante: zirconium - stop puppet for role switch
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29865 and previous config saved to /var/cache/conftool/dbconfig/20220615-221834-marostegui.json
* 22:33 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/changes/EnhancedChangesList.php: Unbreak missing flags in enhanced RC (duration: 00m 12s)
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS buster
* 22:08 logmsgbot: hoo Synchronized php-1.26wmf13/extensions/Wikidata/: Update Wikibase: Fix JavaScript ULS usage (duration: 00m 20s)
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 21:51 logmsgbot: manybubbles Synchronized php-1.26wmf12/extensions/CirrusSearch/: Stop some fatals in cirrus (duration: 00m 13s)
* 22:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 21:41 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert Count API module instantiations and Hook runs (2/2) (duration: 00m 12s)
* 22:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 21:40 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/Hooks.php: Revert Count API module instantiations and Hook runs (1/2) (duration: 00m 12s)
* 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 21:39 logmsgbot: bd808 Synchronized php-1.26wmf13/extensions/CirrusSearch/includes/CirrusSearch.php: Suppress interwiki results when they would break (duration: 00m 12s)
* 22:12 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 21:08 bblack: graphite: wiped /var/log/upstart/statsite* logs, restarted statsite processes
* 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 20:56 csteipp: deployed patches for T103022 & T103023
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29864 and previous config saved to /var/cache/conftool/dbconfig/20220615-220329-marostegui.json
* 20:53 csteipp: deployed patch for T94116 for wmf12/wmf13
* 22:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 20:30 gwicke: added explicit exit 1 in /etc/init.d/cassandra on restbase1008 to prevent cassandra from starting up there; is puppet restarting it?
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 20:29 subbu: deployed parsoid sha c4cfc527
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 20:15 gwicke: bounced cassandra on restbase1001
* 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 20:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 20:05:09 UTC 2015 (duration 5m 8s)
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29863 and previous config saved to /var/cache/conftool/dbconfig/20220615-213241-marostegui.json
* 19:32 gwicke: stopped cassandra on restbase1008
* 21:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 19:27 logmsgbot: twentyafterfour Synchronized php-1.26wmf13: deploying UniversalLanguageSelector commit 2e0990ac9879 (duration: 01m 58s)
* 21:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 19:26 urandom: restbase rolling restart
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29862 and previous config saved to /var/cache/conftool/dbconfig/20220615-213233-marostegui.json
* 18:21 jgage: ran 'kafka preferred-replica-election' to promote analytics1021 back to Leader
* 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29861 and previous config saved to /var/cache/conftool/dbconfig/20220615-211728-marostegui.json
* 18:05 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf13
* 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29860 and previous config saved to /var/cache/conftool/dbconfig/20220615-210223-marostegui.json
* 17:16 moritzm: installed libwmf security updates on various systems
* 20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29859 and previous config saved to /var/cache/conftool/dbconfig/20220615-204717-marostegui.json
* 17:09 gwicke: bounced cassandra on restbase1004
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:25 mutante: handing over adminship of the "test" mailman list to John F. Lewis (was: Thehelpfulone) due to inactivity
* 20:08 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804014{{!}}Remove unused setting wgQuickSurveysUseVue (T285890)]] (duration: 03m 38s)
* 13:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: raise db1041 load (duration: 00m 13s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:58 paravoid: manually dpkg -P ferm on potassium
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:52 paravoid: rmmod all iptables/netfilter-related modules from potassium
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:23 godog: bounce cassandra on restbase1004, heap space
* 19:50 hashar@deploy1002: Finished deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]] (duration: 00m 10s)
* 11:12 _joe_: mw1153 passed the smoke tests, repooling
* 19:50 hashar@deploy1002: Started deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]]
* 11:08 godog: bounce cassandra on restbase1004 and restbase1005 'cannot achieve consistency level quorum'
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29858 and previous config saved to /var/cache/conftool/dbconfig/20220615-194703-marostegui.json
* 10:50 godog: bounce cassandra on restbase1004, death by compaction
* 19:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 09:43 ori: _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 09:42 ori: Nuked /var/lib/carbon/whisper/ResourceLoader on graphite[12]001. Data prior to rollout of I55f0c44cd considered bogus.
* 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29857 and previous config saved to /var/cache/conftool/dbconfig/20220615-194655-marostegui.json
* 09:42 ori: morebots, are you OK?
* 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29856 and previous config saved to /var/cache/conftool/dbconfig/20220615-193150-marostegui.json
* 09:41 godog: bounce nutcracker on silver
* 19:31 hashar: wikibugs IRC bot has been restarted by valhallasw \o/ # [[phab:T310734|T310734]]
* 09:33 _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29855 and previous config saved to /var/cache/conftool/dbconfig/20220615-191645-marostegui.json
* 09:26 hashar: upgraded plugins on jenkins and restarting it
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29854 and previous config saved to /var/cache/conftool/dbconfig/20220615-190140-marostegui.json
* 09:06 hashar: Jenkins registering jobs with Zuul
* 18:42 hashar: wikibugs (IRC bot for Phabricator/Gerrit) is no longer working and needs a restart [[phab:T310734|T310734]]
* 08:41 hashar: Jenkins is migrating old build histories. Lots of disk IO happening
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:11 hashar: shutting down Jenkins for upgrade.
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29853 and previous config saved to /var/cache/conftool/dbconfig/20220615-182140-marostegui.json
* 05:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 05:57:10 UTC 2015 (duration 57m 9s)
* 18:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 05:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1041, warm up (duration: 00m 13s)
* 18:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-08 02:31:24+00:00
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:16 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-08 02:16:50+00:00
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 48s)
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]] (duration: 03m 43s)
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1015.eqiad.wmnet with OS buster
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:52 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1014.eqiad.wmnet with OS buster
* 17:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 17:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29851 and previous config saved to /var/cache/conftool/dbconfig/20220615-172738-marostegui.json
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29849 and previous config saved to /var/cache/conftool/dbconfig/20220615-171233-marostegui.json
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29848 and previous config saved to /var/cache/conftool/dbconfig/20220615-165727-marostegui.json
* 16:54 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to group0
* 16:44 jynus: restarting replication for m3 on db1117, not db2078
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29847 and previous config saved to /var/cache/conftool/dbconfig/20220615-164222-marostegui.json
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:29 brennen: phabricator upgrade finished
* 16:27 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Id8cdb8aef70f6672}} (duration: 03m 41s)
* 16:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host backup1009.eqiad.wmnet
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host backup1009.eqiad.wmnet
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29845 and previous config saved to /var/cache/conftool/dbconfig/20220615-160838-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29844 and previous config saved to /var/cache/conftool/dbconfig/20220615-160830-marostegui.json
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS buster
* 15:56 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:53 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P29843 and previous config saved to /var/cache/conftool/dbconfig/20220615-155325-marostegui.json
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 15:40 mutante: phabricator upgrade in progress
* 15:39 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:39 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20220615-153820-marostegui.json
* 15:35 brennen: starting phabricator deploy, momentary downtime expected while Apache restarts and migrations run
* 15:34 jynus: stopping replication for m3 on db1117, db2078
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29841 and previous config saved to /var/cache/conftool/dbconfig/20220615-152315-marostegui.json
* 15:20 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ms-be1059.eqiad.wmnet with OS bullseye
* 15:20 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenace
* 15:20 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenace
* 15:06 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:05 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:05 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:03 mutante: phabricator maintenance about to start
* 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 15:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 14:59 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 14:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:52 jbond@cumin1001: END (ERROR) - Cookbook sre.pdus.uptime (exit_code=97)
* 14:51 jbond@cumin1001: START - Cookbook sre.pdus.uptime
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29840 and previous config saved to /var/cache/conftool/dbconfig/20220615-145028-marostegui.json
* 14:50 urandom: ALTER-ing replication for codfw (Cassandra) expansion -- [[phab:T307641|T307641]]
* 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29839 and previous config saved to /var/cache/conftool/dbconfig/20220615-145020-marostegui.json
* 14:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:49 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:46 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29838 and previous config saved to /var/cache/conftool/dbconfig/20220615-143515-marostegui.json
* 14:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:30 hnowlan@deploy1002: Synchronized private/PrivateSettings.php: [[phab:T308670|T308670]] credentials to access the similar-users service (duration: 03m 32s)
* 14:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:22 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:21 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29836 and previous config saved to /var/cache/conftool/dbconfig/20220615-142010-marostegui.json
* 14:19 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 14:16 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS buster
* 14:10 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jnuche@deploy1002: Installation of scap version "4.9.4" completed for 558 hosts
* 14:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 14:08 jnuche@deploy1002: Installing scap version "4.9.4" for 558 hosts
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29834 and previous config saved to /var/cache/conftool/dbconfig/20220615-140505-marostegui.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:01 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:58 awight: EU afternoon backport window complete.
* 13:57 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/Translate/src/PageTranslation/DeleteTranslatableBundleSpecialPage.php: Backport: [[gerrit:805749{{!}}Fix deletion of translation pages outside of NS_MAIN namespace (T310440)]] (duration: 00m 32s)
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29833 and previous config saved to /var/cache/conftool/dbconfig/20220615-135508-root.json
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29832 and previous config saved to /var/cache/conftool/dbconfig/20220615-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29831 and previous config saved to /var/cache/conftool/dbconfig/20220615-135458-root.json
* 13:54 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:53 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:49 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29830 and previous config saved to /var/cache/conftool/dbconfig/20220615-134004-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29829 and previous config saved to /var/cache/conftool/dbconfig/20220615-133958-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29828 and previous config saved to /var/cache/conftool/dbconfig/20220615-133954-root.json
* 13:38 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:805745{{!}}Restore internal mechanism to use either back or close button (T310602)]] (duration: 00m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29827 and previous config saved to /var/cache/conftool/dbconfig/20220615-133334-marostegui.json
* 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29826 and previous config saved to /var/cache/conftool/dbconfig/20220615-133326-marostegui.json
* 13:31 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 01m 08s)
* 13:30 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 02m 06s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29825 and previous config saved to /var/cache/conftool/dbconfig/20220615-132500-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29824 and previous config saved to /var/cache/conftool/dbconfig/20220615-132454-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29823 and previous config saved to /var/cache/conftool/dbconfig/20220615-132450-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29822 and previous config saved to /var/cache/conftool/dbconfig/20220615-131820-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29821 and previous config saved to /var/cache/conftool/dbconfig/20220615-130956-root.json
* 13:09 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29820 and previous config saved to /var/cache/conftool/dbconfig/20220615-130951-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29819 and previous config saved to /var/cache/conftool/dbconfig/20220615-130946-root.json
* 13:08 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:04 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 43s)
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29818 and previous config saved to /var/cache/conftool/dbconfig/20220615-130315-marostegui.json
* 13:02 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 12:56 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 58s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:55 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 05s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29817 and previous config saved to /var/cache/conftool/dbconfig/20220615-125452-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29816 and previous config saved to /var/cache/conftool/dbconfig/20220615-125447-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29815 and previous config saved to /var/cache/conftool/dbconfig/20220615-125442-root.json
* 12:51 jbond@deploy1002: Finished deploy [netbox/deploy@7bbf659]: log (duration: 03m 12s)
* 12:48 jbond@deploy1002: Started deploy [netbox/deploy@7bbf659]: log
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29813 and previous config saved to /var/cache/conftool/dbconfig/20220615-124810-marostegui.json
* 12:42 moritzm: failover ganeti master in eqsin to ganeti5001
* 12:42 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:42 volans@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29812 and previous config saved to /var/cache/conftool/dbconfig/20220615-123949-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29811 and previous config saved to /var/cache/conftool/dbconfig/20220615-123943-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29810 and previous config saved to /var/cache/conftool/dbconfig/20220615-123938-root.json
* 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 12:25 kart_: Updated cxserver to 2022-06-15-074244-production ([[phab:T309266|T309266]], [[phab:T310116|T310116]], [[phab:T309384|T309384]], [[phab:T306963|T306963]])
* 12:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:23 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 es1033 es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29808 and previous config saved to /var/cache/conftool/dbconfig/20220615-122123-root.json
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:19 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29807 and previous config saved to /var/cache/conftool/dbconfig/20220615-121620-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29806 and previous config saved to /var/cache/conftool/dbconfig/20220615-121440-marostegui.json
* 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 12:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29805 and previous config saved to /var/cache/conftool/dbconfig/20220615-115935-marostegui.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29804 and previous config saved to /var/cache/conftool/dbconfig/20220615-115452-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29803 and previous config saved to /var/cache/conftool/dbconfig/20220615-115135-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29802 and previous config saved to /var/cache/conftool/dbconfig/20220615-115127-root.json
* 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29801 and previous config saved to /var/cache/conftool/dbconfig/20220615-114950-marostegui.json
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29800 and previous config saved to /var/cache/conftool/dbconfig/20220615-114430-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29799 and previous config saved to /var/cache/conftool/dbconfig/20220615-113948-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29798 and previous config saved to /var/cache/conftool/dbconfig/20220615-113631-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29797 and previous config saved to /var/cache/conftool/dbconfig/20220615-113623-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29796 and previous config saved to /var/cache/conftool/dbconfig/20220615-113445-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29795 and previous config saved to /var/cache/conftool/dbconfig/20220615-112924-marostegui.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29794 and previous config saved to /var/cache/conftool/dbconfig/20220615-112444-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29793 and previous config saved to /var/cache/conftool/dbconfig/20220615-112127-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29792 and previous config saved to /var/cache/conftool/dbconfig/20220615-112119-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29791 and previous config saved to /var/cache/conftool/dbconfig/20220615-111940-marostegui.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29790 and previous config saved to /var/cache/conftool/dbconfig/20220615-110940-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29789 and previous config saved to /var/cache/conftool/dbconfig/20220615-110623-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29788 and previous config saved to /var/cache/conftool/dbconfig/20220615-110616-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29787 and previous config saved to /var/cache/conftool/dbconfig/20220615-110435-marostegui.json
* 10:55 marostegui: dbmaint es3@eqiad [[phab:T310485|T310485]]
* 10:55 marostegui: dbmaint es2@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui: dbmaint es1@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29786 and previous config saved to /var/cache/conftool/dbconfig/20220615-105437-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29784 and previous config saved to /var/cache/conftool/dbconfig/20220615-105119-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29783 and previous config saved to /var/cache/conftool/dbconfig/20220615-105112-root.json
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29782 and previous config saved to /var/cache/conftool/dbconfig/20220615-103933-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29781 and previous config saved to /var/cache/conftool/dbconfig/20220615-103615-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29780 and previous config saved to /var/cache/conftool/dbconfig/20220615-103608-root.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29779 and previous config saved to /var/cache/conftool/dbconfig/20220615-103101-marostegui.json
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29778 and previous config saved to /var/cache/conftool/dbconfig/20220615-103048-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 es1030 es1028 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29777 and previous config saved to /var/cache/conftool/dbconfig/20220615-102929-root.json
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29776 and previous config saved to /var/cache/conftool/dbconfig/20220615-101543-marostegui.json
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29775 and previous config saved to /var/cache/conftool/dbconfig/20220615-100235-marostegui.json
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29774 and previous config saved to /var/cache/conftool/dbconfig/20220615-100037-marostegui.json
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4001.ulsfo.wmnet
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29773 and previous config saved to /var/cache/conftool/dbconfig/20220615-094532-marostegui.json
* 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29772 and previous config saved to /var/cache/conftool/dbconfig/20220615-092706-marostegui.json
* 09:20 marostegui: Reboot sanitarium hosts (db1154, db1155); wiki replicas will have lag
* 09:14 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1059.eqiad.wmnet with OS bullseye
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29771 and previous config saved to /var/cache/conftool/dbconfig/20220615-091257-marostegui.json
* 09:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29770 and previous config saved to /var/cache/conftool/dbconfig/20220615-091249-marostegui.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29769 and previous config saved to /var/cache/conftool/dbconfig/20220615-091201-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29768 and previous config saved to /var/cache/conftool/dbconfig/20220615-085744-marostegui.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29767 and previous config saved to /var/cache/conftool/dbconfig/20220615-085656-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29766 and previous config saved to /var/cache/conftool/dbconfig/20220615-084239-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29765 and previous config saved to /var/cache/conftool/dbconfig/20220615-084151-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29764 and previous config saved to /var/cache/conftool/dbconfig/20220615-084046-marostegui.json
* 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29763 and previous config saved to /var/cache/conftool/dbconfig/20220615-083554-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29762 and previous config saved to /var/cache/conftool/dbconfig/20220615-082734-marostegui.json
* 08:23 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:22 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29761 and previous config saved to /var/cache/conftool/dbconfig/20220615-082050-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29760 and previous config saved to /var/cache/conftool/dbconfig/20220615-081744-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29759 and previous config saved to /var/cache/conftool/dbconfig/20220615-080546-root.json
* 08:03 XioNoX: re-enable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29758 and previous config saved to /var/cache/conftool/dbconfig/20220615-080240-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29757 and previous config saved to /var/cache/conftool/dbconfig/20220615-075042-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29756 and previous config saved to /var/cache/conftool/dbconfig/20220615-075024-marostegui.json
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29755 and previous config saved to /var/cache/conftool/dbconfig/20220615-074736-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29754 and previous config saved to /var/cache/conftool/dbconfig/20220615-073538-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29753 and previous config saved to /var/cache/conftool/dbconfig/20220615-073232-root.json
* 07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29752 and previous config saved to /var/cache/conftool/dbconfig/20220615-072352-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P29751 and previous config saved to /var/cache/conftool/dbconfig/20220615-072034-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29750 and previous config saved to /var/cache/conftool/dbconfig/20220615-071728-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29749 and previous config saved to /var/cache/conftool/dbconfig/20220615-070847-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29748 and previous config saved to /var/cache/conftool/dbconfig/20220615-065342-marostegui.json
* 06:52 XioNoX: disable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29747 and previous config saved to /var/cache/conftool/dbconfig/20220615-063837-marostegui.json
* 06:02 marostegui: Reboot db[2071-2078] [[phab:T310485|T310485]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29746 and previous config saved to /var/cache/conftool/dbconfig/20220615-060153-marostegui.json
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29745 and previous config saved to /var/cache/conftool/dbconfig/20220615-054252-marostegui.json
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1173.eqiad.wmnet with OS bullseye
* 05:17 marostegui: dbmaint es5@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es4@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es3@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es2@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es1@codfw [[phab:T310485|T310485]]
* 05:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:03 marostegui: Reboot dbproxy1016 and dbproxy1021 [[phab:T310484|T310484]]
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 tstarling@deploy1002: Synchronized php-1.39.0-wmf.16/includes/cache/MessageCache.php: (no justification provided) (duration: 03m 36s)
* 02:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:17 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/includes/cache/MessageCache.php: [[phab:T310532|T310532]] (duration: 03m 29s)
* 02:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== July 7 ==
* 23:54 jgage: kafka brokers 1018 & 1021 were demoted; i have triggered a leader election and they are leaders again
* 23:05 logmsgbot: catrope Synchronized visualeditor-default.dblist: Enable VE by default on labswiki (duration: 00m 12s)
* 21:56 hoo: Restarted hhvm on mw1003 "Fatal error: Function already defined: wmfLoadInitialiseSettings in /srv/mediawiki/wmf-config/CommonSettings.php on line 187"
* 21:16 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/resourceloader/ResourceLoader.php: T104769 (duration: 00m 13s)
* 20:53 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf13
* 20:00 logmsgbot: twentyafterfour Finished scap: testwiki to php-1.26wmf13 and rebuild l10n cache (duration: 39m 41s)
* 19:47 gwicke: restarted cassandra on restbase1005
* 19:20 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf13 and rebuild l10n cache
* 19:15 moritzm: installed PHP security updates on all trusty hosts
* 18:58 ejegg: updated payments from a17ee221db0dbde70c92e24fc188379b6dbad613 to ec34ebf61e5962f66b807abdcb519ff323d41e8e
* 18:08 twentyafterfour: restarted apache2 on iridium (phab hotfix)
* 17:10 robh: OTRS update appears to be functioning normally. As such, ending maintenance window.
* 17:06 robh: otrs is now using the new sha256 cert
* 17:00 robh: starting otrs maint window
* 16:58 _joe_: restarted HHVM on mw1026, near to OOM
* 16:47 twentyafterfour: applied hotfix for phabricator bug: https://secure.phabricator.com/D13544
* 16:36 mutante: protactinium - manual iptables rules replaced by puppet/ferm rules
* 16:11 logmsgbot: thcipriani Synchronized php-1.26wmf12/extensions/ContentTranslation/extension.json: Remove default value for ContentTranslationCampaigns (duration: 00m 12s)
* 15:33 jynus: manually editing table mediawiki.ipblocks to fully solve a former software bug
* 15:12 Jeff_Green: ptr records for frack/codfw and authdns-update
* 15:10 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable ContentTranslation in enwiki [[gerrit:222991]] (duration: 00m 13s)
* 14:21 jynus: dropping optin_survey_old table from enwiki
* 13:23 akosiaris: restarting gitblit on antimony
* 11:31 mobrovac: restbase restarted cassandra on rb1005
* 11:26 godog: restart cassandra on restbase1004, heap exhausted
* 10:49 godog: restarted cassandra on restbase1005, mutations through the roof
* 08:27 godog: set operations/puppet/cassandra git submodule repo as hidden
* 06:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul  7 06:11:46 UTC 2015 (duration 11m 45s)
* 05:51 logmsgbot: krinkle Synchronized php-1.26wmf12/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.js: I3e965dda1c4 (duration: 00m 12s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-07 02:27:55+00:00
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 09s)
* 01:12 ori: Re-pooled mw1152 at 20:46 UTC, did not log it then.
* 00:41 springle: upgrade db1041 trusty
* 00:37 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/CentralAuth/includes/CreateLocalAccountJob.php: https://gerrit.wikimedia.org/r/#/c/223211/ (duration: 00m 13s)

== 2022-06-14 ==
* 23:52 mutante: gitlab-runner1001/1002 - clean revert not possible, icinga alerting about failed buildkitd service, manually deleting systemd unit and trying to clean up [[phab:T308271|T308271]]
* 23:49 mutante: gitlab-runner1002 - systemctl restart docker; run-puppet-agent ; systemctl start buildkitd  - fails though [[phab:T308271|T308271]]
* 23:39 mutante: gitlab-runner1001 - systemctl start buildkitd
* 23:32 mutante: gitlab-runner1001 - restarting docker
* 23:08 mutante: disabling puppet in gitlab-runners (via cumin /disable-puppet) before deploying gerrit:791655 to provide gitlab-runners with buildkit and new docker network - [[phab:T308271|T308271]]
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:15 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|e3fe6c04c95717f0f914bbfa366f5f827f392b6b}}: phpcs: fix more SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 39s)
* 22:05 urbanecm@deploy1002: Synchronized w/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 18s)
* 22:02 urbanecm@deploy1002: Synchronized src/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 32s)
* 22:00 mutante: wtp1026 - manually running '/usr/bin/sudo -u root -- /usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807'
* 21:58 urbanecm@deploy1002: Synchronized rpc/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 31s)
* 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:54 urbanecm@deploy1002: Synchronized multiversion/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 29s)
* 21:54 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:51 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 38s)
* 21:49 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
* 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
* 21:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
* 21:38 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 21:32 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 21:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 21:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 21:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 21:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 21:10 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 21:03 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 urbanecm@deploy1002: Synchronized docroot/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 41s)
* 20:37 urbanecm@deploy1002: Synchronized w/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 15s)
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 urbanecm@deploy1002: Synchronized multiversion/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 28s)
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 20:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 20:31 urbanecm@deploy1002: Synchronized wmf-config/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 38s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1021.eqiad.wmnet with OS buster
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS buster
* 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1018.eqiad.wmnet with OS buster
* 20:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1017.eqiad.wmnet with OS buster
* 19:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 19:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 19:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 19:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 19:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 19:32 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 19:32 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 19:16 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
* 19:10 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
* 18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 18:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 18:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:15 ayounsi@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=imagescaler-ro,name=codfw
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
* 17:57 brennen@deploy1002: Pruned MediaWiki: 1.39.0-wmf.14 (duration: 01m 53s)
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.16 (duration: 32m 52s)
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:25 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
* 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.16
* 17:13 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): train is blocked - will sync to testwikis and hold there for resolution of [[phab:T310532|T310532]]
* 16:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2053.codfw.wmnet with OS bullseye
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
* 16:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
* 16:12 jnuche@deploy1002: Installation of scap version "4.9.2" completed for 557 hosts
* 16:11 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 16:05 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 15:58 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 15:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2053.codfw.wmnet with OS bullseye
* 15:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host elastic2053.codfw.wmnet
* 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host elastic2053.codfw.wmnet
* 15:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
* 14:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:53 moritzm: failover ganeti master in ulsfo to ganeti4003
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|596058b5e4d906d40e620fe5b01f37c484f5a8c1}}: Add new throttle rule + remove expired one ([[phab:T310625|T310625]]) (duration: 03m 38s)
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:33 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
* 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4003.ulsfo.wmnet
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4003.ulsfo.wmnet
* 14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2012.codfw.wmnet with OS buster
* 14:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2010.codfw.wmnet with OS buster
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2009.codfw.wmnet with OS buster
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 14:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2011.codfw.wmnet with OS buster
* 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2008.codfw.wmnet with OS buster
* 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
* 14:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2007.codfw.wmnet with OS buster
* 14:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2006.codfw.wmnet with OS buster
* 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
* 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 14:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2005.codfw.wmnet with OS buster
* 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29741 and previous config saved to /var/cache/conftool/dbconfig/20220614-132654-marostegui.json
* 13:13 urbanecm: UTC afternoon B&C window done
* 13:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1692de09bf04c724cf416679405d4b6485550d40}}: Disable DiscussionTools visualenhancements feature in production (duration: 03m 25s)
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29740 and previous config saved to /var/cache/conftool/dbconfig/20220614-131149-marostegui.json
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7f2dc7296f0c25d00e45651c50c3e45733cc63b3}}: Make new topic tool available as opt-out almost everywhere (phase 4; [[phab:T310392|T310392]]) (duration: 03m 45s)
* 13:06 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29739 and previous config saved to /var/cache/conftool/dbconfig/20220614-125644-marostegui.json
* 12:56 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS buster
* 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS buster
* 12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS buster
* 12:42 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS buster
* 12:41 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS buster
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29738 and previous config saved to /var/cache/conftool/dbconfig/20220614-124139-marostegui.json
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS buster
* 12:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2004.codfw.wmnet with OS buster
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS buster
* 12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS buster
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29737 and previous config saved to /var/cache/conftool/dbconfig/20220614-120323-marostegui.json
* 12:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29735 and previous config saved to /var/cache/conftool/dbconfig/20220614-115020-marostegui.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P29734 and previous config saved to /var/cache/conftool/dbconfig/20220614-113515-marostegui.json
* 11:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1173.eqiad.wmnet with OS bullseye
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29732 and previous config saved to /var/cache/conftool/dbconfig/20220614-110945-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29731 and previous config saved to /var/cache/conftool/dbconfig/20220614-110504-marostegui.json
* 11:02 moritzm: rebalancing ganeti cluster in esams [[phab:T308238|T308238]]
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
* 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29730 and previous config saved to /var/cache/conftool/dbconfig/20220614-105441-root.json
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 10:44 joal@deploy1002: Finished deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency (duration: 00m 09s)
* 10:44 joal@deploy1002: Started deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29729 and previous config saved to /var/cache/conftool/dbconfig/20220614-104021-marostegui.json
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29728 and previous config saved to /var/cache/conftool/dbconfig/20220614-103937-root.json
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:30 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
* 10:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29727 and previous config saved to /var/cache/conftool/dbconfig/20220614-102433-root.json
* 10:22 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:22 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
* 10:19 marostegui: dbmaint s6@eqiad [[phab:T60674|T60674]]
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29726 and previous config saved to /var/cache/conftool/dbconfig/20220614-101755-marostegui.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29725 and previous config saved to /var/cache/conftool/dbconfig/20220614-100930-root.json
* 10:06 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS buster
* 10:03 moritzm: rename Ganeti group row_A in test cluster to row_A-test
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P29724 and previous config saved to /var/cache/conftool/dbconfig/20220614-100250-marostegui.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P29723 and previous config saved to /var/cache/conftool/dbconfig/20220614-094745-marostegui.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29722 and previous config saved to /var/cache/conftool/dbconfig/20220614-093240-marostegui.json
* 09:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1058.eqiad.wmnet with OS bullseye
* 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:23 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29721 and previous config saved to /var/cache/conftool/dbconfig/20220614-092330-marostegui.json
* 09:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29720 and previous config saved to /var/cache/conftool/dbconfig/20220614-092322-marostegui.json
* 09:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 09:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 09:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1058.eqiad.wmnet with reason: host reimage
* 09:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
* 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 09:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1058.eqiad.wmnet with reason: host reimage
* 09:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 09:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 09:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
* 09:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
* 09:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 09:08 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29719 and previous config saved to /var/cache/conftool/dbconfig/20220614-090817-marostegui.json
* 09:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
* 09:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 09:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 09:01 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
* 09:00 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1058.eqiad.wmnet with OS bullseye
* 09:00 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 08:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 08:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
* 08:58 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet
* 08:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
* 08:56 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1001.eqiad.wmnet
* 08:56 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
* 08:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2003.codfw.wmnet with OS buster
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29718 and previous config saved to /var/cache/conftool/dbconfig/20220614-085312-marostegui.json
* 08:53 joal@deploy1002: Finished deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63] (duration: 07m 27s)
* 08:51 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:49 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
* 08:48 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 08:48 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:47 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
* 08:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
* 08:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 08:45 joal@deploy1002: Started deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63]
* 08:45 joal@deploy1002: Finished deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63] (duration: 00m 08s)
* 08:44 joal@deploy1002: Started deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63]
* 08:44 joal@deploy1002: Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63] (duration: 04m 45s)
* 08:39 joal@deploy1002: Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63]
* 08:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
* 08:38 godog: reboot centrallog2002 - [[phab:T310483|T310483]]
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29717 and previous config saved to /var/cache/conftool/dbconfig/20220614-083807-marostegui.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29716 and previous config saved to /var/cache/conftool/dbconfig/20220614-082855-marostegui.json
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29715 and previous config saved to /var/cache/conftool/dbconfig/20220614-082847-marostegui.json
* 08:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
* 08:20 marostegui: dbmaint s6@eqiad [[phab:T298560|T298560]]
* 08:18 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
* 08:16 marostegui: dbmaint s6@eqiad [[phab:T309311|T309311]]
* 08:16 joal@deploy1002: Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63] (duration: 31m 09s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29714 and previous config saved to /var/cache/conftool/dbconfig/20220614-081342-marostegui.json
* 08:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS buster
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29713 and previous config saved to /var/cache/conftool/dbconfig/20220614-075837-marostegui.json
* 07:45 joal@deploy1002: Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29712 and previous config saved to /var/cache/conftool/dbconfig/20220614-074331-marostegui.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29711 and previous config saved to /var/cache/conftool/dbconfig/20220614-073322-marostegui.json
* 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 taavi: UTC morning deploys done
* 07:24 marostegui: dbmaint s6@eqiad [[phab:T298563|T298563]]
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804806{{!}}Enable Realtime Preview on cawiki, viwiki, and fawiki (T303961)]] (duration: 03m 20s)
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:16 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:802685{{!}}Update $wgVectorMaxWidthOptions to include action=edit (T307725)]] (duration: 03m 36s)
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:03 marostegui: dbmaint s6@eqiad [[phab:T300381|T300381]]
* 07:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for schema change', diff saved to https://phabricator.wikimedia.org/P29710 and previous config saved to /var/cache/conftool/dbconfig/20220614-065322-root.json
* 06:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:28 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212129|T212129]] (duration: 03m 31s)
* 06:27 marostegui: Reboot dbproxy1012 and dbproxy1015 [[phab:T310484|T310484]]
* 06:24 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/AbuseFilter/includes/ServiceWiring.php: [[phab:T212129|T212129]] (duration: 03m 33s)
* 06:20 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/AbuseFilter/extension.json: [[phab:T212129|T212129]] (duration: 03m 32s)
* 06:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29709 and previous config saved to /var/cache/conftool/dbconfig/20220614-060608-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29707 and previous config saved to /var/cache/conftool/dbconfig/20220614-060155-root.json
* 06:01 marostegui: Starting s6 eqiad failover from db1173 to db1131 - [[phab:T300471|T300471]]
* 05:11 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212129|T212129]] Switch wgMainStash to db-mainstash (duration: 03m 38s)
* 05:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29706 and previous config saved to /var/cache/conftool/dbconfig/20220614-045224-root.json
* 04:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T300471|T300471]]
* 04:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T300471|T300471]]
* 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29705 and previous config saved to /var/cache/conftool/dbconfig/20220614-024047-ladsgroup.json
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29704 and previous config saved to /var/cache/conftool/dbconfig/20220614-022542-ladsgroup.json
* 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29703 and previous config saved to /var/cache/conftool/dbconfig/20220614-021037-ladsgroup.json
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29702 and previous config saved to /var/cache/conftool/dbconfig/20220614-015532-ladsgroup.json
* 00:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29701 and previous config saved to /var/cache/conftool/dbconfig/20220614-003608-marostegui.json
* 00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29700 and previous config saved to /var/cache/conftool/dbconfig/20220614-002103-marostegui.json
* 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29699 and previous config saved to /var/cache/conftool/dbconfig/20220614-000558-marostegui.json


== July 6 ==
== 2022-06-13 ==
* 23:50 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 12s)
* 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29698 and previous config saved to /var/cache/conftool/dbconfig/20220613-235053-marostegui.json
* 23:49 logmsgbot: krenair Synchronized w/static/images/project-logos/mrwikisource.png: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 13s)
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:35 logmsgbot: krenair Synchronized wmf-config/abusefilter.php: https://gerrit.wikimedia.org/r/#/c/223179/ - should be labs-only (duration: 00m 12s)
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:32 logmsgbot: krenair Synchronized README: https://gerrit.wikimedia.org/r/#/c/222941/ - ... (duration: 00m 13s)
* 23:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:27 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221809/ - should be a noop, just doc changes (duration: 00m 13s)
* 23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:25 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221808/ (duration: 00m 13s)
* 23:45 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T134809|T134809]] g 801836 remove variable wmgDbconfigFromEtcd (duration: 03m 26s)
* 23:17 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/223185/ (duration: 00m 12s)
* 23:35 tstarling@deploy1002: Synchronized wmf-config/etcd.php: [[phab:T134809|T134809]] g 799685 codfw master DBs (duration: 03m 36s)
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220970/ (duration: 00m 14s)
* 23:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:46 gwicke: restarted cassandra instance on restbase1003; was low on memory and constantly writing small chunks
* 23:30 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T134809|T134809]] g 799685 codfw master DBs (duration: 03m 30s)
* 21:30 andrewbogott: rebooting labvirt1005, again. Somehow virtualization is turned off again
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:12 subbu: deployed parsoid version 87a746e6
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:04 logmsgbot: ori Synchronized php-1.26wmf12/thumb.php: cdc75debaf: Add Content-Length header to thumb.php error responses (duration: 00m 13s)
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:02 mutante: purging static-bz URL on varnish ...
* 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29697 and previous config saved to /var/cache/conftool/dbconfig/20220613-232537-marostegui.json
* 20:39 akosiaris: upload php5_5.3.10-1ubuntu3.19-wmf1 on apt.wikimedia.org/precise-wikimedia
* 23:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:15 gwicke: restart cassandra instance on 1005
* 23:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:04 mobrovac: restbase restart cassandra on rb1005
* 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29696 and previous config saved to /var/cache/conftool/dbconfig/20220613-232529-marostegui.json
* 19:28 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/223040/ (duration: 00m 12s)
* 23:16 mutante: gitlab-runner2001 - systemctl reset-failed to clear alert about failed ifup for ens14, which is actually up. Race condition caused by reboot
* 19:11 gwicke: reduced compaction throughput from 160 to 100 mb/s across the cassandra cluster via 'nodetool -h <host> setcompactionthroughput 100'
* 23:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29695 and previous config saved to /var/cache/conftool/dbconfig/20220613-231024-marostegui.json
* 18:51 gwicke: restarted cassandra on restbase1001 with jdk8, see T104888
* 22:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29694 and previous config saved to /var/cache/conftool/dbconfig/20220613-225519-marostegui.json
* 18:22 gwicke: restarted cassandra on restbase1004 with jdk8
* 22:55 AndyRussG: payments-wiki upgraded from {{Gerrit|8c6208c2}} to {{Gerrit|10304f69}}
* 17:54 Jeff_Green: authdns-update for new rigel A record
* 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29693 and previous config saved to /var/cache/conftool/dbconfig/20220613-224014-marostegui.json
* 17:42 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: increase db2029 traffic to normal levels (duration: 00m 12s)
* 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29692 and previous config saved to /var/cache/conftool/dbconfig/20220613-221522-marostegui.json
* 17:37 gwicke: upgraded restbase1005 to jdk8
* 22:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:35 gwicke: restarting cassandra instance on restbase1005: out of heap
* 22:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:10 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade(2/2) (duration: 00m 11s)
* 22:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner[2001-2004].codfw.wmnet with reason: maintenance reboot
* 17:09 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade (duration: 00m 11s)
* 22:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner[2001-2004].codfw.wmnet with reason: maintenance reboot
* 16:38 jynus: upgrade and restart of db2029
* 21:56 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner[1001-1004].eqiad.wmnet with reason: maintenance reboot
* 16:35 ori: depooled mw1152
* 21:56 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner[1001-1004].eqiad.wmnet with reason: maintenance reboot
* 15:29 logmsgbot: krenair Finished scap: https://gerrit.wikimedia.org/r/#/c/222993/ (duration: 22m 09s)
* 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 15:21 _joe_: repooling mw1152
* 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 15:20 _joe_: attempting dump-apc on mw1060
* 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 15:09 _joe_: depooled the HHVM imagescaler again
* 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 15:07 logmsgbot: krenair Started scap: https://gerrit.wikimedia.org/r/#/c/222993/
* 21:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29691 and previous config saved to /var/cache/conftool/dbconfig/20220613-215118-marostegui.json
* 15:02 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222617/ (duration: 00m 12s)
* 21:48 mutante: gitlab-runner* - sequentially pausing, rebooting, resuming one by one
* 14:48 moritzm: installed python security updates on analytics*, lab* and virt*
* 21:44 mutante: gitlab-runner1001 - pause from accepting jobs - rebooting
* 14:46 moritzm: added python-diskimage-builder 0.1.46-1+wmf1 for jessie-wikimedia on carbon
* 21:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29690 and previous config saved to /var/cache/conftool/dbconfig/20220613-213613-marostegui.json
* 14:43 _joe_: depooled the HHVM imagescaler, spitting 503s again.
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29689 and previous config saved to /var/cache/conftool/dbconfig/20220613-212108-marostegui.json
* 14:18 mobrovac: restbase started thinning out parsoid data (local_group_wikipedia_T_parsoid_dataDVIsgzJSne8k) for >= 22 days
* 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29688 and previous config saved to /var/cache/conftool/dbconfig/20220613-210603-marostegui.json
* 14:07 YuviPanda: restart apache on labcontrol1001 to pick up parser function change
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:57 moritzm: installed python security updates on mw*, es* and db*
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:18 logmsgbot: hoo Synchronized wmf-config/: Enable WikibaseQuality and WikibaseQualityConstraints on wikidata (duration: 00m 13s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:15 logmsgbot: hoo Finished scap: Update WikibaseQuality and WikibaseQualityConstraint (duration: 25m 56s)
* 20:29 cjming: end of UTC late backport window
* 11:49 logmsgbot: hoo Started scap: Update WikibaseQuality and WikibaseQualityConstraint
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 hoo: Created the `wbqc_constraints` table on wikidatawiki
* 20:27 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:805206{{!}}Disable TOC A/B test for beta cluster (T309683)]] (duration: 03m 29s)
* 09:02 _joe_: restarted the appserver on mw1059 with hhvm.server.apc.expire_on_sets = true, restarted the heap profiling to confirm my hypothesis on T104769
* 20:22 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800857{{!}}ugwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 30s)
* 08:31 _joe_: restarted cassandra on rb1004. again.
* 20:19 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
* 05:01 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1034, depool db1041 (duration: 00m 12s)
* 20:18 cjming@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ug.svg: Config: [[gerrit:800857{{!}}ugwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 36s)
* 05:00 springle: stash/pull/apply CommonSettings.php on tin, which was left with modifications
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul  6 04:35:45 UTC 2015 (duration 35m 44s)
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-06 02:22:12+00:00
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 07s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29687 and previous config saved to /var/cache/conftool/dbconfig/20220613-201420-marostegui.json
* 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29686 and previous config saved to /var/cache/conftool/dbconfig/20220613-201407-marostegui.json
* 20:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800856{{!}}crhwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 27s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:08 cjming@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-crh.svg: Config: [[gerrit:800856{{!}}crhwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 16s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P29685 and previous config saved to /var/cache/conftool/dbconfig/20220613-195902-marostegui.json
* 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P29684 and previous config saved to /var/cache/conftool/dbconfig/20220613-194356-marostegui.json
* 19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29683 and previous config saved to /var/cache/conftool/dbconfig/20220613-192851-marostegui.json
* 19:12 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
* 19:12 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
* 19:11 mutante: etherpad - minimal downtime - rebooting etherpad1003
* 19:07 mutante: gerrit2002 - rebooting
* 19:04 mutante: gitlab2003 - rebooting
* 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29682 and previous config saved to /var/cache/conftool/dbconfig/20220613-190314-marostegui.json
* 19:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 19:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 19:01 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
* 18:55 mutante: gitlab2002 - rebooting
* 18:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29681 and previous config saved to /var/cache/conftool/dbconfig/20220613-184015-marostegui.json
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P29680 and previous config saved to /var/cache/conftool/dbconfig/20220613-182510-marostegui.json
* 18:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P29679 and previous config saved to /var/cache/conftool/dbconfig/20220613-181005-marostegui.json
* 17:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1146.eqiad.wmnet with OS buster
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29678 and previous config saved to /var/cache/conftool/dbconfig/20220613-175500-marostegui.json
* 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1145.eqiad.wmnet with OS buster
* 17:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1143.eqiad.wmnet with OS buster
* 17:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
* 17:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
* 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
* 17:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1148.eqiad.wmnet with OS buster
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 17:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2002.codfw.wmnet with OS buster
* 17:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
* 17:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1147.eqiad.wmnet with OS buster
* 17:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
* 17:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
* 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29677 and previous config saved to /var/cache/conftool/dbconfig/20220613-171438-marostegui.json
* 17:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 17:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29676 and previous config saved to /var/cache/conftool/dbconfig/20220613-171430-marostegui.json
* 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
* 17:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3001.esams.wmnet with OS bullseye
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
* 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1144.eqiad.wmnet with OS buster
* 17:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS buster
* 17:03 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
* 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P29675 and previous config saved to /var/cache/conftool/dbconfig/20220613-165925-marostegui.json
* 16:58 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
* 16:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS buster
* 16:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1146.eqiad.wmnet with OS buster
* 16:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3001.esams.wmnet with reason: host reimage
* 16:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
* 16:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 16:53 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1145.eqiad.wmnet with OS buster
* 16:50 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3001.esams.wmnet with reason: host reimage
* 16:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 16:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
* 16:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
* 16:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
* 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P29674 and previous config saved to /var/cache/conftool/dbconfig/20220613-164419-marostegui.json
* 16:40 dancy@deploy1002: prep aborted:  (duration: 01m 40s)
* 16:38 dancy@deploy1002: prep aborted:  (duration: 06m 12s)
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
* 16:32 marostegui: dbmaint x2@eqiad upgrade and reboot all x2 db hosts [[phab:T310485|T310485]]
* 16:32 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti3001.esams.wmnet with OS bullseye
* 16:32 dancy@deploy1002: prep aborted:  (duration: 00m 26s)
* 16:31 marostegui: Reboot all codfw parsercache hosts [[phab:T310485|T310485]]
* 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29673 and previous config saved to /var/cache/conftool/dbconfig/20220613-162914-marostegui.json
* 16:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2002.codfw.wmnet with OS buster
* 16:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2001.codfw.wmnet with OS buster
* 16:10 robh: ganeti3001 rebooting and reimaging for firmware updates via [[phab:T308238|T308238]]
* 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:805173{{!}} Bumping portals to master (T128546)]] (duration: 03m 27s)
* 15:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:47 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:805173{{!}} Bumping portals to master (T128546)]] (duration: 03m 35s)
* 15:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
* 15:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2001.codfw.wmnet with OS buster
* 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29672 and previous config saved to /var/cache/conftool/dbconfig/20220613-152900-marostegui.json
* 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29671 and previous config saved to /var/cache/conftool/dbconfig/20220613-152852-marostegui.json
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P29670 and previous config saved to /var/cache/conftool/dbconfig/20220613-151347-marostegui.json
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 15:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
* 15:00 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P29669 and previous config saved to /var/cache/conftool/dbconfig/20220613-145842-marostegui.json
* 14:58 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:54 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29668 and previous config saved to /var/cache/conftool/dbconfig/20220613-144337-marostegui.json
* 14:42 marostegui: Failover m1 and m2 to a different proxy [[phab:T310484|T310484]]
* 14:38 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:34 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29667 and previous config saved to /var/cache/conftool/dbconfig/20220613-141802-marostegui.json
* 14:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29666 and previous config saved to /var/cache/conftool/dbconfig/20220613-141754-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P29665 and previous config saved to /var/cache/conftool/dbconfig/20220613-140249-marostegui.json
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:00 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:00 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 13:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dumpsdata1007.eqiad.wmnet
* 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1057.eqiad.wmnet with OS bullseye
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P29663 and previous config saved to /var/cache/conftool/dbconfig/20220613-134744-marostegui.json
* 13:45 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
* 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 13:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply