You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135)
imported>Stashbot
(sukhe: disable puppet on dns4003 till we resolve the puppet failures)
 
(63 intermediate revisions by the same user not shown)
Line 1: Line 1:
== 2022-07-30 ==
== 2022-10-05 ==
* 01:44 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 00:05 sukhe: disable puppet on dns4003 till we resolve the puppet failures
* 01:44 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2028.codfw.wmnet with OS bullseye
* 00:55 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2028.codfw.wmnet with OS bullseye


== 2022-07-29 ==
== 2022-10-04 ==
* 22:43 Krinkle: krinkle@mwmaint1002$ mwscript findBadBlobs.php nlwiktionary; mark 2371 blobs from May 2004 as "Invalid gzip, [[phab:T265989|T265989]]"
* 23:09 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 22:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2041.codfw.wmnet with OS bullseye
* 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 22:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2041.codfw.wmnet with reason: host reimage
* 21:28 cjming: end of UTC late backport window
* 22:17 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2041.codfw.wmnet with reason: host reimage
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:09 Krinkle: findBadBlobs.php nlwiktionary --revisions 22 --mark 'Invalid gzip, [[phab:T265989|T265989]]'
* 21:25 cjming@deploy1002: Finished scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] (duration: 05m 06s)
* 22:01 mutante: phab1001 - rsync -avp --bwlimit=1000 /srv/repos/ rsync://phab1004.eqiad.wmnet/phabricator-srv-repos (running slowly inside a screen
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
*
* 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32047 and previous config saved to /var/cache/conftool/dbconfig/20220728-131329-marostegui.json
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:11 awight@deploy1002: awight and stang: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806931{{!}}Configure wbsearchentities profile parameter on Wikidata (T307869)]] (duration: 03m 25s)
* 13:09 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32045 and previous config saved to /var/cache/conftool/dbconfig/20220728-130314-root.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32044 and previous config saved to /var/cache/conftool/dbconfig/20220728-125823-marostegui.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2174 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P32043 and previous config saved to /var/cache/conftool/dbconfig/20220728-125253-
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:11 TimStarling: restarted php7.2-fpm on the 9 canary hosts in eqiad [[phab:T313770|T313770]]


== 2022-07-25 ==
== 2022-10-03 ==
* 22:54 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
* 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31900 and previous config saved to /var/cache/conftool/dbconfig/20220725-224153-ladsgroup.json
* 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31899 and previous config saved to /var/cache/conftool/dbconfig/20220725-222648-ladsgroup.json
* 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 22:11 ladsgroup@
* 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
* 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
* 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work
* 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
* 19:22


== 2022-07-24 ==
== 2022-10-02 ==
* 20:54 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org
* 08:13 elukey: `apt-get clean` on an-airflow1001 to free some space on the root partition
* 20:37 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org
* 14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31815 and previous config saved to /var/cache/conftool/dbconfig/20220724-100221-ladsgroup.json
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31814 and previous config saved to /var/cache/conftool/dbconfig/20220724-094716-ladsgroup.json
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31813 and previous config saved to /var/cache/conftool/dbconfig/20220724-093211-ladsgroup.json
* 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31812 and previous config saved to /var/cache/conftool/dbconfig/20220724-091706-ladsgroup.json
* 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31811 and previous config saved to /var/cache/conftool/dbconfig/20220724-041542-ladsgroup.json
* 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31810 and previous config saved to /var/cache/conftool/dbconfig/20220724-040037-ladsgroup.json
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31809 and previous config saved to /var/cache/conftool/dbconfig/20220724-034532-ladsgroup.json
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31808 and previous config saved to /var/cache/conftool/dbconfig/20220724-034356-ladsgroup.json
* 03:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 03:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31807 and previous config saved to /var/cache/conftool/dbconfig/20220724-034336-ladsgroup.json
* 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31806 and previous config saved to /var/cache/conftool/dbconfig/20220724-033027-ladsgroup.json
* 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31805 and previous config saved to /var/cache/conftool/dbconfig/20220724-032831-ladsgroup.json
* 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31804 and previous config saved to /var/cache/conftool/dbconfig/20220724-031326-ladsgroup.json
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31803 and previous config saved to /var/cache/conftool/dbconfig/20220724-025820-ladsgroup.json
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31802 and previous config saved to /var/cache/conftool/dbconfig/20220724-003718-ladsgroup.json
* 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31801 and previous config saved to /var/cache/conftool/dbconfig/20220724-003652-ladsgroup.json
* 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31800 and previous config saved to /var/cache/conftool/dbconfig/20220724-002147-ladsgroup.json
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31799 and previous config saved to /var/cache/conftool/dbconfig/20220724-000641-ladsgroup.json


== 2022-07-23 ==
== 2022-10-01 ==
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31798 and previous config saved to /var/cache/conftool/dbconfig/20220723-235136-ladsgroup.json
* 13:24 fab@deploy1002: Finished deploy [airflow-dags/research@44a1158]: (no justification provided) (duration: 00m 08s)
* 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31797 and previous config saved to /var/cache/conftool/dbconfig/20220723-232948-ladsgroup.json
* 13:24 fab@deploy1002: Started deploy [airflow-dags/research@44a1158]: (no justification provided)
* 23:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 13:12 fab@deploy1002: Finished deploy [airflow-dags/research@d6b3e82]: (no justification provided) (duration: 03m 35s)
* 23:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 13:08 fab@deploy1002: Started deploy [airflow-dags/research@d6b3e82]: (no justification provided)
* 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31796 and previous config saved to /var/cache/conftool/dbconfig/20220723-232927-ladsgroup.json
* 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31795 and previous config saved to /var/cache/conftool/dbconfig/20220723-231422-ladsgroup.json
* 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31794 and previous config saved to /var/cache/conftool/dbconfig/20220723-225917-ladsgroup.json
* 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31793 and previous config saved to /var/cache/conftool/dbconfig/20220723-224412-ladsgroup.json
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31792 and previous config saved to /var/cache/conftool/dbconfig/20220723-220740-ladsgroup.json
* 22:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31791 and previous config saved to /var/cache/conftool/dbconfig/20220723-220720-ladsgroup.json
* 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31790 and previous config saved to /var/cache/conftool/dbconfig/20220723-215215-ladsgroup.json
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31789 and previous config saved to /var/cache/conftool/dbconfig/20220723-213710-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31788 and previous config saved to /var/cache/conftool/dbconfig/20220723-213610-ladsgroup.json
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31787 and previous config saved to /var/cache/conftool/dbconfig/20220723-212204-ladsgroup.json
* 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31786 and previous config saved to /var/cache/conftool/dbconfig/20220723-212105-ladsgroup.json
* 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31785 and previous config saved to /var/cache/conftool/dbconfig/20220723-210559-ladsgroup.json
* 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31784 and previous config saved to /var/cache/conftool/dbconfig/20220723-205054-ladsgroup.json
* 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31783 and previous config saved to /var/cache/conftool/dbconfig/20220723-204049-ladsgroup.json
* 20:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 20:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31782 and previous config saved to /var/cache/conftool/dbconfig/20220723-164105-ladsgroup.json
* 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31781 and previous config saved to /var/cache/conftool/dbconfig/20220723-164045-ladsgroup.json
* 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31780 and previous config saved to /var/cache/conftool/dbconfig/20220723-162540-ladsgroup.json
* 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31779 and previous config saved to /var/cache/conftool/dbconfig/20220723-161035-ladsgroup.json
* 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31778 and previous config saved to /var/cache/conftool/dbconfig/20220723-155530-ladsgroup.json
* 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 15:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31777 and previous config saved to /var/cache/conftool/dbconfig/20220723-155311-ladsgroup.json
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31776 and previous config saved to /var/cache/conftool/dbconfig/20220723-153805-ladsgroup.json
* 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31775 and previous config saved to /var/cache/conftool/dbconfig/20220723-152300-ladsgroup.json
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31774 and previous config saved to /var/cache/conftool/dbconfig/20220723-151951-ladsgroup.json
* 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31773 and previous config saved to /var/cache/conftool/dbconfig/20220723-151930-ladsgroup.json
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31772 and previous config saved to /var/cache/conftool/dbconfig/20220723-150754-ladsgroup.json
* 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31771 and previous config saved to /var/cache/conftool/dbconfig/20220723-150425-ladsgroup.json
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31770 and previous config saved to /var/cache/conftool/dbconfig/20220723-144920-ladsgroup.json
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31769 and previous config saved to /var/cache/conftool/dbconfig/20220723-143414-ladsgroup.json
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31768 and previous config saved to /var/cache/conftool/dbconfig/20220723-105825-ladsgroup.json
* 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31767 and previous config saved to /var/cache/conftool/dbconfig/20220723-105805-ladsgroup.json
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31766 and previous config saved to /var/cache/conftool/dbconfig/20220723-105257-ladsgroup.json
* 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31765 and previous config saved to /var/cache/conftool/dbconfig/20220723-105238-ladsgroup.json
* 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31764 and previous config saved to /var/cache/conftool/dbconfig/20220723-105228-ladsgroup.json
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31763 and previous config saved to /var/cache/conftool/dbconfig/20220723-104300-ladsgroup.json
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31762 and previous config saved to /var/cache/conftool/dbconfig/20220723-103733-ladsgroup.json
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31761 and previous config saved to /var/cache/conftool/dbconfig/20220723-103723-ladsgroup.json
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31760 and previous config saved to /var/cache/conftool/dbconfig/20220723-102755-ladsgroup.json
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31759 and previous config saved to /var/cache/conftool/dbconfig/20220723-102227-ladsgroup.json
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31758 and previous config saved to /var/cache/conftool/dbconfig/20220723-102218-ladsgroup.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31757 and previous config saved to /var/cache/conftool/dbconfig/20220723-101250-ladsgroup.json
* 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31756 and previous config saved to /var/cache/conftool/dbconfig/20220723-100722-ladsgroup.json
* 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31755 and previous config saved to /var/cache/conftool/dbconfig/20220723-100713-ladsgroup.json
* 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31754 and previous config saved to /var/cache/conftool/dbconfig/20220723-095241-ladsgroup.json
* 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31753 and previous config saved to /var/cache/conftool/dbconfig/20220723-053604-ladsgroup.json
* 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31752 and previous config saved to /var/cache/conftool/dbconfig/20220723-052925-ladsgroup.json
* 05:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31751 and previous config saved to /var/cache/conftool/dbconfig/20220723-015300-ladsgroup.json
* 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31750 and previous config saved to /var/cache/conftool/dbconfig/20220723-013755-ladsgroup.json
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31749 and previous config saved to /var/cache/conftool/dbconfig/20220723-012250-ladsgroup.json
* 01:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31748 and previous config saved to /var/cache/conftool/dbconfig/20220723-010745-ladsgroup.json
* 00:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
* 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
* 00:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31747 and previous config saved to /var/cache/conftool/dbconfig/20220723-001125-ladsgroup.json


== 2022-07-22 ==
== 2022-09-30 ==
* 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31746 and previous config saved to /var/cache/conftool/dbconfig/20220722-235619-ladsgroup.json
* 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31745 and previous config saved to /var/cache/conftool/dbconfig/20220722-234114-ladsgroup.json
* 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31744 and previous config saved to /var/cache/conftool/dbconfig/20220722-232609-ladsgroup.json
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31743 and previous config saved to /var/cache/conftool/dbconfig/20220722-215349-ladsgroup.json
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31742 and previous config saved to /var/cache/conftool/dbconfig/20220722-215329-ladsgroup.json
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31741 and previous config saved to /var/cache/conftool/dbconfig/20220722-213824-ladsgroup.json
* 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31740 and previous config saved to /var/cache/conftool/dbconfig/20220722-212319-ladsgroup.json
* 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31739 and previous config saved to /var/cache/conftool/dbconfig/20220722-211308-ladsgroup.json
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31738 and previous config saved to /var/cache/conftool/dbconfig/20220722-211259-ladsgroup.json
* 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
* 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31737 and previous config saved to /var/cache/conftool/dbconfig/20220722-210813-ladsgroup.json
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
* 21:05 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 29s)
* 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 21:04 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31736 and previous config saved to /var/cache/conftool/dbconfig/20220722-205754-ladsgroup.json
* 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
* 20:44 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 07s)
* 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 20:44 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
* 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31735 and previous config saved to /var/cache/conftool/dbconfig/20220722-204248-ladsgroup.json
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
* 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
* 20:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 20:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 13:51 moritzm: installing puppetdb-test2001 [[phab:T318931|T318931]]
* 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31734 and previous config saved to /var/cache/conftool/dbconfig/20220722-203708-ladsgroup.json
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31733 and previous config saved to /var/cache/conftool/dbconfig/20220722-202743-ladsgroup.json
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31732 and previous config saved to /var/cache/conftool/dbconfig/20220722-202203-ladsgroup.json
* 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31731 and previous config saved to /var/cache/conftool/dbconfig/20220722-200658-ladsgroup.json
* 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31730 and previous config saved to /var/cache/conftool/dbconfig/20220722-195153-ladsgroup.json
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31729 and previous config saved to /var/cache/conftool/dbconfig/20220722-194428-ladsgroup.json
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
* 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
* 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
* 19:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and previous config saved to /var/cache/conftool/dbconfig/20220930-123123-root.json
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31727 and previous config saved to /var/cache/conftool/dbconfig/20220722-173218-ladsgroup.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35228 and previous config saved to /var/cache/conftool/dbconfig/20220930-121618-root.json
* 17:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35227 and previous config saved to /var/cache/conftool/dbconfig/20220930-120113-root.json
* 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetdb-test2001.codfw.wmnet
* 16:54 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: no-op deploy to sync up new cloudweb hosts (duration: 08m 47s)
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35226 and previous config saved to /var/cache/conftool/dbconfig/20220930-114605-root.json
* 16:45 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: no-op deploy to sync up new cloudweb hosts
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35225 and previous config saved to /var/cache/conftool/dbconfig/20220930-113101-root.json
* 16:19 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P35224 and previous config saved to /var/cache/conftool/dbconfig/20220930-112307-root.json
* 16:02 jbond: puppet-agent to puppet7 component
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetdb-test2001.codfw.wmnet on all recursors
* 15:57 jbond: ruby-semantic-puppet to puppet7 component
* 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache puppetdb-test2001.codfw.wmnet on all recursors
* 15:49 jbond: ruby-sorted-set to puppet7 component
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb-test2001.codfw.wmnet
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35223 and previous config saved to /var/cache/conftool/dbconfig/20220930-104004-ladsgroup.json
* 15:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2046.codfw.wmnet with OS bullseye
* 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31725 and previous config saved to /var/cache/conftool/dbconfig/20220722-150727-ladsgroup.json
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35222 and previous config saved to /var/cache/conftool/dbconfig/20220930-103943-ladsgroup.json
* 15:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35221 and previous config saved to /var/cache/conftool/dbconfig/20220930-102436-ladsgroup.json
* 15:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35220 and previous config saved to /var/cache/conftool/dbconfig/20220930-100930-ladsgroup.json
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31724 and previous config saved to /var/cache/conftool/dbconfig/20220722-150707-ladsgroup.json
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35219 and previous config saved to /var/cache/conftool/dbconfig/20220930-095423-ladsgroup.json
* 15:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2046.codfw.wmnet with reason: host reimage
* 09:42 moritzm: installing Linux 5.10.140 updates on Bullseye hosts (released via 11.5 point release), just rollout of the package, no reboots involved
* 15:03 jbond: ruby-rbtree to puppet7 component
* 07:37 XioNoX: add RPKI ROAs for 185.71.138.0/24 and 2001:67c:930::/48
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
* 15:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2046.codfw.wmnet with reason: host reimage
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31722 and previous config saved to /var/cache/conftool/dbconfig/20220722-145201-ladsgroup.json
* 07:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31721 and previous config saved to /var/cache/conftool/dbconfig/20220722-144734-ladsgroup.json
* 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2046.codfw.wmnet with OS bullseye
* 07:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
* 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31720 and previous config saved to /var/cache/conftool/dbconfig/20220722-143655-ladsgroup.json
* 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52320
* 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31719 and previous config saved to /var/cache/conftool/dbconfig/20220722-143229-ladsgroup.json
* 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 52320
* 14:29 moritzm: restarting tomcat on idp-test.w.o
* 07:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31718 and previous config saved to /var/cache/conftool/dbconfig/20220722-142150-ladsgroup.json
* 07:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31717 and previous config saved to /var/cache/conftool/dbconfig/20220722-141724-ladsgroup.json
* 07:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
* 13:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2033.codfw.wmnet with OS bullseye
* 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35218 and previous config saved to /var/cache/conftool/dbconfig/20220930-070454-root.json
* 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35217 and previous config saved to /var/cache/conftool/dbconfig/20220930-065844-root.json
* 13:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35216 and previous config saved to /var/cache/conftool/dbconfig/20220930-064949-root.json
* 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31713 and previous config saved to /var/cache/conftool/dbconfig/20220722-131710-ladsgroup.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35215 and previous config saved to /var/cache/conftool/dbconfig/20220930-064339-root.json
* 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35214 and previous config saved to /var/cache/conftool/dbconfig/20220930-063444-root.json
* 13:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35213 and previous config saved to /var/cache/conftool/dbconfig/20220930-062834-root.json
* 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35212 and previous config saved to /var/cache/conftool/dbconfig/20220930-061939-root.json
* 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31712 and previous config saved to /var/cache/conftool/dbconfig/20220722-131650-ladsgroup.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35211 and previous config saved to /var/cache/conftool/dbconfig/20220930-061329-root.json
* 13:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35210 and previous config saved to /var/cache/conftool/dbconfig/20220930-060434-root.json
* 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31711 and previous config saved to /var/cache/conftool/dbconfig/20220722-130145-ladsgroup.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35209 and previous config saved to /var/cache/conftool/dbconfig/20220930-055824-root.json
* 12:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35208 and previous config saved to /var/cache/conftool/dbconfig/20220930-054929-root.json
* 12:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35207 and previous config saved to /var/cache/conftool/dbconfig/20220930-054319-root.json
* 12:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35206 and previous config saved to /var/cache/conftool/dbconfig/20220930-053424-root.json
* 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31710 and previous config saved to /var/cache/conftool/dbconfig/20220722-124640-ladsgroup.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35204 and previous config saved to /var/cache/conftool/dbconfig/20220930-052814-root.json
* 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35203 and previous config saved to /var/cache/conftool/dbconfig/20220930-051919-root.json
* 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35202 and previous config saved to /var/cache/conftool/dbconfig/20220930-051309-root.json
* 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P35201 and previous config saved to /var/cache/conftool/dbconfig/20220930-051206-root.json
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P35200 and previous config saved to /var/cache/conftool/dbconfig/20220930-050533-root.json
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31708 and previous config saved to /var/cache/conftool/dbconfig/20220722-102452-ladsgroup.json
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35199 and previous config saved to /var/cache/conftool/dbconfig/20220930-041937-ladsgroup.json
* 10:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:21 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35198 and previous config saved to /var/cache/conftool/dbconfig/20220930-041916-ladsgroup.json
* 10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35197 and previous config saved to /var/cache/conftool/dbconfig/20220930-040409-ladsgroup.json
* 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35196 and previous config saved to /var/cache/conftool/dbconfig/20220930-034903-ladsgroup.json
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31707 and previous config saved to /var/cache/conftool/dbconfig/20220722-100948-ladsgroup.json
* 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35195 and previous config saved to /var/cache/conftool/dbconfig/20220930-033356-ladsgroup.json
* 10:06 XioNoX: push pfw policies - [[phab:T313522|T313522]]
* 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31706 and previous config saved to /var/cache/conftool/dbconfig/20220722-095444-ladsgroup.json
* 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31705 and previous config saved to /var/cache/conftool/dbconfig/20220722-093940-ladsgroup.json
* 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31704 and previous config saved to /var/cache/conftool/dbconfig/20220722-093754-ladsgroup.json
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31702 and previous config saved to /var/cache/conftool/dbconfig/20220722-093453-ladsgroup.json
* 09:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31701 and previous config saved to /var/cache/conftool/dbconfig/20220722-084647-ladsgroup.json
* 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31700 and previous config saved to /var/cache/conftool/dbconfig/20220722-084627-ladsgroup.json
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 12 hosts with reason: Maintenance
* 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 12 hosts with reason: Maintenance
* 08:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31697 and previous config saved to /var/cache/conftool/dbconfig/20220722-080112-ladsgroup.json
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31696 and previous config saved to /var/cache/conftool/dbconfig/20220722-074844-ladsgroup.json
* 07:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 07:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 06:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2014.codfw.wmnet with OS bullseye
* 06:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
* 05:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2014.codfw.wmnet with OS bullseye
* 05:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2021.codfw.wmnet with reason: host reimage
* 05:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2021.codfw.wmnet with reason: host reimage
* 05:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2021.codfw.wmnet with OS bullseye
* 05:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 05:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 04:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 04:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31694 and previous config saved to /var/cache/conftool/dbconfig/20220722-045543-ladsgroup.json
* 04:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31693 and previous config saved to /var/cache/conftool/dbconfig/20220722-044038-ladsgroup.json
* 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31692 and previous config saved to /var/cache/conftool/dbconfig/20220722-042533-ladsgroup.json
* 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31691 and previous config saved to /var/cache/conftool/dbconfig/20220722-041028-ladsgroup.json
* 04:05 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: disable debug log on test2wiki (cleanup) (duration: 03m 05s)
* 04:01 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I9051d20cd1}} (duration: 03m 02s)
* 03:58 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I9051d20cd1}} (duration: 03m 10s)
* 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31690 and previous config saved to /var/cache/conftool/dbconfig/20220722-031014-ladsgroup.json
* 03:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31689 and previous config saved to /var/cache/conftool/dbconfig/20220722-030954-ladsgroup.json
* 03:09 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: disable debug log on test2wiki (duration: 02m 47s)
* 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31688 and previous config saved to /var/cache/conftool/dbconfig/20220722-025449-ladsgroup.json
* 02:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31687 and previous config saved to /var/cache/conftool/dbconfig/20220722-023943-ladsgroup.json
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31685 and previous config saved to /var/cache/conftool/dbconfig/20220722-002622-ladsgroup.json
* 00:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31684 and previous config saved to /var/cache/conftool/dbconfig/20220722-002601-ladsgroup.json
* 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31683 and previous config saved to /var/cache/conftool/dbconfig/20220722-001056-ladsgroup.json


== 2022-07-21 ==
== 2022-09-29 ==
* 23:53 mutante: https://policy.wikimedia.org moved from Wordpress DNS back to WMF DNS - now redirects to https://wikimediafoundation.org/advocacy/ as requested on [[phab:T310738|T310738]] {{!}} this might also resolve [[phab:T132104|T132104]] or not because wikimediafoundation.org is also on wordpress VIP
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
* 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31680 and previous config saved to /var/cache/conftool/dbconfig/20220721-234045-ladsgroup.json
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
* 23:22 mutante: [cumin2002:~] $ sudo cumin 'C:profile::httpbb' "rm /srv/deployment/httpbb-tests/appserver/test_search.yaml"
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
* 23:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2045.codfw.wmnet with OS bullseye
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
* 22:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2045.codfw.wmnet with reason: host reimage
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
* 22:52 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2045.codfw.wmnet with reason: host reimage
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:43 sukhe: alert1001: restart icinga
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31678 and previous config saved to /var/cache/conftool/dbconfig/20220721-223048-ladsgroup.json
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:30 mutante: re-enabling puppet on all remaining 'C:profile::mediawiki::httpd'
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31677 and previous config saved to /var/cache/conftool/dbconfig/20220721-221543-ladsgroup.json
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:09 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2045.codfw.wmnet with OS bullseye
* 21:26 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
* 21:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:02 dancy@deploy1002: Installation of scap version "4.11.3" completed for 559 hosts
* 21:18 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:02 dancy@deploy1002: Installing scap version "4.11.3" for 559 hosts
* 21:18 ejegg: payments-wiki upgraded from {{Gerrit|839d6dde}} to {{Gerrit|aeee9676}}
* 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31676 and previous config saved to /var/cache/conftool/dbconfig/20220721-220038-ladsgroup.json
* 21:14 robh@cumin2002: START - Cookbook sre.dns.netbox
* 21:56 mutante: re-enabling puppet on mw2 in groups (codfw)
* 21:14 brennen: end of utc late backport and config window
* 21:48 mutante: re-enabling puppet on parsoid (wtp*)
* 21:14 brennen@deploy1002: Finished scap: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]] (duration: 08m 22s)
* 21:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31675 and previous config saved to /var/cache/conftool/dbconfig/20220721-214532-ladsgroup.json
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31674 and previous config saved to /var/cache/conftool/dbconfig/20220721-213246-ladsgroup.json
* 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31673 and previous config saved to /var/cache/conftool/dbconfig/20220721-213237-ladsgroup.json
* 21:17 mutante: puppet re-enabled on mw-api-canary and parsoid-canary
* 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31672 and previous config saved to /var/cache/conftool/dbconfig/20220721-211732-ladsgroup.json
* 20:52 mutante: deploying apache config change on cluster, slowly..puppet disabled on C:profile::mediawiki::httpd .. then re-enabling starting with mwdebug.. using httpbb to test it.. then re-enabling puppet on more hosts  https://gerrit.wikimedia.org/r/c/operations/puppet/+/809324  Bug: [[phab:T310738|T310738]]
* 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31669 and previous config saved to /var/cache/conftool/dbconfig/20220721-204518-ladsgroup.json
* 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 20:39 mutante: disabling puppet on mw appservers to deploy gerrit:809324 - [[phab:T310738|T310738]]
* 20:34 cjming: end of UTC late backport window
* 20:34 bd808: Proof of life for stashbot processing !logs
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 andrewbogott: testing the log by logging a test
* 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31668 and previous config saved to /var/cache/conftool/dbconfig/20220721-202348-ladsgroup.json
* 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31667 and previous config saved to /var/cache/conftool/dbconfig/20220721-202311-ladsgroup.json
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31666 and previous config saved to /var/cache/conftool/dbconfig/20220721-200806-ladsgroup.json
* 19:56 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 19:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31665 and previous config saved to /var/cache/conftool/dbconfig/20220721-195301-ladsgroup.json
* 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31664 and previous config saved to /var/cache/conftool/dbconfig/20220721-193756-ladsgroup.json
* 19:35 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 05s)
* 19:35 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 19:34 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 05s)
* 19:34 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 19:31 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS bullseye
* 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31662 and previous config saved to /var/cache/conftool/dbconfig/20220721-191136-ladsgroup.json
* 19:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2066.codfw.wmnet with reason: host reimage
* 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31661 and previous config saved to /var/cache/conftool/dbconfig/20220721-185631-ladsgroup.json
* 18:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 18:42 tzatziki: running extensions/SecurePoll/cli/wm-scripts/bv2022/populateEditCount.php on all 8 sections
* 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31660 and previous config saved to /var/cache/conftool/dbconfig/20220721-184126-ladsgroup.json
* 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31659 and previous config saved to /var/cache/conftool/dbconfig/20220721-183723-ladsgroup.json
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 18:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31658 and previous config saved to /var/cache/conftool/dbconfig/20220721-183703-ladsgroup.json
* 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:34 dancy@deploy1002: Finished scap: Backport for [[gerrit:816022]] MWConfigCacheGenerator.php: Use grace period of 3 minutes (duration: 03m 39s)
* 18:31 dancy@deploy1002: Started scap: Backport for [[gerrit:816022]] MWConfigCacheGenerator.php: Use grace period of 3 minutes
* 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31656 and previous config saved to /var/cache/conftool/dbconfig/20220721-182033-ladsgroup.json
* 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31655 and previous config saved to /var/cache/conftool/dbconfig/20220721-182013-ladsgroup.json
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:14 brennen: testing scap deployment to phab2001, this is a no-op for production services
* 18:12 brennen@deploy1002: Finished deploy [phabricator/deployment@358bb3a]: (no justification provided) (duration: 01m 17s)
* 18:11 brennen@deploy1002: Started deploy [phabricator/deployment@358bb3a]: (no justification provided)
* 18:10 tzatziki: creating tables for board election with bv2022_tables.sql
* 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:07 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31654 and previous config saved to /var/cache/conftool/dbconfig/20220721-180653-ladsgroup.json
* 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31653 and previous config saved to /var/cache/conftool/dbconfig/20220721-180508-ladsgroup.json
* 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31652 and previous config saved to /var/cache/conftool/dbconfig/20220721-175147-ladsgroup.json
* 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31651 and previous config saved to /var/cache/conftool/dbconfig/20220721-175003-ladsgroup.json
* 17:42 dwisehaupt: reclone of frdb2003 from frdb1003 is complete. all services back in service.
* 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic207[0-2].*
* 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=no; selector: name=elastic2066.codfw.wmnet
* 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic206[1-9].*
* 17:36 dancy@deploy1002: Synchronized README: Gathering timing info (duration: 03m 09s)
* 17:35 ryankemper@cumin1001: conftool action : GET; selector: name=elastic6*
* 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31650 and previous config saved to /var/cache/conftool/dbconfig/20220721-173458-ladsgroup.json
* 17:30 ryankemper@cumin1001: conftool action : set/weight=10,pooled=yes; selector: name=elastic6*
* 17:21 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:20 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:20 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:19 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:00 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 16:58 ryankemper: [[phab:T300943|T300943]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/816017 to get conftool-data entries for new elastic2* hosts
* 16:58 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: merging upstream config changes [[phab:T309896|T309896]] - mvernon@cumin1001
* 16:44 ryankemper@cumin1001: conftool action : set/weight=10; selector: name=elastic6*
* 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31649 and previous config saved to /var/cache/conftool/dbconfig/20220721-163859-ladsgroup.json
* 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31648 and previous config saved to /var/cache/conftool/dbconfig/20220721-162458-ladsgroup.json
* 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31647 and previous config saved to /var/cache/conftool/dbconfig/20220721-162419-ladsgroup.json
* 16:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS buster
* 16:09 ryankemper: [[phab:T300943|T300943]] Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/816008 and running puppet twice on elastic20[64-72]
* 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31646 and previous config saved to /var/cache/conftool/dbconfig/20220721-160914-ladsgroup.json
* 16:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS buster
* 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31645 and previous config saved to /var/cache/conftool/dbconfig/20220721-160522-ladsgroup.json
* 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31644 and previous config saved to /var/cache/conftool/dbconfig/20220721-155409-ladsgroup.json
* 15:50 ryankemper: [[phab:T300943|T300943]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/815823 and running puppet across elastic2* in preparation for adding new codfw hosts into service
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31643 and previous config saved to /var/cache/conftool/dbconfig/20220721-155017-ladsgroup.json
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31642 and previous config saved to /var/cache/conftool/dbconfig/20220721-153904-ladsgroup.json
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31641 and previous config saved to /var/cache/conftool/dbconfig/20220721-153512-ladsgroup.json
* 15:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
* 15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
* 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
* 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
* 15:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: [[gerrit:815970{{!}}Configure wbsearchentities profile parameter on Test Wikidata (take 2) (T307869)]] (2/2) (duration: 03m 13s)
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815970{{!}}Configure wbsearchentities profile parameter on Test Wikidata (take 2) (T307869)]] (1/2) (duration: 02m 59s)
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31640 and previous config saved to /var/cache/conftool/dbconfig/20220721-152007-ladsgroup.json
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:16 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS buster
* 15:16 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS buster
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Wikibase/repo/: Backport: [[gerrit:815983{{!}}Fix profile in wbsearchentities and wbsearch (T307869)]] (duration: 03m 07s)
* 15:13 moritzm: draining ganeti2021 [[phab:T310483|T310483]]
* 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2014.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 15:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2014.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 14:45 moritzm: upgrading ganeti/eqsin to 3.0.2 [[phab:T312637|T312637]]
* 14:39 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: merging upstream config changes [[phab:T309896|T309896]] - mvernon@cumin1001
* 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31639 and previous config saved to /var/cache/conftool/dbconfig/20220721-143544-ladsgroup.json
* 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31638 and previous config saved to /var/cache/conftool/dbconfig/20220721-143524-ladsgroup.json
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31637 and previous config saved to /var/cache/conftool/dbconfig/20220721-142523-marostegui.json
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1192.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1187.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1195.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1188.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1191.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1185.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1194.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1186.eqiad.wmnet with OS bullseye
* 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1190.eqiad.wmnet with OS bullseye
* 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1193.eqiad.wmnet with OS bullseye
* 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1189.eqiad.wmnet with OS bullseye
* 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31636 and previous config saved to /var/cache/conftool/dbconfig/20220721-142019-ladsgroup.json
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31635 and previous config saved to /var/cache/conftool/dbconfig/20220721-141938-root.json
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1187.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
* 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bullseye
* 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS bullseye
* 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31634 and previous config saved to /var/cache/conftool/dbconfig/20220721-141018-marostegui.json
* 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31633 and previous config saved to /var/cache/conftool/dbconfig/20220721-140513-ladsgroup.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31632 and previous config saved to /var/cache/conftool/dbconfig/20220721-140434-root.json
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31631 and previous config saved to /var/cache/conftool/dbconfig/20220721-140004-ladsgroup.json
* 13:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 13:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1185.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1192.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1187.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1191.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1190.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1188.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1186.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1193.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1194.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1189.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1195.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31630 and previous config saved to /var/cache/conftool/dbconfig/20220721-135513-marostegui.json
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31629 and previous config saved to /var/cache/conftool/dbconfig/20220721-135008-ladsgroup.json
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31628 and previous config saved to /var/cache/conftool/dbconfig/20220721-134250-root.json
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1192.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1193.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1190.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1185.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1187.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1188.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1186.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1191.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1189.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31627 and previous config saved to /var/cache/conftool/dbconfig/20220721-134008-marostegui.json
* 13:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31626 and previous config saved to /var/cache/conftool/dbconfig/20220721-132824-marostegui.json
* 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31625 and previous config saved to /var/cache/conftool/dbconfig/20220721-132746-root.json
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31624 and previous config saved to /var/cache/conftool/dbconfig/20220721-132639-marostegui.json
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 moritzm: installing paramiko security updates
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 Lucas_WMDE: pulled config change {{Gerrit|Iee6de25983}} to mwdebug1001, then reverted in {{Gerrit|I9248270621}} and pulled that too; neither was synced to other hosts
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 moritzm: installing xen security updates
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31623 and previous config saved to /var/cache/conftool/dbconfig/20220721-131040-ladsgroup.json
* 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31622 and previous config saved to /var/cache/conftool/dbconfig/20220721-125108-marostegui.json
* 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31621 and previous config saved to /var/cache/conftool/dbconfig/20220721-123603-marostegui.json
* 12:21 dwisehaupt: started reclone of frdb2003 from frdb1003
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31620 and previous config saved to /var/cache/conftool/dbconfig/20220721-122058-marostegui.json
* 12:07 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31619 and previous config saved to /var/cache/conftool/dbconfig/20220721-120553-marostegui.json
* 12:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 18:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 12:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 18:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31618 and previous config saved to /var/cache/conftool/dbconfig/20220721-115607-marostegui.json
* 11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 11:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 11:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 11:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31617 and previous config saved to /var/cache/conftool/dbconfig/20220721-114641-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31616 and previous config saved to /var/cache/conftool/dbconfig/20220721-113136-marostegui.json
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2078.codfw.wmnet
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31615 and previous config saved to /var/cache/conftool/dbconfig/20220721-111631-marostegui.json
* 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: Switch instance to plain disk storage, [[phab:T311686|T311686]]
* 11:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: Switch instance to plain disk storage, [[phab:T311686|T311686]]
* 11:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 11:09 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 11:08 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 11:08 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 11:07 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:07 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 11:07 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:03 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2078.codfw.wmnet
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31614 and previous config saved to /var/cache/conftool/dbconfig/20220721-110126-marostegui.json
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31613 and previous config saved to /var/cache/conftool/dbconfig/20220721-105856-ladsgroup.json
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubetcd2006.codfw.wmnet with reason: Switch to DRBD, [[phab:T311686|T311686]]
* 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on kubetcd2006.codfw.wmnet with reason: Switch to DRBD, [[phab:T311686|T311686]]
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31612 and previous config saved to /var/cache/conftool/dbconfig/20220721-104351-ladsgroup.json
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31611 and previous config saved to /var/cache/conftool/dbconfig/20220721-104039-marostegui.json
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31610 and previous config saved to /var/cache/conftool/dbconfig/20220721-104002-marostegui.json
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to cluster codfw and group D
* 10:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to cluster codfw and group D
* 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31609 and previous config saved to /var/cache/conftool/dbconfig/20220721-102846-ladsgroup.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31608 and previous config saved to /var/cache/conftool/dbconfig/20220721-102457-marostegui.json
* 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 10:18 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 10:17 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:15 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
* 10:15 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
* 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to cluster codfw and group D
* 10:14 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to cluster codfw and group D
* 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31607 and previous config saved to /var/cache/conftool/dbconfig/20220721-101341-ladsgroup.json
* 10:11 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
* 10:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31606 and previous config saved to /var/cache/conftool/dbconfig/20220721-100951-marostegui.json
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2009.codfw.wmnet to cluster codfw and group C
* 10:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2009.codfw.wmnet to cluster codfw and group C
* 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to plain disk storage, [[phab:T311686|T311686]]
* 09:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to plain disk storage, [[phab:T311686|T311686]]
* 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31605 and previous config saved to /var/cache/conftool/dbconfig/20220721-095454-ladsgroup.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31604 and previous config saved to /var/cache/conftool/dbconfig/20220721-095446-marostegui.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2085 and db2086 from dbctl �[3~', diff saved to https://phabricator.wikimedia.org/P31603 and previous config saved to /var/cache/conftool/dbconfig/20220721-095439-marostegui.json
* 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31602 and previous config saved to /var/cache/conftool/dbconfig/20220721-093755-ladsgroup.json
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 09:32 jbond: enable puppet on A:cp post gerrit:815728
* 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31601 and previous config saved to /var/cache/conftool/dbconfig/20220721-093032-ladsgroup.json
* 09:21 moritzm: installing containerd security updates in Kubernetes eqiad masters
* 09:18 jbond: disable puppet on A:cp for gerrit:815728
* 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31599 and previous config saved to /var/cache/conftool/dbconfig/20220721-091527-ladsgroup.json
* 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31598 and previous config saved to /var/cache/conftool/dbconfig/20220721-090022-ladsgroup.json
* 08:59 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:59 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:57 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:55 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:54 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:53 klausman@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31597 and previous config saved to /var/cache/conftool/dbconfig/20220721-084935-marostegui.json
* 08:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 08:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2169 to s6 and s7 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31595 and previous config saved to /var/cache/conftool/dbconfig/20220721-083147-marostegui.json
* 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 08:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:18 moritzm: installing containerd security updates in Kubernetes eqiad workers
* 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31594 and previous config saved to /var/cache/conftool/dbconfig/20220721-081449-marostegui.json
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31593 and previous config saved to /var/cache/conftool/dbconfig/20220721-075944-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P31592 and previous config saved to /var/cache/conftool/dbconfig/20220721-075757-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31591 and previous config saved to /var/cache/conftool/dbconfig/20220721-075745-root.json
* 07:46 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:815895{{!}}Adding Wikiquote to the new portals (T273179)]] (duration: 03m 10s)
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31590 and previous config saved to /var/cache/conftool/dbconfig/20220721-074439-marostegui.json
* 07:43 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:815895{{!}}Adding Wikiquote to the new portals (T273179)]] (duration: 03m 08s)
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P31589 and previous config saved to /var/cache/conftool/dbconfig/20220721-074253-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31588 and previous config saved to /var/cache/conftool/dbconfig/20220721-074242-root.json
* 07:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31587 and previous config saved to /var/cache/conftool/dbconfig/20220721-073502-ladsgroup.json
* 07:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31586 and previous config saved to /var/cache/conftool/dbconfig/20220721-073251-ladsgroup.json
* 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31585 and previous config saved to /var/cache/conftool/dbconfig/20220721-073217-ladsgroup.json
* 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS bullseye
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31584 and previous config saved to /var/cache/conftool/dbconfig/20220721-072934-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P31583 and previous config saved to /var/cache/conftool/dbconfig/20220721-072749-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31582 and previous config saved to /var/cache/conftool/dbconfig/20220721-072738-root.json
* 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31581 and previous config saved to /var/cache/conftool/dbconfig/20220721-071953-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31580 and previous config saved to /var/cache/conftool/dbconfig/20220721-071932-marostegui.json
* 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2026.codfw.wmnet with reason: host reimage
* 07:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2026.codfw.wmnet with reason: host reimage
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P31579 and previous config saved to /var/cache/conftool/dbconfig/20220721-071245-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31578 and previous config saved to /var/cache/conftool/dbconfig/20220721-071234-root.json
* 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2020.codfw.wmnet to cluster codfw and group B
* 07:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2020.codfw.wmnet to cluster codfw and group B
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31577 and previous config saved to /var/cache/conftool/dbconfig/20220721-070427-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P31576 and previous config saved to /var/cache/conftool/dbconfig/20220721-065741-root.json
* 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS bullseye
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31575 and previous config saved to /var/cache/conftool/dbconfig/20220721-065730-root.json
* 06:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2009.codfw.wmnet with OS bullseye
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31574 and previous config saved to /var/cache/conftool/dbconfig/20220721-064922-marostegui.json
* 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 06:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P31573 and previous config saved to /var/cache/conftool/dbconfig/20220721-064237-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31572 and previous config saved to /var/cache/conftool/dbconfig/20220721-064226-root.json
* 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2009.codfw.wmnet with reason: host reimage
* 06:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 06:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 06:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2009.codfw.wmnet with reason: host reimage
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31571 and previous config saved to /var/cache/conftool/dbconfig/20220721-063417-marostegui.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P31570 and previous config saved to /var/cache/conftool/dbconfig/20220721-062733-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31569 and previous config saved to /var/cache/conftool/dbconfig/20220721-062722-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31568 and previous config saved to /var/cache/conftool/dbconfig/20220721-062431-marostegui.json
* 06:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 06:18 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS bullseye
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P31567 and previous config saved to /var/cache/conftool/dbconfig/20220721-061228-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31566 and previous config saved to /var/cache/conftool/dbconfig/20220721-061217-root.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 [[phab:T313398|T313398]]', diff saved to https://phabricator.wikimedia.org/P31565 and previous config saved to /var/cache/conftool/dbconfig/20220721-061145-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write [[phab:T313398|T313398]]', diff saved to https://phabricator.wikimedia.org/P31564 and previous config saved to /var/cache/conftool/dbconfig/20220721-061001-root.json
* 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 06:08 marostegui: Starting x1 eqiad failover from db1120 to db1103 - [[phab:T313398|T313398]]
* 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P31563 and previous config saved to /var/cache/conftool/dbconfig/20220721-060427-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1136 to s7 primary and set section read-write [[phab:T313383|T313383]]', diff saved to https://phabricator.wikimedia.org/P31562 and previous config saved to /var/cache/conftool/dbconfig/20220721-060112-root.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - [[phab:T313383|T313383]]', diff saved to https://phabricator.wikimedia.org/P31561 and previous config saved to /var/cache/conftool/dbconfig/20220721-060037-marostegui.json
* 06:00 marostegui: Starting s7 eqiad failover from db1181 to db1136 - [[phab:T313383|T313383]]
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 with weight 0 [[phab:T313398|T313398]]', diff saved to https://phabricator.wikimedia.org/P31560 and previous config saved to /var/cache/conftool/dbconfig/20220721-051752-root.json
* 05:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T313398|T313398]]
* 05:14 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T313398|T313398]]
* 05:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 [[phab:T313383|T313383]]
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1136 with weight 0 [[phab:T313383|T313383]]', diff saved to https://phabricator.wikimedia.org/P31559 and previous config saved to /var/cache/conftool/dbconfig/20220721-051358-root.json
* 05:13 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 [[phab:T313383|T313383]]
* 00:44 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
 
== 2022-07-20 ==
* 23:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS bullseye
* 23:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS bullseye
* 23:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS bullseye
* 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS bullseye
* 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS bullseye
* 23:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2072.codfw.wmnet with reason: host reimage
* 23:29 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2070.codfw.wmnet with reason: host reimage
* 23:29 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic2067.codfw.wmnet with reason: host reimage
* 23:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2071.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2068.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2071.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2072.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2070.codfw.wmnet with reason: host reimage
* 23:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2067.codfw.wmnet with reason: host reimage
* 23:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2068.codfw.wmnet with reason: host reimage
* 23:11 ryankemper: [[phab:T300943|T300943]] Fixed IPMI passwords for elastic `20[67,68,70,71,72]`, reimaging them to bullseye (these hosts are not in service, thus the batch operation)
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS bullseye
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS bullseye
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS bullseye
* 23:07 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS bullseye
* 23:07 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS bullseye
* 21:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 21:45 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 20:45 cjming: end of UTC late backport window
* 20:43 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814906{{!}}Deploy the new grid layout to group 1 (T312241)]] (duration: 03m 16s)
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814906{{!}}Deploy the new grid layout to group 1 (T312241)]] (duration: 03m 14s)
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2032.codfw.wmnet with OS bullseye
* 20:27 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815359{{!}}Enable DiscussionTools visualenhancements as beta feature on partner wikis (T312670)]] (duration: 03m 26s)
* 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31555 and previous config saved to /var/cache/conftool/dbconfig/20220720-201240-marostegui.json
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2032.codfw.wmnet with reason: host reimage
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815359{{!}}Enable DiscussionTools visualenhancements as beta feature on partner wikis (T312670)]] (duration: 03m 10s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2032.codfw.wmnet with reason: host reimage
* 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31554 and previous config saved to /var/cache/conftool/dbconfig/20220720-195734-marostegui.json
* 19:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2032.codfw.wmnet with OS bullseye
* 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]] (duration: 02m 53s)
* 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31553 and previous config saved to /var/cache/conftool/dbconfig/20220720-194229-marostegui.json
* 19:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:33 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/3D/src/PatentFormField.php: Backport: [[gerrit:815733{{!}}PatentFormField: pass on $this->mParent to HTMLRadioField constructor (T313432)]] (duration: 03m 08s)
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31552 and previous config saved to /var/cache/conftool/dbconfig/20220720-192724-marostegui.json
* 19:17 jeena: that should be revert group1 wikis to 1.39.0-wmf.19
* 19:13 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group[0{{!}}1] wikis to [VERSION]"
* 18:37 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 18:35 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2045.codfw.wmnet with OS bullseye
* 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31551 and previous config saved to /var/cache/conftool/dbconfig/20220720-182710-marostegui.json
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 18:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 18:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 15 hosts with reason: Maintenance
* 18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 15 hosts with reason: Maintenance
* 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 18:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31550 and previous config saved to /var/cache/conftool/dbconfig/20220720-182339-marostegui.json
* 18:17 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:16 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]] (duration: 03m 07s)
* 18:15 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31549 and previous config saved to /var/cache/conftool/dbconfig/20220720-180834-marostegui.json
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31548 and previous config saved to /var/cache/conftool/dbconfig/20220720-175328-marostegui.json
* 17:51 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 17:50 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 17:38 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 17:38 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2048.codfw.wmnet with OS bullseye
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31547 and previous config saved to /var/cache/conftool/dbconfig/20220720-173823-marostegui.json
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31546 and previous config saved to /var/cache/conftool/dbconfig/20220720-173522-marostegui.json
* 17:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 17:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31545 and previous config saved to /var/cache/conftool/dbconfig/20220720-173502-marostegui.json
* 17:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2048.codfw.wmnet with reason: host reimage
* 17:25 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2048.codfw.wmnet with reason: host reimage
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31544 and previous config saved to /var/cache/conftool/dbconfig/20220720-171956-marostegui.json
* 17:12 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'enable-puppet 815759'
* 17:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2048.codfw.wmnet with OS bullseye
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31543 and previous config saved to /var/cache/conftool/dbconfig/20220720-170451-marostegui.json
* 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31542 and previous config saved to /var/cache/conftool/dbconfig/20220720-164946-marostegui.json
* 16:49 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'disable-puppet 815759'
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31541 and previous config saved to /var/cache/conftool/dbconfig/20220720-164638-marostegui.json
* 16:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 16:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31540 and previous config saved to /var/cache/conftool/dbconfig/20220720-164618-marostegui.json
* 16:40 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31539 and previous config saved to /var/cache/conftool/dbconfig/20220720-163113-marostegui.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31538 and previous config saved to /var/cache/conftool/dbconfig/20220720-161608-marostegui.json
* 16:05 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31537 and previous config saved to /var/cache/conftool/dbconfig/20220720-160103-marostegui.json
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31536 and previous config saved to /var/cache/conftool/dbconfig/20220720-155752-marostegui.json
* 15:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 15:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31535 and previous config saved to /var/cache/conftool/dbconfig/20220720-155732-marostegui.json
* 15:57 dancy@deploy1002: Installation of scap version "4.11.2" completed for 557 hosts
* 15:56 dancy@deploy1002: Installing scap version "4.11.2" for 557 hosts
* 15:50 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
* 15:46 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2036.codfw.wmnet with OS bullseye
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31534 and previous config saved to /var/cache/conftool/dbconfig/20220720-154227-marostegui.json
* 15:39 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing
* 15:35 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
* 15:28 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31532 and previous config saved to /var/cache/conftool/dbconfig/20220720-152721-marostegui.json
* 15:26 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 15:26 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 15:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2036.codfw.wmnet with reason: host reimage
* 15:20 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2036.codfw.wmnet with reason: host reimage
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fix db2167:3318', diff saved to https://phabricator.wikimedia.org/P31531 and previous config saved to /var/cache/conftool/dbconfig/20220720-151711-marostegui.json
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31530 and previous config saved to /var/cache/conftool/dbconfig/20220720-151216-marostegui.json
* 15:10 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31529 and previous config saved to /var/cache/conftool/dbconfig/20220720-150908-marostegui.json
* 15:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 15:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31528 and previous config saved to /var/cache/conftool/dbconfig/20220720-150730-marostegui.json
* 15:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2036.codfw.wmnet with OS bullseye
* 14:59 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31527 and previous config saved to /var/cache/conftool/dbconfig/20220720-145224-marostegui.json
* 14:44 volans: installing spicearck 3.1.0 on cumin2002
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31524 and previous config saved to /var/cache/conftool/dbconfig/20220720-143719-marostegui.json
* 14:36 volans: uploaded spicerack_3.1.0 to apt.wikimedia.org bullseye-wikimedia
* 14:26 moritzm: installing containerd security updates in Kubernetes codfw masters
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31523 and previous config saved to /var/cache/conftool/dbconfig/20220720-142214-marostegui.json
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31522 and previous config saved to /var/cache/conftool/dbconfig/20220720-141912-marostegui.json
* 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31521 and previous config saved to /var/cache/conftool/dbconfig/20220720-141851-marostegui.json
* 14:04 Lucas_WMDE: UTC afternoon backport+config window done
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31520 and previous config saved to /var/cache/conftool/dbconfig/20220720-140346-marostegui.json
* 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/WikibaseLexeme/WikibaseLexeme.resources.php: Backport: [[gerrit:815726{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (2/2) (duration: 03m 02s)
* 14:02 jbond: disable puppet on A:cp to deplot Gerrit:768766
* 13:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/WikibaseLexeme/src/MediaWiki/Config/LexemeLanguageCodePropertyIdConfig.php: Backport: [[gerrit:815726{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (1/2) (duration: 02m 56s)
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:54 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ git -C php-1.39.0-wmf.19/extensions/WikibaseLexeme am --skip # [[phab:T308659|T308659]] backport already applied
* 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2034.codfw.wmnet with OS bullseye
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31519 and previous config saved to /var/cache/conftool/dbconfig/20220720-134841-marostegui.json
* 13:45 moritzm: installing containerd security updates in Kubernetes codfw cluster
* 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/WikibaseLexeme/WikibaseLexeme.resources.php: Backport: [[gerrit:815425{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (2/2) (duration: 03m 08s)
* 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/WikibaseLexeme/src/MediaWiki/Config/LexemeLanguageCodePropertyIdConfig.php: Backport: [[gerrit:815425{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (1/2) (duration: 03m 34s)
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 moritzm: installing request-tracker4 security updates
* 13:33 XioNoX: cr2-eqiad# deactivate interfaces xe-3/3/0 - [[phab:T313337|T313337]]
* 13:33 XioNoX: cr2-eqiad# deactivate interfaces xe-3/3/0 -
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31518 and previous config saved to /var/cache/conftool/dbconfig/20220720-133336-marostegui.json
* 13:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2034.codfw.wmnet with reason: host reimage
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31517 and previous config saved to /var/cache/conftool/dbconfig/20220720-133030-marostegui.json
* 13:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 13:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31516 and previous config saved to /var/cache/conftool/dbconfig/20220720-133010-marostegui.json
* 13:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2034.codfw.wmnet with reason: host reimage
* 13:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2034.codfw.wmnet with OS bullseye
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31515 and previous config saved to /var/cache/conftool/dbconfig/20220720-131505-marostegui.json
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31514 and previous config saved to /var/cache/conftool/dbconfig/20220720-130000-marostegui.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31513 and previous config saved to /var/cache/conftool/dbconfig/20220720-124453-marostegui.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31512 and previous config saved to /var/cache/conftool/dbconfig/20220720-124042-marostegui.json
* 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 12:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 12:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31511 and previous config saved to /var/cache/conftool/dbconfig/20220720-123751-marostegui.json
* 12:29 marostegui: Move pc1014 from pc2 to pc3 [[phab:T313401|T313401]]
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31510 and previous config saved to /var/cache/conftool/dbconfig/20220720-122246-marostegui.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31509 and previous config saved to /var/cache/conftool/dbconfig/20220720-120738-marostegui.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31507 and previous config saved to /var/cache/conftool/dbconfig/20220720-115233-marostegui.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31506 and previous config saved to /var/cache/conftool/dbconfig/20220720-113424-marostegui.json
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 11:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2009.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 11:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2009.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
* 11:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.1 - ayounsi@cumin1001
* 11:05 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.1 - ayounsi@cumin1001
* 11:03 moritzm: draining ganeti2014 [[phab:T310483|T310483]]
* 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
* 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2020.codfw.wmnet with OS bullseye
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 12 hosts with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 12 hosts with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31504 and previous config saved to /var/cache/conftool/dbconfig/20220720-103825-marostegui.json
* 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2020.codfw.wmnet with reason: host reimage
* 10:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2020.codfw.wmnet with reason: host reimage
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31503 and previous config saved to /var/cache/conftool/dbconfig/20220720-102320-marostegui.json
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2020.codfw.wmnet with OS bullseye
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31502 and previous config saved to /var/cache/conftool/dbconfig/20220720-100815-marostegui.json
* 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2029.codfw.wmnet to cluster codfw and group A
* 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to cluster codfw and group A
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31501 and previous config saved to /var/cache/conftool/dbconfig/20220720-095310-marostegui.json
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 09:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31499 and previous config saved to /var/cache/conftool/dbconfig/20220720-085256-marostegui.json
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31498 and previous config saved to /var/cache/conftool/dbconfig/20220720-085236-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31497 and previous config saved to /var/cache/conftool/dbconfig/20220720-083731-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31496 and previous config saved to /var/cache/conftool/dbconfig/20220720-082226-marostegui.json
* 08:14 elukey: apt-get clean on archiva1002 to free some space
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31495 and previous config saved to /var/cache/conftool/dbconfig/20220720-080721-marostegui.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31494 and previous config saved to /var/cache/conftool/dbconfig/20220720-080509-marostegui.json
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31493 and previous config saved to /var/cache/conftool/dbconfig/20220720-080442-marostegui.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31492 and previous config saved to /var/cache/conftool/dbconfig/20220720-074937-marostegui.json
* 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
* 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
* 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31491 and previous config saved to /var/cache/conftool/dbconfig/20220720-073432-marostegui.json
* 07:31 jayme: ml-serve1002.eqiad.wmnet,ml-serve1004.eqiad.wmnet 'systemctl restart rsyslog'
* 07:30 taavi@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/SecurePoll/cli/wm-scripts/bv2022/populateEditCount.php: [[phab:T309753|T309753]] backports (duration: 02m 54s)
* 07:30 jayme: kubernetes1010.eqiad.wmnet,kubernetes1020.eqiad.wmnet 'systemctl restart rsyslog'
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 taavi@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/bv2022/: [[phab:T309753|T309753]] backports (duration: 02m 57s)
* 07:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31490 and previous config saved to /var/cache/conftool/dbconfig/20220720-071927-marostegui.json
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2029.codfw.wmnet with OS bullseye
* 07:14 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815251{{!}}Enable ContentTranslation out of Beta for sswiki (T309384)]] (duration: 03m 24s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31489 and previous config saved to /var/cache/conftool/dbconfig/20220720-071114-marostegui.json
* 07:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31488 and previous config saved to /var/cache/conftool/dbconfig/20220720-071054-marostegui.json
* 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
* 06:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31487 and previous config saved to /var/cache/conftool/dbconfig/20220720-065549-marostegui.json
* 06:43 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS bullseye
* 06:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31486 and previous config saved to /var/cache/conftool/dbconfig/20220720-064044-marostegui.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31485 and previous config saved to /var/cache/conftool/dbconfig/20220720-062539-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31484 and previous config saved to /var/cache/conftool/dbconfig/20220720-062327-marostegui.json
* 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31483 and previous config saved to /var/cache/conftool/dbconfig/20220720-062307-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31482 and previous config saved to /var/cache/conftool/dbconfig/20220720-060802-marostegui.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31481 and previous config saved to /var/cache/conftool/dbconfig/20220720-055256-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31480 and previous config saved to /var/cache/conftool/dbconfig/20220720-053751-marostegui.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31479 and previous config saved to /var/cache/conftool/dbconfig/20220720-053620-marostegui.json
* 05:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31478 and previous config saved to /var/cache/conftool/dbconfig/20220720-053520-marostegui.json
* 05:26 marostegui: Stop mysql on db2087 (s6 and s7) to clone db2169 [[phab:T311493|T311493]]
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31475 and previous config saved to /var/cache/conftool/dbconfig/20220720-052014-marostegui.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31474 and previous config saved to /var/cache/conftool/dbconfig/20220720-050509-marostegui.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2168 to dbctl in s7 and s8 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31473 and previous config saved to /var/cache/conftool/dbconfig/20220720-045918-marostegui.json
* 04:57 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31472 and previous config saved to /var/cache/conftool/dbconfig/20220720-045004-marostegui.json
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31471 and previous config saved to /var/cache/conftool/dbconfig/20220720-044729-marostegui.json
* 04:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 04:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 04:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 04:10 rzl: rzl@kubemaster1001:~$ sudo systemctl restart kube-apiserver
* 04:08 rzl: rzl@kubemaster1002:~$ sudo systemctl restart kube-apiserver
* 03:48 rzl: rzl@cumin2002:~$ sudo cumin dbproxy[1019,1020,1021].eqiad.wmnet 'systemctl reload haproxy'
* 03:37 rzl: rzl@dbproxy1018:~$ sudo systemctl reload haproxy
* 03:30 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 03:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host elastic2060.codfw.wmnet with OS bullseye
* 03:19 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
* 03:10 tstarling@deploy1002: Finished scap: revert yue -> zh fallback, needs LC rebuild in both branches [[phab:T296188|T296188]] (duration: 19m 41s)
* 02:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:51 tstarling@deploy1002: Started scap: revert yue -> zh fallback, needs LC rebuild in both branches [[phab:T296188|T296188]]
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2052.codfw.wmnet with OS bullseye
* 01:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2052.codfw.wmnet with reason: host reimage
* 01:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2052.codfw.wmnet with reason: host reimage
* 01:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2052.codfw.wmnet with OS bullseye
* 01:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS bullseye
* 00:43 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2051.codfw.wmnet with reason: host reimage
* 00:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2051.codfw.wmnet with reason: host reimage
* 00:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS bullseye
 
== 2022-07-19 ==
* 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 8 hosts with reason: Maintenance
* 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 8 hosts with reason: Maintenance
* 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31470 and previous config saved to /var/cache/conftool/dbconfig/20220719-225828-marostegui.json
* 22:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2050.codfw.wmnet with OS bullseye
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31469 and previous config saved to /var/cache/conftool/dbconfig/20220719-224323-marostegui.json
* 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2050.codfw.wmnet with reason: host reimage
* 22:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2050.codfw.wmnet with reason: host reimage
* 22:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31468 and previous config saved to /var/cache/conftool/dbconfig/20220719-222818-marostegui.json
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31467 and previous config saved to /var/cache/conftool/dbconfig/20220719-221312-marostegui.json
* 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31466 and previous config saved to /var/cache/conftool/dbconfig/20220719-221035-marostegui.json
* 22:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2050.codfw.wmnet with OS bullseye
* 22:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31465 and previous config saved to /var/cache/conftool/dbconfig/20220719-220946-marostegui.json
* 22:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31464 and previous config saved to /var/cache/conftool/dbconfig/20220719-215441-marostegui.json
* 21:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 21:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31463 and previous config saved to /var/cache/conftool/dbconfig/20220719-213936-marostegui.json
* 21:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2026.codfw.wmnet with OS bullseye
* 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:36 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]] (duration: 04m 02s)
* 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 21:26 dancy@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Config: [[gerrit:815317{{!}}MWConfigCacheGenerator: If opcache.revalidate_freq is 0, use grace period of 10 seconds (T311788)]] (duration: 02m 59s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31462 and previous config saved to /var/cache/conftool/dbconfig/20220719-212431-marostegui.json
* 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31461 and previous config saved to /var/cache/conftool/dbconfig/20220719-212149-marostegui.json
* 21:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 21:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31460 and previous config saved to /var/cache/conftool/dbconfig/20220719-212128-marostegui.json
* 21:17 jforrester@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Scribunto/includes/Hooks.php: Train unblocker: [[gerrit:815281{{!}}Hooks: Bump scribunto-stats cache version (T313341)]] (duration: 03m 14s)
* 21:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2026.codfw.wmnet with reason: host reimage
* 21:14 cjming: end of UTC late backport window
* 21:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2026.codfw.wmnet with reason: host reimage
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815374{{!}}uzwiki: Create "eliminator" group (T302670)]] (duration: 03m 13s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:07 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815374{{!}}uzwiki: Create "eliminator" group (T302670)]] (duration: 03m 19s)
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31459 and previous config saved to /var/cache/conftool/dbconfig/20220719-210623-marostegui.json
* 21:06 brennen@deploy1002: brennen and ebernhardson: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:05 brennen@deploy1002: Started scap: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]]
* 21:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2026.codfw.wmnet with OS bullseye
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:00 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:776334{{!}}Add "uploader" user group for kswiki. (T305320)]] (duration: 02m 58s)
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:59 ryankemper: [[phab:T313431|T313431]] Repooled `elastic[2073-2074,2080-2081,2083,2086].codfw.wmnet`. Codfw's all on 5 masters now and cluster is back to green.
* 20:56 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:774841{{!}}Add file mover user group for azwiki (T304968)]] (duration: 02m 52s)
* 20:58 brennen@deploy1002: Sync cancelled.
* 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31458 and previous config saved to /var/cache/conftool/dbconfig/20220719-205118-marostegui.json
* 20:58 brennen@deploy1002: brennen and trainbranchbot: Backport for [[gerrit:836928{{!}}Revert "cirrus: Don't configure cloud clusters for private wikis"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:58 ryankemper: [[phab:T313431|T313431]] Updated cross-cluster seed conf with new masters; should resolve the settings check alerts
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:58 brennen@deploy1002: Started scap: Backport for [[gerrit:836928{{!}}Revert "cirrus: Don't configure cloud clusters for private wikis"]]
* 20:43 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:774841{{!}}Add file mover user group for azwiki (T304968)]] (duration: 03m 15s)
* 20:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4027.ulsfo.wmnet
* 20:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2055.codfw.wmnet with OS bullseye
* 20:57 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:815360{{!}} Bumping portals to master (T128546)]] (duration: 02m 53s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31457 and previous config saved to /var/cache/conftool/dbconfig/20220719-203613-marostegui.json
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31456 and previous config saved to /var/cache/conftool/dbconfig/20220719-203327-marostegui.json
* 20:52 brennen@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.3" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.gcoIZ0BTKW"' returned non-zero exit status 255. (duration: 00m 00s)
* 20:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:815360{{!}} Bumping portals to master (T128546)]] (duration: 03m 09s)
* 20:52 brennen@deploy1002: Started scap: Backport for [[gerrit:836886{{!}}cirrus: Don't configure cloud clusters for private wikis]]
* 20:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:49 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31455 and previous config saved to /var/cache/conftool/dbconfig/20220719-203307-marostegui.json
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:29 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815338{{!}}[wmf-config]: Undeploy GDI Survey Wave 2 (T312866)]] (duration: 03m 12s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 brennen@deploy1002: Sync cancelled.
* 20:45 brennen@deploy1002: brennen and trainbranchbot: Backport for [[gerrit:836922{{!}}Revert "Add Nepalese Wikipedia tagline"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:45 brennen@deploy1002: Started scap: Backport for [[gerrit:836922{{!}}Revert "Add Nepalese Wikipedia tagline"]]
* 20:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
* 20:42 brennen@deploy1002: Sync cancelled.
* 20:41 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:836880{{!}}Add Nepalese Wikipedia tagline (T318737)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:41 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic2080` to pick up new master-eligible status
* 20:41 brennen@deploy1002: Started scap: Backport for [[gerrit:836880{{!}}Add Nepalese Wikipedia tagline (T318737)]]
* 20:38 brennen@deploy1002: Finished scap: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]] (duration: 08m 03s)
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:35 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
* 20:35 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cp4027.ulsfo.wmnet
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
* 20:30 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:30 brennen@deploy1002: Started scap: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]]
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]] (duration: 08m 05s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:19 hoo: Ran foreachwikiindblist wikidataclient-test extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php
* 20:17 ejegg: payments-wiki upgraded from {{Gerrit|0456850e}} to {{Gerrit|839d6dde}} (with cache prefix altered for moved classes)
* 20:17 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic2086` to pick up new master-eligible status
* 20:17 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:17 brennen@deploy1002: Started scap: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]]
* 20:04 ejegg: payments-wiki rolled back from {{Gerrit|839d6dde}} to {{Gerrit|0456850e}}
* 19:56 ejegg: payments-wiki upgraded from {{Gerrit|0456850e}} to {{Gerrit|839d6dde}}
* 19:55 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic208[1,3]` to pick up new master-eligible status
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
* 19:33 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic207[3,4]` to pick up new master-eligible status
* 19:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 19:29 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 19:09 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4021.ulsfo.wmnet
* 19:09 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:04 robh@cumin2002: START - Cookbook sre.dns.netbox
* 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4021.ulsfo.wmnet
* 18:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:06 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35188 and previous config saved to /var/cache/conftool/dbconfig/20220929-162812-ladsgroup.json
* 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35187 and previous config saved to /var/cache/conftool/dbconfig/20220929-162750-ladsgroup.json
* 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35186 and previous config saved to /var/cache/conftool/dbconfig/20220929-161244-ladsgroup.json
* 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35185 and previous config saved to /var/cache/conftool/dbconfig/20220929-155737-ladsgroup.json
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:836858{{!}}Configure `mul` Wikibase language code on Beta wikis]] (beta-only, prod noop) (duration: 03m 41s)
* 15:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35184 and previous config saved to /var/cache/conftool/dbconfig/20220929-154231-ladsgroup.json
* 15:35 dancy@deploy1002: Installation of scap version "4.25.0" completed for 561 hosts
* 15:35 dancy@deploy1002: Installing scap version "4.25.0" for 561 hosts
* 14:30 moritzm: installing glib2.0 security updates
* 14:29 moritzm: uploaded glib2.0 2.50.3-2+deb9u3+wmf1  to apt.wikimedia.org/stretch-wikimedia
* 14:17 moritzm: rolling restart of apache2 in mw/eqiad to pick up Expat security updates
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11164
* 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11164
* 13:54 claime: Enabled puppet for C:memcache hosts following merge [[gerrit:835585{{!}}C:memcached Fix memcached bootstrap]]
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 32934
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35179 and previous config saved to /var/cache/conftool/dbconfig/20220929-134844-root.json
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:46 claime: Disabling puppet for C:memcache hosts to merge [[gerrit:835585{{!}}C:memcached Fix memcached bootstrap]]
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 13:41 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]] (duration: 06m 13s)
* 13:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8966
* 13:39 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8966
* 13:35 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and hoo: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:34 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]]
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35178 and previous config saved to /var/cache/conftool/dbconfig/20220929-133339-root.json
* 13:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]] (duration: 05m 36s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kemayo: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:27 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]]
* 13:26 moritzm: restartting Apache on lists
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]] (duration: 05m 23s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35176 and previous config saved to /var/cache/conftool/dbconfig/20220929-131834-root.json
* 13:15 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:15 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]]
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35175 and previous config saved to /var/cache/conftool/dbconfig/20220929-131507-root.json
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:11 moritzm: rolling restart of apache2 in mw/codfw to pick up Expat security updates
* 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835291{{!}}votewiki: Change wgLanguageCode to zh for Sep 2022 admins election (T318147)]] (duration: 03m 40s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35174 and previous config saved to /var/cache/conftool/dbconfig/20220929-130329-root.json
* 13:01 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 04m 04s)
* 13:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35173 and previous config saved to /var/cache/conftool/dbconfig/20220929-130003-root.json
* 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:57 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35172 and previous config saved to /var/cache/conftool/dbconfig/20220929-124824-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35171 and previous config saved to /var/cache/conftool/dbconfig/20220929-124458-root.json
* 12:44 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]] (duration: 09m 05s)
* 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:35 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 12:34 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]]
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35169 and previous config saved to /var/cache/conftool/dbconfig/20220929-123319-root.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35168 and previous config saved to /var/cache/conftool/dbconfig/20220929-122953-root.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35167 and previous config saved to /var/cache/conftool/dbconfig/20220929-121814-root.json
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35166 and previous config saved to /var/cache/conftool/dbconfig/20220929-121448-root.json
* 12:10 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 12:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3292
* 12:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3292
* 12:04 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35165 and previous config saved to /var/cache/conftool/dbconfig/20220929-120309-root.json
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35164 and previous config saved to /var/cache/conftool/dbconfig/20220929-115943-root.json
* 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
* 11:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P35163 and previous config saved to /var/cache/conftool/dbconfig/20220929-115612-root.json
* 11:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 209453
* 11:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 209453
* 11:51 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15695
* 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15695
* 11:45 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 42
* 11:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
* 11:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3856
* 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35162 and previous config saved to /var/cache/conftool/dbconfig/20220929-114438-root.json
* 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35161 and previous config saved to /var/cache/conftool/dbconfig/20220929-114431-ladsgroup.json
* 11:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3856
* 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42
* 11:41 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 11:40 ladsgroup@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
* 11:39 ladsgroup@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 11:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62955
* 11:38 ladsgroup@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:38 ladsgroup@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 11:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62955
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35160 and previous config saved to /var/cache/conftool/dbconfig/20220929-112933-root.json
* 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35159 and previous config saved to /var/cache/conftool/dbconfig/20220929-112925-ladsgroup.json
* 11:16 XioNoX: re-pool cr2-eqord - [[phab:T295690|T295690]]
* 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35158 and previous config saved to /var/cache/conftool/dbconfig/20220929-111418-ladsgroup.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35157 and previous config saved to /var/cache/conftool/dbconfig/20220929-111217-root.json
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 codfw primary [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35156 and previous config saved to /var/cache/conftool/dbconfig/20220929-111127-root.json
* 11:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - [[phab:T318892|T318892]]
* 11:06 XioNoX: restart cr2-eqord for upgrade - [[phab:T295690|T295690]]
* 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
* 11:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
* 11:01 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35155 and previous config saved to /var/cache/conftool/dbconfig/20220929-105912-ladsgroup.json
* 10:53 XioNoX: drain cr2-eqord - [[phab:T295690|T295690]]
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35154 and previous config saved to /var/cache/conftool/dbconfig/20220929-105206-root.json
* 10:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s8 [[phab:T318892|T318892]]
* 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s8 [[phab:T318892|T318892]]
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318892|T318892]]
* 10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
* 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318892|T318892]]
* 10:50 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
* 10:40 XioNoX: repool cr2-eqiad - [[phab:T295690|T295690]]
* 10:36 moritzm: installing poppler security updates
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35153 and previous config saved to /var/cache/conftool/dbconfig/20220929-100849-ladsgroup.json
* 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35152 and previous config saved to /var/cache/conftool/dbconfig/20220929-100828-ladsgroup.json
* 10:07 XioNoX: second (and longest) cr2-eqiad RE switchover - [[phab:T295690|T295690]]
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35150 and previous config saved to /var/cache/conftool/dbconfig/20220929-095321-ladsgroup.json
* 09:45 moritzm: restarting superset to pick up expat security update
* 09:43 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 09:42 XioNoX: first cr2-eqiad RE switchover - [[phab:T295690|T295690]]
* 09:41 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 09:38 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35149 and previous config saved to /var/cache/conftool/dbconfig/20220929-093815-ladsgroup.json
* 09:36 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 09:34 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 09:33 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:33 XioNoX: drain cr2-eqiad - [[phab:T295690|T295690]]
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
* 09:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
* 09:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:26 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2098.codfw.wmnet with OS bullseye
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35148 and previous config saved to /var/cache/conftool/dbconfig/20220929-092308-ladsgroup.json
* 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:16 XioNoX: repool cr1-eqiad - [[phab:T295690|T295690]]
* 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.3"
* 09:07 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
* 09:04 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
* 08:52 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2098.codfw.wmnet with OS bullseye
* 08:43 XioNoX: second cr1-eqiad RE switchover - [[phab:T295690|T295690]]
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35146 and previous config saved to /var/cache/conftool/dbconfig/20220929-082757-root.json
* 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:15 XioNoX: first cr1-eqiad RE switchover (for NVM firmware) - [[phab:T295690|T295690]]
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35145 and previous config saved to /var/cache/conftool/dbconfig/20220929-081252-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35144 and previous config saved to /var/cache/conftool/dbconfig/20220929-080340-root.json
* 07:57 XioNoX: drain traffic away from cr1-eqiad - [[phab:T295690|T295690]]
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35143 and previous config saved to /var/cache/conftool/dbconfig/20220929-075747-root.json
* 07:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
* 07:49 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35142 and previous config saved to /var/cache/conftool/dbconfig/20220929-074835-root.json
* 07:45 moritzm: installing expat security updates
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35141 and previous config saved to /var/cache/conftool/dbconfig/20220929-074242-root.json
* 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 18106
* 07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 18106
* 07:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38040
* 07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38040
* 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35280
* 07:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35280
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35140 and previous config saved to /var/cache/conftool/dbconfig/20220929-073330-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35139 and previous config saved to /var/cache/conftool/dbconfig/20220929-072745-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35138 and previous config saved to /var/cache/conftool/dbconfig/20220929-072737-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35137 and previous config saved to /var/cache/conftool/dbconfig/20220929-071825-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35136 and previous config saved to /var/cache/conftool/dbconfig/20220929-071240-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35135 and previous config saved to /var/cache/conftool/dbconfig/20220929-071232-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35134 and previous config saved to /var/cache/conftool/dbconfig/20220929-070320-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35133 and previous config saved to /var/cache/conftool/dbconfig/20220929-065736-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35132 and previous config saved to /var/cache/conftool/dbconfig/20220929-065727-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35131 and previous config saved to /var/cache/conftool/dbconfig/20220929-064815-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35130 and previous config saved to /var/cache/conftool/dbconfig/20220929-064231-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35129 and previous config saved to /var/cache/conftool/dbconfig/20220929-064222-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P35128 and previous config saved to /var/cache/conftool/dbconfig/20220929-063508-root.json
* 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35127 and previous config saved to /var/cache/conftool/dbconfig/20220929-063310-root.json
* 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35126 and previous config saved to /var/cache/conftool/dbconfig/20220929-062726-root.json
* 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35125 and previous config saved to /var/cache/conftool/dbconfig/20220929-061805-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35124 and previous config saved to /var/cache/conftool/dbconfig/20220929-061221-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35123 and previous config saved to /var/cache/conftool/dbconfig/20220929-060532-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary and set section read-write [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35122 and previous config saved to /var/cache/conftool/dbconfig/20220929-060425-root.json
* 06:03 marostegui: Starting s7 codfw failover from db2121 to db2118 - [[phab:T318888|T318888]]
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35121 and previous config saved to /var/cache/conftool/dbconfig/20220929-055716-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2118 from API [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35120 and previous config saved to /var/cache/conftool/dbconfig/20220929-054542-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35119 and previous config saved to /var/cache/conftool/dbconfig/20220929-054509-root.json
* 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318888|T318888]]
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318888|T318888]]
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35118 and previous config saved to /var/cache/conftool/dbconfig/20220929-054211-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 from API [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35117 and previous config saved to /var/cache/conftool/dbconfig/20220929-053951-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35116 and previous config saved to /var/cache/conftool/dbconfig/20220929-053407-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary and set section read-write [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35115 and previous config saved to /var/cache/conftool/dbconfig/20220929-053302-root.json
* 05:32 marostegui: Starting s4 codfw failover from db2110 to db2140 - [[phab:T318886|T318886]]
* 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35114 and previous config saved to /var/cache/conftool/dbconfig/20220929-052805-ladsgroup.json
* 05:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 05:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35113 and previous config saved to /var/cache/conftool/dbconfig/20220929-052743-ladsgroup.json
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35112 and previous config saved to /var/cache/conftool/dbconfig/20220929-051237-ladsgroup.json
* 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T318886|T318886]]
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35111 and previous config saved to /var/cache/conftool/dbconfig/20220929-051114-root.json
* 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T318886|T318886]]
* 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35110 and previous config saved to /var/cache/conftool/dbconfig/20220929-045730-ladsgroup.json
* 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35109 and previous config saved to /var/cache/conftool/dbconfig/20220929-044224-ladsgroup.json
* 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35108 and previous config saved to /var/cache/conftool/dbconfig/20220929-035724-ladsgroup.json
* 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35107 and previous config saved to /var/cache/conftool/dbconfig/20220929-035647-ladsgroup.json
* 03:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35106 and previous config saved to /var/cache/conftool/dbconfig/20220929-034140-ladsgroup.json
* 03:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
* 03:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35105 and previous config saved to /var/cache/conftool/dbconfig/20220929-032634-ladsgroup.json
* 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35104 and previous config saved to /var/cache/conftool/dbconfig/20220929-031127-ladsgroup.json
* 02:29 ejegg: updated fundraising CiviCRM from {{Gerrit|f3461a44}} to {{Gerrit|5e1738a1}}
* 02:20 ejegg: updated fundraising python tools from {{Gerrit|dd494413}} to {{Gerrit|14d60435}}
* 01:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2037.codfw.wmnet with OS buster
* 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
* 00:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
== 2022-09-28 ==
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
* 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
* 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
* 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 22:20 ejegg: updated fundraising CiviCRM from {{Gerrit|d31c19a0}} to {{Gerrit|f3461a44}}
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
* 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35100 and previous config saved to /var/cache/conftool/dbconfig/20220928-212131-ladsgroup.json
* 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35099 and previous config saved to /var/cache/conftool/dbconfig/20220928-210624-ladsgroup.json
* 21:06 volans: installed spicerack 4.0.0-1+deb11u1 on cumin1001
* 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35098 and previous config saved to /var/cache/conftool/dbconfig/20220928-205117-ladsgroup.json
* 20:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12200
* 20:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12200
* 20:39 TheresNoTime: closing UTC late backport window
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]] (duration: 06m 19s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:18 samtar@deploy1002: samtar and essexigyan: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:18 samtar@deploy1002: Started scap: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]]
* 20:11 samtar@deploy1002: Sync cancelled.
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 20:04 samtar@deploy1002: samtar and dani: Backport for [[gerrit:834042{{!}}Deploy Research Incentive survey on arwiki (T318328)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:834042{{!}}Deploy Research Incentive survey on arwiki (T318328)]]
* 19:24 ejegg: updated fundraising CiviCRM from {{Gerrit|916a8b08}} to {{Gerrit|d31c19a0}}
* 19:08 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 18:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:22 volans: installed spicerack 4.0.0-1+deb11u1 on cumin2002
* 18:22 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3f23a1b]: (no justification provided) (duration: 00m 11s)
* 18:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@3f23a1b]: (no justification provided)
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 03m 38s)
* 18:07 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19653
* 17:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19653
* 17:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32098
* 17:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32098
* 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4181
* 17:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4181
* 17:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35097 and previous config saved to /var/cache/conftool/dbconfig/20220928-171848-ladsgroup.json
* 17:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35096 and previous config saved to /var/cache/conftool/dbconfig/20220928-170342-ladsgroup.json
* 16:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
* 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
* 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35095 and previous config saved to /var/cache/conftool/dbconfig/20220928-164835-ladsgroup.json
* 16:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
* 16:36 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@f89d689]: (no justification provided) (duration: 00m 12s)
* 16:36 nokafor@deploy1002: Started deploy [airflow-dags/analytics@f89d689]: (no justification provided)
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
* 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35093 and previous config saved to /var/cache/conftool/dbconfig/20220928-163329-ladsgroup.json
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:26 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4775
* 16:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4775
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2635
* 16:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2635
* 16:15 volans: uploaded spicerack_4.0.0 to apt.wikimedia.org bullseye-wikimedia
* 15:57 dancy@deploy1002: Installation of scap version "4.24.0" completed for 561 hosts
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
* 15:57 dancy@deploy1002: Installing scap version "4.24.0" for 561 hosts
* 15:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40217
* 15:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40217
* 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
* 15:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
* 15:51 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0646be1]: (no justification provided) (duration: 00m 10s)
* 15:51 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0646be1]: (no justification provided)
* 15:47 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
* 15:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS buster
* 15:26 moritzm: installing libgoogle-gson-java security updates on bullseye
* 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4922
* 15:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4922
* 15:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714
* 15:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
* 15:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714
* 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19108
* 15:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19108
* 15:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
* 15:09 moritzm: installing twisted security updates
* 15:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8674
* 15:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8674
* 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35092 and previous config saved to /var/cache/conftool/dbconfig/20220928-150230-ladsgroup.json
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35091 and previous config saved to /var/cache/conftool/dbconfig/20220928-150158-ladsgroup.json
* 15:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 15:00 SandraEbele: deploying Airflow for hdfsarchiver operator fix
* 15:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@aa7984f]: (no justification provided) (duration: 00m 14s)
* 15:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@aa7984f]: (no justification provided)
* 14:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1005.eqiad.wmnet with OS bullseye
* 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.wikimedia.org
* 14:53 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
* 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394354
* 14:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 394354
* 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 393950
* 14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 393950
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262589
* 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 262589
* 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209453
* 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 209453
* 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
* 14:48 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 199524
* 14:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 65517
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 65517
* 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62955
* 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62955
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 57695
* 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 57695
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53334
* 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35090 and previous config saved to /var/cache/conftool/dbconfig/20220928-144651-ladsgroup.json
* 14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 53334
* 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
* 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52320
* 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46450
* 14:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
* 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46450
* 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
* 14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
* 14:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
* 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 14:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
* 14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
* 14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36351
* 14:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35280
* 14:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
* 14:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35280
* 14:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32934
* 14:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32934
* 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32787
* 14:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32787
* 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
* 14:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
* 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29791
* 14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 29791
* 14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744
* 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26744
* 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
* 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
* 14:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22987
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35089 and previous config saved to /var/cache/conftool/dbconfig/20220928-143145-ladsgroup.json
* 14:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22987
* 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22773
* 14:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22773
* 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22616
* 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22616
* 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21949
* 14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 14:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1005.eqiad.wmnet with OS bullseye
* 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21949
* 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21928
* 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21928
* 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20115
* 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20115
* 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19653
* 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19653
* 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19151
* 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19151
* 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19108
* 14:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 14:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19108
* 14:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18106
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18106
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16735
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16735
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
* 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
* 14:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15695
* 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15695
* 14:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
* 14:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
* 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14630
* 14:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14630
* 14:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14361
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14361
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13760
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13760
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13489
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13489
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13335
* 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35088 and previous config saved to /var/cache/conftool/dbconfig/20220928-141638-ladsgroup.json
* 14:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13335
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12041
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12041
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11164
* 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11164
* 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11039
* 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11039
* 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
* 14:12 volans: added python3-gjson v0.0.5 to apt.w.o (bullseye only)
* 14:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10310
* 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
* 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 14:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
* 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8781
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35087 and previous config saved to /var/cache/conftool/dbconfig/20220928-141007-root.json
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35086 and previous config saved to /var/cache/conftool/dbconfig/20220928-141001-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35085 and previous config saved to /var/cache/conftool/dbconfig/20220928-140956-root.json
* 14:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8781
* 14:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35084 and previous config saved to /var/cache/conftool/dbconfig/20220928-140950-root.json
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
* 14:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
* 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
* 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8359
* 14:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudrabbit1003.wikimedia.org
* 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8359
* 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8075
* 14:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-eqiad
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8075
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7843
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7843
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7795
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7795
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7784
* 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7784
* 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7713
* 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7713
* 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7195
* 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
* 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7195
* 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6762
* 14:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6762
* 14:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6614
* 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6614
* 14:02 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6128
* 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6128
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6079
* 14:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
* 14:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6079
* 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5650
* 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5650
* 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5400
* 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5400
* 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4922
* 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4922
* 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4826
* 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4826
* 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4775
* 13:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4775
* 13:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
* 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4637
* 13:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4230
* 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4230
* 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4181
* 13:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4181
* 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35083 and previous config saved to /var/cache/conftool/dbconfig/20220928-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35082 and previous config saved to /var/cache/conftool/dbconfig/20220928-135456-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35081 and previous config saved to /var/cache/conftool/dbconfig/20220928-135451-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35080 and previous config saved to /var/cache/conftool/dbconfig/20220928-135445-root.json
* 13:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
* 13:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3300
* 13:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:52 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 13:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3300
* 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3292
* 13:50 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
* 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3292
* 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2906
* 13:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
* 13:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2906
* 13:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2647
* 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2647
* 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2635
* 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2635
* 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2603
* 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2603
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1273
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1273
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 812
* 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 812
* 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 714
* 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35079 and previous config saved to /var/cache/conftool/dbconfig/20220928-133957-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35078 and previous config saved to /var/cache/conftool/dbconfig/20220928-133951-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35077 and previous config saved to /var/cache/conftool/dbconfig/20220928-133946-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35076 and previous config saved to /var/cache/conftool/dbconfig/20220928-133940-root.json
* 13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=1) rolling restart_daemons on A:thanos-fe-codfw
* 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 577
* 13:32 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
* 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 577
* 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
* 13:31 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
* 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35075 and previous config saved to /var/cache/conftool/dbconfig/20220928-132452-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35074 and previous config saved to /var/cache/conftool/dbconfig/20220928-132446-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35073 and previous config saved to /var/cache/conftool/dbconfig/20220928-132442-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35072 and previous config saved to /var/cache/conftool/dbconfig/20220928-132435-root.json
* 13:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:15 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35071 and previous config saved to /var/cache/conftool/dbconfig/20220928-130947-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35070 and previous config saved to /var/cache/conftool/dbconfig/20220928-130941-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35069 and previous config saved to /var/cache/conftool/dbconfig/20220928-130937-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35068 and previous config saved to /var/cache/conftool/dbconfig/20220928-130930-root.json
* 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35067 and previous config saved to /var/cache/conftool/dbconfig/20220928-125442-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35066 and previous config saved to /var/cache/conftool/dbconfig/20220928-125436-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35065 and previous config saved to /var/cache/conftool/dbconfig/20220928-125432-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35064 and previous config saved to /var/cache/conftool/dbconfig/20220928-125425-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35063 and previous config saved to /var/cache/conftool/dbconfig/20220928-123937-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35062 and previous config saved to /var/cache/conftool/dbconfig/20220928-123932-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35061 and previous config saved to /var/cache/conftool/dbconfig/20220928-123927-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35060 and previous config saved to /var/cache/conftool/dbconfig/20220928-123920-root.json
* 12:34 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35058 and previous config saved to /var/cache/conftool/dbconfig/20220928-122432-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35057 and previous config saved to /var/cache/conftool/dbconfig/20220928-122427-root.json
* 12:24 gehel: copying wmf-elasticsearh-search-plugins from bullseye to buster (`reprepro -C thirdparty/elastic710 copy buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35056 and previous config saved to /var/cache/conftool/dbconfig/20220928-122422-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35055 and previous config saved to /var/cache/conftool/dbconfig/20220928-122421-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35054 and previous config saved to /var/cache/conftool/dbconfig/20220928-122415-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35053 and previous config saved to /var/cache/conftool/dbconfig/20220928-122414-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35052 and previous config saved to /var/cache/conftool/dbconfig/20220928-122411-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35051 and previous config saved to /var/cache/conftool/dbconfig/20220928-122403-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35050 and previous config saved to /var/cache/conftool/dbconfig/20220928-122356-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35049 and previous config saved to /var/cache/conftool/dbconfig/20220928-122350-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35048 and previous config saved to /var/cache/conftool/dbconfig/20220928-122346-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P35047 and previous config saved to /var/cache/conftool/dbconfig/20220928-122321-root.json
* 12:22 gehel: above reprepro copy failed, elastic710 component does not exist yet
* 12:21 XioNoX: re-enable Init7 in knams
* 12:21 gehel: copying wmf-elasticsearh-search-plugins from bullseye to buster (`reprepro -C elastic710 buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 db2146 db2122 es2022 for mariadb upgrade [[phab:T318128|T318128]]', diff saved to https://phabricator.wikimedia.org/P35046 and previous config saved to /var/cache/conftool/dbconfig/20220928-121912-root.json
* 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 12:09 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35045 and previous config saved to /var/cache/conftool/dbconfig/20220928-120916-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35044 and previous config saved to /var/cache/conftool/dbconfig/20220928-120909-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35043 and previous config saved to /var/cache/conftool/dbconfig/20220928-120906-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35042 and previous config saved to /var/cache/conftool/dbconfig/20220928-120858-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35041 and previous config saved to /var/cache/conftool/dbconfig/20220928-120852-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35040 and previous config saved to /var/cache/conftool/dbconfig/20220928-120845-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35039 and previous config saved to /var/cache/conftool/dbconfig/20220928-120841-root.json
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 11:58 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35038 and previous config saved to /var/cache/conftool/dbconfig/20220928-115411-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35037 and previous config saved to /var/cache/conftool/dbconfig/20220928-115404-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35036 and previous config saved to /var/cache/conftool/dbconfig/20220928-115401-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35035 and previous config saved to /var/cache/conftool/dbconfig/20220928-115354-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35034 and previous config saved to /var/cache/conftool/dbconfig/20220928-115347-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35033 and previous config saved to /var/cache/conftool/dbconfig/20220928-115340-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35032 and previous config saved to /var/cache/conftool/dbconfig/20220928-115336-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35031 and previous config saved to /var/cache/conftool/dbconfig/20220928-113906-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35030 and previous config saved to /var/cache/conftool/dbconfig/20220928-113900-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35029 and previous config saved to /var/cache/conftool/dbconfig/20220928-113856-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35028 and previous config saved to /var/cache/conftool/dbconfig/20220928-113849-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35027 and previous config saved to /var/cache/conftool/dbconfig/20220928-113842-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35026 and previous config saved to /var/cache/conftool/dbconfig/20220928-113835-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35025 and previous config saved to /var/cache/conftool/dbconfig/20220928-113831-root.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35024 and previous config saved to /var/cache/conftool/dbconfig/20220928-112401-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35023 and previous config saved to /var/cache/conftool/dbconfig/20220928-112355-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35022 and previous config saved to /var/cache/conftool/dbconfig/20220928-112351-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35021 and previous config saved to /var/cache/conftool/dbconfig/20220928-112344-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35020 and previous config saved to /var/cache/conftool/dbconfig/20220928-112337-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35019 and previous config saved to /var/cache/conftool/dbconfig/20220928-112330-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35018 and previous config saved to /var/cache/conftool/dbconfig/20220928-112326-root.json
* 11:18 moritzm: installing expat security updates
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35017 and previous config saved to /var/cache/conftool/dbconfig/20220928-110856-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35016 and previous config saved to /var/cache/conftool/dbconfig/20220928-110850-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35015 and previous config saved to /var/cache/conftool/dbconfig/20220928-110846-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35014 and previous config saved to /var/cache/conftool/dbconfig/20220928-110839-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35013 and previous config saved to /var/cache/conftool/dbconfig/20220928-110832-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35012 and previous config saved to /var/cache/conftool/dbconfig/20220928-110825-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35011 and previous config saved to /var/cache/conftool/dbconfig/20220928-110821-root.json
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35010 and previous config saved to /var/cache/conftool/dbconfig/20220928-105531-ladsgroup.json
* 10:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 10:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35009 and previous config saved to /var/cache/conftool/dbconfig/20220928-105520-ladsgroup.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35008 and previous config saved to /var/cache/conftool/dbconfig/20220928-105351-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35007 and previous config saved to /var/cache/conftool/dbconfig/20220928-105345-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35006 and previous config saved to /var/cache/conftool/dbconfig/20220928-105340-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35005 and previous config saved to /var/cache/conftool/dbconfig/20220928-105332-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35004 and previous config saved to /var/cache/conftool/dbconfig/20220928-105327-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35003 and previous config saved to /var/cache/conftool/dbconfig/20220928-105320-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35002 and previous config saved to /var/cache/conftool/dbconfig/20220928-105315-root.json
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P35001 and previous config saved to /var/cache/conftool/dbconfig/20220928-104014-ladsgroup.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35000 and previous config saved to /var/cache/conftool/dbconfig/20220928-103847-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34999 and previous config saved to /var/cache/conftool/dbconfig/20220928-103840-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34998 and previous config saved to /var/cache/conftool/dbconfig/20220928-103835-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34997 and previous config saved to /var/cache/conftool/dbconfig/20220928-103827-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34996 and previous config saved to /var/cache/conftool/dbconfig/20220928-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34995 and previous config saved to /var/cache/conftool/dbconfig/20220928-103815-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34994 and previous config saved to /var/cache/conftool/dbconfig/20220928-103810-root.json
* 10:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 db1137 db1168 db1143 db1132 db1127 es1022 for mariadb upgrade [[phab:T318128|T318128]]', diff saved to https://phabricator.wikimedia.org/P34993 and previous config saved to /var/cache/conftool/dbconfig/20220928-102759-root.json
* 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P34992 and previous config saved to /var/cache/conftool/dbconfig/20220928-102508-ladsgroup.json
* 10:19 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 10:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 10:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34990 and previous config saved to /var/cache/conftool/dbconfig/20220928-101001-ladsgroup.json
* 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59689
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59689
* 08:49 jbond: disable puppet on cache serveres to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/832268
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34989 and previous config saved to /var/cache/conftool/dbconfig/20220928-084557-ladsgroup.json
* 08:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34988 and previous config saved to /var/cache/conftool/dbconfig/20220928-084535-ladsgroup.json
* 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34987 and previous config saved to /var/cache/conftool/dbconfig/20220928-083029-ladsgroup.json
* 08:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34985 and previous config saved to /var/cache/conftool/dbconfig/20220928-081522-ladsgroup.json
* 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34984 and previous config saved to /var/cache/conftool/dbconfig/20220928-080015-ladsgroup.json
* 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:30 XioNoX: disable BGP to init7 in knams
* 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]] (duration: 05m 17s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:03 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 07:03 kartik@deploy1002: Started scap: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]]
* 06:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34981 and previous config saved to /var/cache/conftool/dbconfig/20220928-043052-ladsgroup.json
* 04:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34980 and previous config saved to /var/cache/conftool/dbconfig/20220928-043030-ladsgroup.json
* 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34979 and previous config saved to /var/cache/conftool/dbconfig/20220928-041524-ladsgroup.json
* 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34978 and previous config saved to /var/cache/conftool/dbconfig/20220928-040017-ladsgroup.json
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34977 and previous config saved to /var/cache/conftool/dbconfig/20220928-034511-ladsgroup.json
* 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34976 and previous config saved to /var/cache/conftool/dbconfig/20220928-020746-ladsgroup.json
* 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34975 and previous config saved to /var/cache/conftool/dbconfig/20220928-020724-ladsgroup.json
* 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34974 and previous config saved to /var/cache/conftool/dbconfig/20220928-015218-ladsgroup.json
* 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34973 and previous config saved to /var/cache/conftool/dbconfig/20220928-013711-ladsgroup.json
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json
* 01:18 ejegg: updated fundraising python tools from {{Gerrit|b65109af}} to {{Gerrit|dd494413}}
* 00:34 eileen: civicrm upgraded from {{Gerrit|118c1d0b}} to {{Gerrit|916a8b08}}
* 00:11 eileen: civicrm upgraded from {{Gerrit|e198fb4c}} to {{Gerrit|118c1d0b}}
== 2022-09-27 ==
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:12 TheresNoTime: closing UTC late backport window
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 20:59 TheresNoTime: extending UTC late backport window
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2055.codfw.wmnet with reason: host reimage
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 20:23 cjming@deploy1002: Synchronized wmf-config: Config: [[gerrit:814869{{!}}Deploy the new grid layout to group 0 wikis (T312241)]] (duration: 03m 05s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2055.codfw.wmnet with reason: host reimage
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31454 and previous config saved to /var/cache/conftool/dbconfig/20220719-201802-marostegui.json
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814908
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2022-07-18 ==
== 2022-09-26 ==
* 23:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1050.eqiad.wmnet
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 23:46 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1050.eqiad.wmnet
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 23:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1049.eqiad.wmnet
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 23:07 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1049.eqiad.wmnet
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 21:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 21:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 21:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 21:36 sbassett: Deployed security fix for [[phab:T309894|T309894]]
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 20:58 ebernhardson: start reindex of all wikis except commonswiki and wikidatawiki in eqiad and codfw cirrus clusters
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 20:45 urbanecm: UTC late B&C window finished
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:45 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/: {{Gerrit|930ecb76a5a9266d498f40b49ab5ff82c01dbcf5}}: reindex: Detect index type from live mappings (duration: 02m 55s)
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8d1663c93d2ddeb107d5f9b8982a7f4a7b880aba}}: Turn off fixed width in main namespace on Wikisource ( [[phab:T311607|T311607]]) (duration: 02m 41s)
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 TheresNoTime: closing UTC late backport window
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c258b25e8a47caf9d531f01798d32cd3f9b1605}}: Enable language switching button for logged-out users on non-pilot wikis ([[phab:T312861|T312861]]) (duration: 02m 43s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f99c5331380a8c03f4c447e2f73cb76afca337a2}}: Pin cu_log actor migration to old schema ([[phab:T233004|T233004]]) (duration: 02m 41s)
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|415c4ef44d9bf1abab6942fbbc552990a8e992c8}}: Collapse sidebar by default for anonymous users ([[phab:T287609|T287609]]) (duration: 02m 41s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/resources/src/moment/moment-locale-overrides.js: {{Gerrit|c4d8a217b4ce0a9f7aefaacc032136e7eb058d4d}}: Ensure custom locales for Moment.js overrides, dont change en ([[phab:T313188|T313188]]) (duration: 02m 44s)
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 20:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|76b7cd6379c25175570eeeb2a305de0fd0bc61e5}}: Mentorship: enable the Vue version of the dashboard in test ([[phab:T300532|T300532]]) (duration: 03m 00s)
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 19:04 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 19:02 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31385 and previous config saved to /var/cache/conftool/dbconfig/20220718-184146-root.json
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 18:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 18:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31384 and previous config saved to /var/cache/conftool/dbconfig/20220718-182642-root.json
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 18:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS bullseye
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31382 and previous config saved to /var/cache/conftool/dbconfig/20220718-181138-root.json
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 18:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 17:57 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31381 and previous config saved to /var/cache/conftool/dbconfig/20220718-175634-root.json
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS bullseye
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31380 and previous config saved to /var/cache/conftool/dbconfig/20220718-174130-root.json
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 17:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31379 and previous config saved to /var/cache/conftool/dbconfig/20220718-172626-root.json
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 17:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31378 and previous config saved to /var/cache/conftool/dbconfig/20220718-171122-root.json
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 16:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31377 and previous config saved to /var/cache/conftool/dbconfig/20220718-165617-root.json
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31376 and previous config saved to /var/cache/conftool/dbconfig/20220718-165455-marostegui.json
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31375 and previous config saved to /var/cache/conftool/dbconfig/20220718-165349-marostegui.json
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 16:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 16:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31374 and previous config saved to /var/cache/conftool/dbconfig/20220718-165329-marostegui.json
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31373 and previous config saved to /var/cache/conftool/dbconfig/20220718-163824-marostegui.json
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31372 and previous config saved to /var/cache/conftool/dbconfig/20220718-162319-marostegui.json
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 16:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 16:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31371 and previous config saved to /var/cache/conftool/dbconfig/20220718-160813-marostegui.json
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 16:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31370 and previous config saved to /var/cache/conftool/dbconfig/20220718-160708-marostegui.json
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 16:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 16:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31369 and previous config saved to /var/cache/conftool/dbconfig/20220718-160648-marostegui.json
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 15:52 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:814846{{!}} Bumping portals to master (T128546)]] (duration: 02m 59s)
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 15:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31368 and previous config saved to /var/cache/conftool/dbconfig/20220718-155143-marostegui.json
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 15:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 15:49 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:814846{{!}} Bumping portals to master (T128546)]] (duration: 03m 03s)
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 15:40 ejegg: updated fundraising CiviCRM from {{Gerrit|55bc690b}} to {{Gerrit|b4a7154a}}
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31367 and previous config saved to /var/cache/conftool/dbconfig/20220718-153637-marostegui.json
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31366 and previous config saved to /var/cache/conftool/dbconfig/20220718-152132-marostegui.json
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31365 and previous config saved to /var/cache/conftool/dbconfig/20220718-152026-marostegui.json
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 15:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31364 and previous config saved to /var/cache/conftool/dbconfig/20220718-151944-marostegui.json
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31363 and previous config saved to /var/cache/conftool/dbconfig/20220718-150439-marostegui.json
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31362 and previous config saved to /var/cache/conftool/dbconfig/20220718-145909-ladsgroup.json
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2012.codfw.wmnet to cluster codfw and group C
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2012.codfw.wmnet to cluster codfw and group C
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31361 and previous config saved to /var/cache/conftool/dbconfig/20220718-144934-marostegui.json
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31360 and previous config saved to /var/cache/conftool/dbconfig/20220718-144404-ladsgroup.json
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 14:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31359 and previous config saved to /var/cache/conftool/dbconfig/20220718-143428-marostegui.json
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 14:29 Lucas_WMDE: UTC afternoon backport+config window done
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 14:29 lucaswerkmeister-wmde@deploy1002: Finished scap: refresh everything after adding CampaignEvents to extension-list ([[phab:T311752|T311752]], only enabled in Beta so far), just in case (duration: 14m 40s)
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31358 and previous config saved to /var/cache/conftool/dbconfig/20220718-142859-ladsgroup.json
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 14:14 lucaswerkmeister-wmde@deploy1002: Started scap: refresh everything after adding CampaignEvents to extension-list ([[phab:T311752|T311752]], only enabled in Beta so far), just in case
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31357 and previous config saved to /var/cache/conftool/dbconfig/20220718-141354-ladsgroup.json
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:813991{{!}}Load and configure the CampaignEvents extension where enabled (T311752)]] (2/2: should be prod no-op) (duration: 02m 40s)
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31356 and previous config saved to /var/cache/conftool/dbconfig/20220718-140947-ladsgroup.json
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 14:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31355 and previous config saved to /var/cache/conftool/dbconfig/20220718-140926-ladsgroup.json
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:813991{{!}}Load and configure the CampaignEvents extension where enabled (T311752)]] (1/2: should be no-op) (duration: 02m 51s)
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:813990{{!}}Enable the CampaignEvents extension on beta (T311752)]] (no-op) (duration: 02m 43s)
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31354 and previous config saved to /var/cache/conftool/dbconfig/20220718-135421-ladsgroup.json
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813989{{!}}Add config variable for the CampaignEvents extension (T311752)]] (no-op) (duration: 02m 55s)
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:813986{{!}}Add CampaignEvents to extension-list (T311752)]] (duration: 03m 08s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2028.codfw.wmnet to cluster codfw and group A
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:45 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2028.codfw.wmnet to cluster codfw and group A
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2018.codfw.wmnet with OS bullseye
* 13:56 moritzm: installing mako security updates
* 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31353 and previous config saved to /var/cache/conftool/dbconfig/20220718-133916-ladsgroup.json
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31352 and previous config saved to /var/cache/conftool/dbconfig/20220718-133414-marostegui.json
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31351 and previous config saved to /var/cache/conftool/dbconfig/20220718-133354-marostegui.json
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2028.codfw.wmnet
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814111{{!}}Make weighted_tags search default for commonswiki]] (duration: 02m 54s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31350 and previous config saved to /var/cache/conftool/dbconfig/20220718-132411-ladsgroup.json
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13: