Server Admin Log: Difference between revisions

== 2022-03-11 ==
* 00:33 TimStarling: on mwmaint1002 running populateGlobalEditCount.php
* 00:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 00:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
== 2022-03-10 ==
* 23:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 23:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 23:08 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 23:07 rzl@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 22:42 tstarling@deploy1002: Finished scap: global_edit_count gerrit 769561 (duration: 15m 12s)
* 22:27 tstarling@deploy1002: Started scap: global_edit_count gerrit 769561
* 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/User/CentralAuthUser.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/Hooks/Handlers/UserEditCountUpdateHookHandler.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 22:23 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthServices.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 22:22 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/ServiceWiring.php: global_edit_count gerrit 769561 (duration: 00m 48s)
* 22:21 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthEditCounter.php: global_edit_count gerrit 769561 (duration: 00m 48s)
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:08 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:05 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:04 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:04 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:02 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:41 rzl: UTC late B&C training window done
* 21:39 rzl@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:769779{{!}}CommonSettings: Update comment about Image Suggestions API (T294362)]] (duration: 00m 48s)
* 21:34 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/modules/controller.js: Backport: [[gerrit:769559{{!}}Fix highlighting of comments when reloading (T303261)]] (duration: 00m 47s)
* 21:33 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw: Backport: [[gerrit:769558{{!}}Preserve classes on media wrapper links (T292657 T303469)]] (duration: 00m 49s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:18 cstone: update Donation Interface revision changed from {{Gerrit|ca37a93e}} to {{Gerrit|5db12b21}}
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:13 rzl@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:766307{{!}}Remove centralauth-oversight from the config (T302675)]] (duration: 00m 49s)
* 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22356 and previous config saved to /var/cache/conftool/dbconfig/20220310-205114-marostegui.json
* 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22355 and previous config saved to /var/cache/conftool/dbconfig/20220310-203608-marostegui.json
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22354 and previous config saved to /var/cache/conftool/dbconfig/20220310-202103-marostegui.json
* 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22353 and previous config saved to /var/cache/conftool/dbconfig/20220310-200558-marostegui.json
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:47 volans: installed spicerack v2.3.2 on the cumin hosts
* 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:46 volans@cumin2002: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 19:46 volans@cumin2002: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]]
* 19:44 volans: uploaded spicerack_2.3.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 19:33 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 19:32 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 19:32 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 19:31 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 19:29 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 19:29 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 19:06 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 19:06 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22352 and previous config saved to /var/cache/conftool/dbconfig/20220310-190544-marostegui.json
* 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22351 and previous config saved to /var/cache/conftool/dbconfig/20220310-190530-marostegui.json
* 19:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 19:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 19:02 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 19:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 19:00 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 18:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 18:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 18:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 18:57 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 18:56 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22350 and previous config saved to /var/cache/conftool/dbconfig/20220310-185025-marostegui.json
* 18:46 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 18:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 18:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 18:41 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:40 moritzm: restarting thumbor to pick up tiff security updates
* 18:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 18:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 18:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 18:36 moritzm: installing tiff security updates
* 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22349 and previous config saved to /var/cache/conftool/dbconfig/20220310-183520-marostegui.json
* 18:33 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 18:30 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 18:29 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 18:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 18:27 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 18:26 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22348 and previous config saved to /var/cache/conftool/dbconfig/20220310-182015-marostegui.json
* 18:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:19 razzi: cumin 'C:elasticsearch' 'systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service'
* 18:15 razzi: systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service on elastic2042 for [[phab:T300295|T300295]]
* 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:11 moritzm: installing cyrus-sasl2 security updates
* 18:08 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 17:51 herron: repool thanos-fe1001
* 17:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:43 herron: depooling thanos-fe1001 for envoy upgrade
* 17:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:41 dancy@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:761965{{!}}wmf-config: Use __DIR__ instead of "$IP/../wmf-config" (T45956)]] (duration: 00m 50s)
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22347 and previous config saved to /var/cache/conftool/dbconfig/20220310-172001-marostegui.json
* 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22346 and previous config saved to /var/cache/conftool/dbconfig/20220310-171953-marostegui.json
* 17:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22345 and previous config saved to /var/cache/conftool/dbconfig/20220310-170448-marostegui.json
* 16:57 damilare: civicrm change revision from {{Gerrit|9b5aafbc}} to {{Gerrit|4cb2bdbc}}
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22344 and previous config saved to /var/cache/conftool/dbconfig/20220310-165014-ladsgroup.json
* 16:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
* 16:50 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
* 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22343 and previous config saved to /var/cache/conftool/dbconfig/20220310-164943-marostegui.json
* 16:49 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on D<nowiki>{</nowiki>cumin1001.mgmt<nowiki>}</nowiki> with reason: Testing alertmanager downtime
* 16:49 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on D<nowiki>{</nowiki>cumin1001.mgmt<nowiki>}</nowiki> with reason: Testing alertmanager downtime
* 16:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Testing alertmanager downtime
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22342 and previous config saved to /var/cache/conftool/dbconfig/20220310-163509-ladsgroup.json
* 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22341 and previous config saved to /var/cache/conftool/dbconfig/20220310-163438-marostegui.json
* 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
* 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
* 16:30 sukhe: depool doh1002 for testing eBPF
* 16:21 volans: uploaded spicerack_2.3.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22340 and previous config saved to /var/cache/conftool/dbconfig/20220310-162004-ladsgroup.json
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22339 and previous config saved to /var/cache/conftool/dbconfig/20220310-160457-ladsgroup.json
* 15:57 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:56 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1121.eqiad.wmnet with OS bullseye
* 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
* 15:37 moritzm: rolling restart of thumbor to pick up expat security updates
* 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22338 and previous config saved to /var/cache/conftool/dbconfig/20220310-153428-marostegui.json
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22337 and previous config saved to /var/cache/conftool/dbconfig/20220310-153424-marostegui.json
* 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22336 and previous config saved to /var/cache/conftool/dbconfig/20220310-153416-marostegui.json
* 15:33 sukhe: upload certspotter 0.10-1wm1 to apt.wm.o - [[phab:T204993|T204993]]
* 15:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1121.eqiad.wmnet with OS bullseye
* 15:21 moritzm: installing expat security updates on stretch
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22335 and previous config saved to /var/cache/conftool/dbconfig/20220310-151923-marostegui.json
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22334 and previous config saved to /var/cache/conftool/dbconfig/20220310-151910-marostegui.json
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22333 and previous config saved to /var/cache/conftool/dbconfig/20220310-150839-ladsgroup.json
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22332 and previous config saved to /var/cache/conftool/dbconfig/20220310-150803-ladsgroup.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22331 and previous config saved to /var/cache/conftool/dbconfig/20220310-150417-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22330 and previous config saved to /var/cache/conftool/dbconfig/20220310-150405-marostegui.json
* 14:55 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:54 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22329 and previous config saved to /var/cache/conftool/dbconfig/20220310-145258-ladsgroup.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22328 and previous config saved to /var/cache/conftool/dbconfig/20220310-144911-marostegui.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22327 and previous config saved to /var/cache/conftool/dbconfig/20220310-144900-marostegui.json
* 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22326 and previous config saved to /var/cache/conftool/dbconfig/20220310-144222-marostegui.json
* 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22325 and previous config saved to /var/cache/conftool/dbconfig/20220310-144214-marostegui.json
* 14:41 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
* 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22324 and previous config saved to /var/cache/conftool/dbconfig/20220310-143753-ladsgroup.json
* 14:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22323 and previous config saved to /var/cache/conftool/dbconfig/20220310-142709-marostegui.json
* 14:26 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:25 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22322 and previous config saved to /var/cache/conftool/dbconfig/20220310-142248-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22321 and previous config saved to /var/cache/conftool/dbconfig/20220310-141204-marostegui.json
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:08 akosiaris: repool ores in eqiad in discovery records
* 14:06 urbanecm: UTC afternoon B&C done
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22320 and previous config saved to /var/cache/conftool/dbconfig/20220310-135659-marostegui.json
* 13:55 akosiaris: depool ores in eqiad from discovery records to initiate reboot of rdb1011
* 13:55 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
* 13:51 akosiaris: repool ores in codfw in discovery records
* 13:50 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22319 and previous config saved to /var/cache/conftool/dbconfig/20220310-135047-marostegui.json
* 13:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22318 and previous config saved to /var/cache/conftool/dbconfig/20220310-135039-marostegui.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22317 and previous config saved to /var/cache/conftool/dbconfig/20220310-134807-marostegui.json
* 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22316 and previous config saved to /var/cache/conftool/dbconfig/20220310-134759-marostegui.json
* 13:43 akosiaris: reboot rdb2007 for upgrades
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22315 and previous config saved to /var/cache/conftool/dbconfig/20220310-133534-marostegui.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22314 and previous config saved to /var/cache/conftool/dbconfig/20220310-133254-marostegui.json
* 13:27 akosiaris: depool ores in codfw from discovery records to initiate reboot of rdb2007
* 13:26 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
* 13:22 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22313 and previous config saved to /var/cache/conftool/dbconfig/20220310-132234-ladsgroup.json
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:20 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22311 and previous config saved to /var/cache/conftool/dbconfig/20220310-132029-marostegui.json
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22310 and previous config saved to /var/cache/conftool/dbconfig/20220310-131748-marostegui.json
* 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22309 and previous config saved to /var/cache/conftool/dbconfig/20220310-131214-ladsgroup.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22308 and previous config saved to /var/cache/conftool/dbconfig/20220310-130523-marostegui.json
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22307 and previous config saved to /var/cache/conftool/dbconfig/20220310-130243-marostegui.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22306 and previous config saved to /var/cache/conftool/dbconfig/20220310-125909-marostegui.json
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22305 and previous config saved to /var/cache/conftool/dbconfig/20220310-125901-marostegui.json
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22304 and previous config saved to /var/cache/conftool/dbconfig/20220310-125709-ladsgroup.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22303 and previous config saved to /var/cache/conftool/dbconfig/20220310-124355-marostegui.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22302 and previous config saved to /var/cache/conftool/dbconfig/20220310-124204-ladsgroup.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22301 and previous config saved to /var/cache/conftool/dbconfig/20220310-122850-marostegui.json
* 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22300 and previous config saved to /var/cache/conftool/dbconfig/20220310-122659-ladsgroup.json
* 12:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1141.eqiad.wmnet with OS bullseye
* 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
* 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22299 and previous config saved to /var/cache/conftool/dbconfig/20220310-121344-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22298 and previous config saved to /var/cache/conftool/dbconfig/20220310-120228-marostegui.json
* 12:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22297 and previous config saved to /var/cache/conftool/dbconfig/20220310-120221-marostegui.json
* 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
* 11:58 marostegui: Failover m1 master
* 11:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
* 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
* 11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22296 and previous config saved to /var/cache/conftool/dbconfig/20220310-114715-marostegui.json
* 11:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1141.eqiad.wmnet with OS bullseye
* 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22294 and previous config saved to /var/cache/conftool/dbconfig/20220310-113638-ladsgroup.json
* 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22293 and previous config saved to /var/cache/conftool/dbconfig/20220310-113210-marostegui.json
* 11:29 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b681376]: (no justification provided) (duration: 00m 07s)
* 11:29 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b681376]: (no justification provided)
* 11:26 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 11:26 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 11:25 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1093.eqiad.wmnet
* 11:24 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:24 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 11:18 volans: rolled out python3-wmflib v1.1.2 to the entire fleet (buster+ only)
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22292 and previous config saved to /var/cache/conftool/dbconfig/20220310-111705-marostegui.json
* 11:16 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1093.eqiad.wmnet
* 11:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22291 and previous config saved to /var/cache/conftool/dbconfig/20220310-111330-marostegui.json
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22290 and previous config saved to /var/cache/conftool/dbconfig/20220310-111320-marostegui.json
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22289 and previous config saved to /var/cache/conftool/dbconfig/20220310-111313-marostegui.json
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:10 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 14 hosts with reason: Maintenance
* 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 14 hosts with reason: Maintenance
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22287 and previous config saved to /var/cache/conftool/dbconfig/20220310-110253-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22286 and previous config saved to /var/cache/conftool/dbconfig/20220310-105807-marostegui.json
* 10:48 jbond: re-enable puppet fleet wide
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22285 and previous config saved to /var/cache/conftool/dbconfig/20220310-104748-marostegui.json
* 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 10:44 akosiaris: reboot rdb2009 for upgrades
* 10:44 jbond: disable puppet fleet wide
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22284 and previous config saved to /var/cache/conftool/dbconfig/20220310-104302-marostegui.json
* 10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22283 and previous config saved to /var/cache/conftool/dbconfig/20220310-103243-marostegui.json
* 10:30 moritzm: failover ganeti master for drmrs/B13 to ganeti6004
* 10:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
* 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22282 and previous config saved to /var/cache/conftool/dbconfig/20220310-102757-marostegui.json
* 10:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
* 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22281 and previous config saved to /var/cache/conftool/dbconfig/20220310-101738-marostegui.json
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22280 and previous config saved to /var/cache/conftool/dbconfig/20220310-101133-marostegui.json
* 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22279 and previous config saved to /var/cache/conftool/dbconfig/20220310-101125-marostegui.json
* 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22278 and previous config saved to /var/cache/conftool/dbconfig/20220310-095620-marostegui.json
* 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22277 and previous config saved to /var/cache/conftool/dbconfig/20220310-094115-marostegui.json
* 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22276 and previous config saved to /var/cache/conftool/dbconfig/20220310-092742-marostegui.json
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22275 and previous config saved to /var/cache/conftool/dbconfig/20220310-092735-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22274 and previous config saved to /var/cache/conftool/dbconfig/20220310-092610-marostegui.json
* 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22273 and previous config saved to /var/cache/conftool/dbconfig/20220310-091807-marostegui.json
* 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22272 and previous config saved to /var/cache/conftool/dbconfig/20220310-091759-marostegui.json
* 09:16 moritzm: failover ganeti master for drmrs/B12 to ganeti6003
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22271 and previous config saved to /var/cache/conftool/dbconfig/20220310-091230-marostegui.json
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22270 and previous config saved to /var/cache/conftool/dbconfig/20220310-090254-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22269 and previous config saved to /var/cache/conftool/dbconfig/20220310-085724-marostegui.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22268 and previous config saved to /var/cache/conftool/dbconfig/20220310-084749-marostegui.json
* 08:43 apergos: UTC morning backport and config window completed
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22267 and previous config saved to /var/cache/conftool/dbconfig/20220310-084219-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22266 and previous config saved to /var/cache/conftool/dbconfig/20220310-084139-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22265 and previous config saved to /var/cache/conftool/dbconfig/20220310-083732-root.json
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22264 and previous config saved to /var/cache/conftool/dbconfig/20220310-083244-marostegui.json
* 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22263 and previous config saved to /var/cache/conftool/dbconfig/20220310-082737-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22262 and previous config saved to /var/cache/conftool/dbconfig/20220310-082642-marostegui.json
* 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22261 and previous config saved to /var/cache/conftool/dbconfig/20220310-082634-marostegui.json
* 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:24 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 2: [[gerrit:769656{{!}}SectionTranslation: Also add languages to target (T298237)]] (duration: 00m 49s)
* 08:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22260 and previous config saved to /var/cache/conftool/dbconfig/20220310-082234-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22259 and previous config saved to /var/cache/conftool/dbconfig/20220310-082227-root.json
* 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22258 and previous config saved to /var/cache/conftool/dbconfig/20220310-082223-marostegui.json
* 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:19 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 1: [[gerrit:769386{{!}}Enable SectionTranslation on Javanese, Tagalog, Mongolian, Telugu WPs (T298237)]] (duration: 00m 50s)
* 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) for reboot', diff saved to https://phabricator.wikimedia.org/P22256 and previous config saved to /var/cache/conftool/dbconfig/20220310-081244-marostegui.json
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22255 and previous config saved to /var/cache/conftool/dbconfig/20220310-081129-marostegui.json
* 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22254 and previous config saved to /var/cache/conftool/dbconfig/20220310-080718-marostegui.json
* 08:03 marostegui: Reboot dbproxy1017, 1016 [[phab:T303174|T303174]]
* 08:00 marostegui: Reboot dbproxy1012, 1015, 1016 [[phab:T303174|T303174]]
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22253 and previous config saved to /var/cache/conftool/dbconfig/20220310-075623-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22252 and previous config saved to /var/cache/conftool/dbconfig/20220310-075213-marostegui.json
* 07:43 marostegui: Reboot dbproxy2001, 2002, 2003, 2004 [[phab:T303174|T303174]]
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22251 and previous config saved to /var/cache/conftool/dbconfig/20220310-074118-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22250 and previous config saved to /var/cache/conftool/dbconfig/20220310-073708-marostegui.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22249 and previous config saved to /var/cache/conftool/dbconfig/20220310-073523-marostegui.json
* 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22248 and previous config saved to /var/cache/conftool/dbconfig/20220310-073022-marostegui.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22247 and previous config saved to /var/cache/conftool/dbconfig/20220310-072124-marostegui.json
* 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22246 and previous config saved to /var/cache/conftool/dbconfig/20220310-072019-marostegui.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22245 and previous config saved to /var/cache/conftool/dbconfig/20220310-071516-marostegui.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22244 and previous config saved to /var/cache/conftool/dbconfig/20220310-070514-marostegui.json
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1132.eqiad.wmnet with OS bullseye
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22243 and previous config saved to /var/cache/conftool/dbconfig/20220310-070011-marostegui.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22242 and previous config saved to /var/cache/conftool/dbconfig/20220310-065009-marostegui.json
* 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22241 and previous config saved to /var/cache/conftool/dbconfig/20220310-064506-marostegui.json
* 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22240 and previous config saved to /var/cache/conftool/dbconfig/20220310-063858-marostegui.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22239 and previous config saved to /var/cache/conftool/dbconfig/20220310-063850-marostegui.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22238 and previous config saved to /var/cache/conftool/dbconfig/20220310-063503-marostegui.json
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1132.eqiad.wmnet with OS bullseye
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22237 and previous config saved to /var/cache/conftool/dbconfig/20220310-063017-marostegui.json
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22236 and previous config saved to /var/cache/conftool/dbconfig/20220310-062345-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22235 and previous config saved to /var/cache/conftool/dbconfig/20220310-060840-marostegui.json
* 06:07 marostegui: dbmaint on s3@eqiad [[phab:T272512|T272512]]
* 06:05 marostegui: dbmaint on s7@eqiad [[phab:T272512|T272512]]
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22234 and previous config saved to /var/cache/conftool/dbconfig/20220310-055335-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22233 and previous config saved to /var/cache/conftool/dbconfig/20220310-054701-marostegui.json
* 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 05:46 marostegui: dbmaint on s5@eqiad [[phab:T272512|T272512]]
* 05:46 marostegui: dbmaint on s4@eqiad [[phab:T272512|T272512]]
* 05:46 marostegui: dbmaint on pc3@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on pc2@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on pc1@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on s2@eqiad [[phab:T272512|T272512]]
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22232 and previous config saved to /var/cache/conftool/dbconfig/20220310-053950-marostegui.json
* 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 00:26 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@7975c27]: (no justification provided) (duration: 00m 08s)
* 00:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics@7975c27]: (no justification provided)

Revision as of 00:33, 11 March 2022

2022-03-11

  • 00:33 TimStarling: on mwmaint1002 running populateGlobalEditCount.php
  • 00:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 00:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply

2022-03-10

  • 23:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 23:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 23:08 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 23:07 rzl@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 22:42 tstarling@deploy1002: Finished scap: global_edit_count gerrit 769561 (duration: 15m 12s)
  • 22:27 tstarling@deploy1002: Started scap: global_edit_count gerrit 769561
  • 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/User/CentralAuthUser.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/Hooks/Handlers/UserEditCountUpdateHookHandler.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:23 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthServices.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:22 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/ServiceWiring.php: global_edit_count gerrit 769561 (duration: 00m 48s)
  • 22:21 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthEditCounter.php: global_edit_count gerrit 769561 (duration: 00m 48s)
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:08 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:05 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:04 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:04 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:02 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:41 rzl: UTC late B&C training window done
  • 21:39 rzl@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: CommonSettings: Update comment about Image Suggestions API (T294362) (duration: 00m 48s)
  • 21:34 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/modules/controller.js: Backport: Fix highlighting of comments when reloading (T303261) (duration: 00m 47s)
  • 21:33 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw: Backport: Preserve classes on media wrapper links (T292657 T303469) (duration: 00m 49s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 cstone: update Donation Interface revision changed from ca37a93e to 5db12b21
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:13 rzl@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove centralauth-oversight from the config (T302675) (duration: 00m 49s)
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22356 and previous config saved to /var/cache/conftool/dbconfig/20220310-205114-marostegui.json
  • 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22355 and previous config saved to /var/cache/conftool/dbconfig/20220310-203608-marostegui.json
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22354 and previous config saved to /var/cache/conftool/dbconfig/20220310-202103-marostegui.json
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22353 and previous config saved to /var/cache/conftool/dbconfig/20220310-200558-marostegui.json
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:47 volans: installed spicerack v2.3.2 on the cumin hosts
  • 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:46 volans@cumin2002: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
  • 19:46 volans@cumin2002: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
  • 19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.25 refs T300201
  • 19:44 volans: uploaded spicerack_2.3.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 19:33 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 19:32 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 19:32 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 19:31 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 19:29 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 19:29 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:06 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:06 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22352 and previous config saved to /var/cache/conftool/dbconfig/20220310-190544-marostegui.json
  • 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22351 and previous config saved to /var/cache/conftool/dbconfig/20220310-190530-marostegui.json
  • 19:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 19:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:02 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:00 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 18:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 18:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:57 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:56 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22350 and previous config saved to /var/cache/conftool/dbconfig/20220310-185025-marostegui.json
  • 18:46 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:41 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:40 moritzm: restarting thumbor to pick up tiff security updates
  • 18:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:36 moritzm: installing tiff security updates
  • 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22349 and previous config saved to /var/cache/conftool/dbconfig/20220310-183520-marostegui.json
  • 18:33 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 18:30 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 18:29 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 18:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 18:27 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 18:26 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22348 and previous config saved to /var/cache/conftool/dbconfig/20220310-182015-marostegui.json
  • 18:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:19 razzi: cumin 'C:elasticsearch' 'systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service'
  • 18:15 razzi: systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service on elastic2042 for T300295
  • 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 18:11 moritzm: installing cyrus-sasl2 security updates
  • 18:08 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:51 herron: repool thanos-fe1001
  • 17:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:43 herron: depooling thanos-fe1001 for envoy upgrade
  • 17:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:41 dancy@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: wmf-config: Use __DIR__ instead of "$IP/../wmf-config" (T45956) (duration: 00m 50s)
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22347 and previous config saved to /var/cache/conftool/dbconfig/20220310-172001-marostegui.json
  • 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22346 and previous config saved to /var/cache/conftool/dbconfig/20220310-171953-marostegui.json
  • 17:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22345 and previous config saved to /var/cache/conftool/dbconfig/20220310-170448-marostegui.json
  • 16:57 damilare: civicrm change revision from 9b5aafbc to 4cb2bdbc
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22344 and previous config saved to /var/cache/conftool/dbconfig/20220310-165014-ladsgroup.json
  • 16:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
  • 16:50 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22343 and previous config saved to /var/cache/conftool/dbconfig/20220310-164943-marostegui.json
  • 16:49 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on D{cumin1001.mgmt} with reason: Testing alertmanager downtime
  • 16:49 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on D{cumin1001.mgmt} with reason: Testing alertmanager downtime
  • 16:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Testing alertmanager downtime
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22342 and previous config saved to /var/cache/conftool/dbconfig/20220310-163509-ladsgroup.json
  • 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22341 and previous config saved to /var/cache/conftool/dbconfig/20220310-163438-marostegui.json
  • 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
  • 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
  • 16:30 sukhe: depool doh1002 for testing eBPF
  • 16:21 volans: uploaded spicerack_2.3.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22340 and previous config saved to /var/cache/conftool/dbconfig/20220310-162004-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22339 and previous config saved to /var/cache/conftool/dbconfig/20220310-160457-ladsgroup.json
  • 15:57 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:56 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1121.eqiad.wmnet with OS bullseye
  • 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
  • 15:37 moritzm: rolling restart of thumbor to pick up expat security updates
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22338 and previous config saved to /var/cache/conftool/dbconfig/20220310-153428-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22337 and previous config saved to /var/cache/conftool/dbconfig/20220310-153424-marostegui.json
  • 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22336 and previous config saved to /var/cache/conftool/dbconfig/20220310-153416-marostegui.json
  • 15:33 sukhe: upload certspotter 0.10-1wm1 to apt.wm.o - T204993
  • 15:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1121.eqiad.wmnet with OS bullseye
  • 15:21 moritzm: installing expat security updates on stretch
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22335 and previous config saved to /var/cache/conftool/dbconfig/20220310-151923-marostegui.json
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22334 and previous config saved to /var/cache/conftool/dbconfig/20220310-151910-marostegui.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22333 and previous config saved to /var/cache/conftool/dbconfig/20220310-150839-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22332 and previous config saved to /var/cache/conftool/dbconfig/20220310-150803-ladsgroup.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22331 and previous config saved to /var/cache/conftool/dbconfig/20220310-150417-marostegui.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22330 and previous config saved to /var/cache/conftool/dbconfig/20220310-150405-marostegui.json
  • 14:55 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:54 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22329 and previous config saved to /var/cache/conftool/dbconfig/20220310-145258-ladsgroup.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22328 and previous config saved to /var/cache/conftool/dbconfig/20220310-144911-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22327 and previous config saved to /var/cache/conftool/dbconfig/20220310-144900-marostegui.json
  • 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22326 and previous config saved to /var/cache/conftool/dbconfig/20220310-144222-marostegui.json
  • 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22325 and previous config saved to /var/cache/conftool/dbconfig/20220310-144214-marostegui.json
  • 14:41 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22324 and previous config saved to /var/cache/conftool/dbconfig/20220310-143753-ladsgroup.json
  • 14:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22323 and previous config saved to /var/cache/conftool/dbconfig/20220310-142709-marostegui.json
  • 14:26 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22322 and previous config saved to /var/cache/conftool/dbconfig/20220310-142248-ladsgroup.json
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22321 and previous config saved to /var/cache/conftool/dbconfig/20220310-141204-marostegui.json
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:08 akosiaris: repool ores in eqiad in discovery records
  • 14:06 urbanecm: UTC afternoon B&C done
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22320 and previous config saved to /var/cache/conftool/dbconfig/20220310-135659-marostegui.json
  • 13:55 akosiaris: depool ores in eqiad from discovery records to initiate reboot of rdb1011
  • 13:55 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
  • 13:51 akosiaris: repool ores in codfw in discovery records
  • 13:50 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22319 and previous config saved to /var/cache/conftool/dbconfig/20220310-135047-marostegui.json
  • 13:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22318 and previous config saved to /var/cache/conftool/dbconfig/20220310-135039-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22317 and previous config saved to /var/cache/conftool/dbconfig/20220310-134807-marostegui.json
  • 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22316 and previous config saved to /var/cache/conftool/dbconfig/20220310-134759-marostegui.json
  • 13:43 akosiaris: reboot rdb2007 for upgrades
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22315 and previous config saved to /var/cache/conftool/dbconfig/20220310-133534-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22314 and previous config saved to /var/cache/conftool/dbconfig/20220310-133254-marostegui.json
  • 13:27 akosiaris: depool ores in codfw from discovery records to initiate reboot of rdb2007
  • 13:26 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
  • 13:22 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22313 and previous config saved to /var/cache/conftool/dbconfig/20220310-132234-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:20 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22311 and previous config saved to /var/cache/conftool/dbconfig/20220310-132029-marostegui.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22310 and previous config saved to /var/cache/conftool/dbconfig/20220310-131748-marostegui.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22309 and previous config saved to /var/cache/conftool/dbconfig/20220310-131214-ladsgroup.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22308 and previous config saved to /var/cache/conftool/dbconfig/20220310-130523-marostegui.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22307 and previous config saved to /var/cache/conftool/dbconfig/20220310-130243-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22306 and previous config saved to /var/cache/conftool/dbconfig/20220310-125909-marostegui.json
  • 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22305 and previous config saved to /var/cache/conftool/dbconfig/20220310-125901-marostegui.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22304 and previous config saved to /var/cache/conftool/dbconfig/20220310-125709-ladsgroup.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22303 and previous config saved to /var/cache/conftool/dbconfig/20220310-124355-marostegui.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22302 and previous config saved to /var/cache/conftool/dbconfig/20220310-124204-ladsgroup.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22301 and previous config saved to /var/cache/conftool/dbconfig/20220310-122850-marostegui.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22300 and previous config saved to /var/cache/conftool/dbconfig/20220310-122659-ladsgroup.json
  • 12:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1141.eqiad.wmnet with OS bullseye
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22299 and previous config saved to /var/cache/conftool/dbconfig/20220310-121344-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22298 and previous config saved to /var/cache/conftool/dbconfig/20220310-120228-marostegui.json
  • 12:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22297 and previous config saved to /var/cache/conftool/dbconfig/20220310-120221-marostegui.json
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
  • 11:58 marostegui: Failover m1 master
  • 11:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
  • 11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22296 and previous config saved to /var/cache/conftool/dbconfig/20220310-114715-marostegui.json
  • 11:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1141.eqiad.wmnet with OS bullseye
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22294 and previous config saved to /var/cache/conftool/dbconfig/20220310-113638-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22293 and previous config saved to /var/cache/conftool/dbconfig/20220310-113210-marostegui.json
  • 11:29 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b681376]: (no justification provided) (duration: 00m 07s)
  • 11:29 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b681376]: (no justification provided)
  • 11:26 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:25 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1093.eqiad.wmnet
  • 11:24 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:24 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 11:18 volans: rolled out python3-wmflib v1.1.2 to the entire fleet (buster+ only)
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22292 and previous config saved to /var/cache/conftool/dbconfig/20220310-111705-marostegui.json
  • 11:16 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1093.eqiad.wmnet
  • 11:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22291 and previous config saved to /var/cache/conftool/dbconfig/20220310-111330-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22290 and previous config saved to /var/cache/conftool/dbconfig/20220310-111320-marostegui.json
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22289 and previous config saved to /var/cache/conftool/dbconfig/20220310-111313-marostegui.json
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:10 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 14 hosts with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 14 hosts with reason: Maintenance
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22287 and previous config saved to /var/cache/conftool/dbconfig/20220310-110253-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22286 and previous config saved to /var/cache/conftool/dbconfig/20220310-105807-marostegui.json
  • 10:48 jbond: re-enable puppet fleet wide
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22285 and previous config saved to /var/cache/conftool/dbconfig/20220310-104748-marostegui.json
  • 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 10:44 akosiaris: reboot rdb2009 for upgrades
  • 10:44 jbond: disable puppet fleet wide
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22284 and previous config saved to /var/cache/conftool/dbconfig/20220310-104302-marostegui.json
  • 10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22283 and previous config saved to /var/cache/conftool/dbconfig/20220310-103243-marostegui.json
  • 10:30 moritzm: failover ganeti master for drmrs/B13 to ganeti6004
  • 10:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22282 and previous config saved to /var/cache/conftool/dbconfig/20220310-102757-marostegui.json
  • 10:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22281 and previous config saved to /var/cache/conftool/dbconfig/20220310-101738-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22280 and previous config saved to /var/cache/conftool/dbconfig/20220310-101133-marostegui.json
  • 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22279 and previous config saved to /var/cache/conftool/dbconfig/20220310-101125-marostegui.json
  • 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22278 and previous config saved to /var/cache/conftool/dbconfig/20220310-095620-marostegui.json
  • 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22277 and previous config saved to /var/cache/conftool/dbconfig/20220310-094115-marostegui.json
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 09:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22276 and previous config saved to /var/cache/conftool/dbconfig/20220310-092742-marostegui.json
  • 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22275 and previous config saved to /var/cache/conftool/dbconfig/20220310-092735-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22274 and previous config saved to /var/cache/conftool/dbconfig/20220310-092610-marostegui.json
  • 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22273 and previous config saved to /var/cache/conftool/dbconfig/20220310-091807-marostegui.json
  • 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22272 and previous config saved to /var/cache/conftool/dbconfig/20220310-091759-marostegui.json
  • 09:16 moritzm: failover ganeti master for drmrs/B12 to ganeti6003
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22271 and previous config saved to /var/cache/conftool/dbconfig/20220310-091230-marostegui.json
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22270 and previous config saved to /var/cache/conftool/dbconfig/20220310-090254-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22269 and previous config saved to /var/cache/conftool/dbconfig/20220310-085724-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22268 and previous config saved to /var/cache/conftool/dbconfig/20220310-084749-marostegui.json
  • 08:43 apergos: UTC morning backport and config window completed
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22267 and previous config saved to /var/cache/conftool/dbconfig/20220310-084219-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22266 and previous config saved to /var/cache/conftool/dbconfig/20220310-084139-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22265 and previous config saved to /var/cache/conftool/dbconfig/20220310-083732-root.json
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22264 and previous config saved to /var/cache/conftool/dbconfig/20220310-083244-marostegui.json
  • 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22263 and previous config saved to /var/cache/conftool/dbconfig/20220310-082737-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22262 and previous config saved to /var/cache/conftool/dbconfig/20220310-082642-marostegui.json
  • 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22261 and previous config saved to /var/cache/conftool/dbconfig/20220310-082634-marostegui.json
  • 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:24 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 2: SectionTranslation: Also add languages to target (T298237) (duration: 00m 49s)
  • 08:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22260 and previous config saved to /var/cache/conftool/dbconfig/20220310-082234-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22259 and previous config saved to /var/cache/conftool/dbconfig/20220310-082227-root.json
  • 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22258 and previous config saved to /var/cache/conftool/dbconfig/20220310-082223-marostegui.json
  • 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:19 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 1: Enable SectionTranslation on Javanese, Tagalog, Mongolian, Telugu WPs (T298237) (duration: 00m 50s)
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) for reboot', diff saved to https://phabricator.wikimedia.org/P22256 and previous config saved to /var/cache/conftool/dbconfig/20220310-081244-marostegui.json
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22255 and previous config saved to /var/cache/conftool/dbconfig/20220310-081129-marostegui.json
  • 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22254 and previous config saved to /var/cache/conftool/dbconfig/20220310-080718-marostegui.json
  • 08:03 marostegui: Reboot dbproxy1017, 1016 T303174
  • 08:00 marostegui: Reboot dbproxy1012, 1015, 1016 T303174
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22253 and previous config saved to /var/cache/conftool/dbconfig/20220310-075623-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22252 and previous config saved to /var/cache/conftool/dbconfig/20220310-075213-marostegui.json
  • 07:43 marostegui: Reboot dbproxy2001, 2002, 2003, 2004 T303174
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22251 and previous config saved to /var/cache/conftool/dbconfig/20220310-074118-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22250 and previous config saved to /var/cache/conftool/dbconfig/20220310-073708-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22249 and previous config saved to /var/cache/conftool/dbconfig/20220310-073523-marostegui.json
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22248 and previous config saved to /var/cache/conftool/dbconfig/20220310-073022-marostegui.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22247 and previous config saved to /var/cache/conftool/dbconfig/20220310-072124-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
  • 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22246 and previous config saved to /var/cache/conftool/dbconfig/20220310-072019-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22245 and previous config saved to /var/cache/conftool/dbconfig/20220310-071516-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22244 and previous config saved to /var/cache/conftool/dbconfig/20220310-070514-marostegui.json
  • 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1132.eqiad.wmnet with OS bullseye
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22243 and previous config saved to /var/cache/conftool/dbconfig/20220310-070011-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22242 and previous config saved to /var/cache/conftool/dbconfig/20220310-065009-marostegui.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22241 and previous config saved to /var/cache/conftool/dbconfig/20220310-064506-marostegui.json
  • 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22240 and previous config saved to /var/cache/conftool/dbconfig/20220310-063858-marostegui.json
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22239 and previous config saved to /var/cache/conftool/dbconfig/20220310-063850-marostegui.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22238 and previous config saved to /var/cache/conftool/dbconfig/20220310-063503-marostegui.json
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1132.eqiad.wmnet with OS bullseye
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22237 and previous config saved to /var/cache/conftool/dbconfig/20220310-063017-marostegui.json
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22236 and previous config saved to /var/cache/conftool/dbconfig/20220310-062345-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22235 and previous config saved to /var/cache/conftool/dbconfig/20220310-060840-marostegui.json
  • 06:07 marostegui: dbmaint on s3@eqiad T272512
  • 06:05 marostegui: dbmaint on s7@eqiad T272512
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22234 and previous config saved to /var/cache/conftool/dbconfig/20220310-055335-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22233 and previous config saved to /var/cache/conftool/dbconfig/20220310-054701-marostegui.json
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui: dbmaint on s5@eqiad T272512
  • 05:46 marostegui: dbmaint on s4@eqiad T272512
  • 05:46 marostegui: dbmaint on pc3@eqiad T272512
  • 05:45 marostegui: dbmaint on pc2@eqiad T272512
  • 05:45 marostegui: dbmaint on pc1@eqiad T272512
  • 05:45 marostegui: dbmaint on s2@eqiad T272512
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22232 and previous config saved to /var/cache/conftool/dbconfig/20220310-053950-marostegui.json
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:26 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@7975c27]: (no justification provided) (duration: 00m 08s)
  • 00:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics@7975c27]: (no justification provided)
  • 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-03-09

  • 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:09 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25 refs T300201 (duration: 00m 49s)
  • 23:08 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:08 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:08 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25 refs T300201
  • 23:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1047.eqiad.wmnet
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
  • 22:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1047.eqiad.wmnet
  • 22:54 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
  • 22:35 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:35 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22229 and previous config saved to /var/cache/conftool/dbconfig/20220309-223130-marostegui.json
  • 22:15 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22228 and previous config saved to /var/cache/conftool/dbconfig/20220309-221555-marostegui.json
  • 22:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22226 and previous config saved to /var/cache/conftool/dbconfig/20220309-220020-marostegui.json
  • 21:57 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/Gadgets: T303455 (duration: 00m 50s)
  • 21:54 volans: uploaded python3-wmflib_1.1.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 21:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 21:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22225 and previous config saved to /var/cache/conftool/dbconfig/20220309-214445-marostegui.json
  • 21:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 21:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:21 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24 refs T300201 (duration: 00m 50s)
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:20 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24 refs T300201
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:07 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25 refs T300201 (duration: 00m 49s)
  • 19:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25 refs T300201
  • 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22222 and previous config saved to /var/cache/conftool/dbconfig/20220309-182355-marostegui.json
  • 18:23 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22221 and previous config saved to /var/cache/conftool/dbconfig/20220309-182316-marostegui.json
  • 18:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22220 and previous config saved to /var/cache/conftool/dbconfig/20220309-180741-marostegui.json
  • 17:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22219 and previous config saved to /var/cache/conftool/dbconfig/20220309-175205-marostegui.json
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22217 and previous config saved to /var/cache/conftool/dbconfig/20220309-173630-marostegui.json
  • 17:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:29 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WebAuthn/: T303404 (duration: 00m 53s)
  • 17:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 reedy@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/WebAuthn/: T303404 (duration: 00m 51s)
  • 17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2008.codfw.wmnet with OS bullseye
  • 17:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 17:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 16:56 akosiaris: reboot rdb[2008,2010].codfw.wmnet,rdb[1010,1012].eqiad.wmnet for upgrades
  • 16:49 akosiaris: reboot rdb2008 for upgrades
  • 16:45 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2008.codfw.wmnet with OS bullseye
  • 16:22 moritzm: installing 5.10.103 kernels on bullseye hosts
  • 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host karapace1001.eqiad.wmnet
  • 16:00 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/includes/parser/Sanitizer.php: 31189c6: Ensure that the recognizedTagData static cache is properly initialized (T303360) (duration: 00m 51s)
  • 15:56 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:56 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host karapace1001.eqiad.wmnet
  • 15:33 jbond: deploy gerrit:740818 to add more general rate limits for crawling cached and upload pages
  • 15:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2007.codfw.wmnet with OS bullseye
  • 15:28 volans: uploaded spicerack_2.3.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 15:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:06 taavi: UTC afternoon deploys done
  • 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:06 awight@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw/ui/styles/pages/ve.ui.MWParameterPage.css: Backport: Fix missing padding on inline descriptions (T303386) (duration: 00m 49s)
  • 15:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 6 hosts with reason: Maintenance
  • 15:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 6 hosts with reason: Maintenance
  • 15:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22215 and previous config saved to /var/cache/conftool/dbconfig/20220309-150523-marostegui.json
  • 15:03 awight@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/VisualEditor/modules/ve-mw/ui/styles/pages/ve.ui.MWParameterPage.css: Backport: Fix missing padding on inline descriptions (T303386) (duration: 00m 49s)
  • 15:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2007.codfw.wmnet with OS bullseye
  • 15:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:58 taavi@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Gadgets/extension.json: Backport: wmf.24 HACK: Add forward class alias for Gadget (T303391) (2/2) (duration: 00m 49s)
  • 14:57 taavi@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Gadgets/includes: Backport: wmf.24 HACK: Add forward class alias for Gadget (T303391) (1/2) (duration: 00m 50s)
  • 14:55 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1001.eqiad.wmnet with reason: Release v0.4.0 to reimaged cumin1001 - volans@cumin1001
  • 14:54 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin1001.eqiad.wmnet with reason: Release v0.4.0 to reimaged cumin1001 - volans@cumin1001
  • 14:49 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22213 and previous config saved to /var/cache/conftool/dbconfig/20220309-144948-marostegui.json
  • 14:34 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22212 and previous config saved to /var/cache/conftool/dbconfig/20220309-143413-marostegui.json
  • 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:27 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Add IPInfo viewing rights for certain groups (T296499) (no-op on prod) (duration: 00m 50s)
  • 14:18 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22211 and previous config saved to /var/cache/conftool/dbconfig/20220309-141837-marostegui.json
  • 14:13 damilare: civicrm revision changed from cb0605ed to 9b5aafbc
  • 14:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22210 and previous config saved to /var/cache/conftool/dbconfig/20220309-140158-marostegui.json
  • 14:01 marostegui: Failover m5 from db1132 to db1107 - T302190
  • 14:01 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:59 btullis: restarting pybal on lvs1019 T301458
  • 13:51 btullis: restarting pybal on lvs102 T301458
  • 13:47 marostegui: dbmaint on s8@eqiad T272512
  • 13:46 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22209 and previous config saved to /var/cache/conftool/dbconfig/20220309-134631-marostegui.json
  • 13:45 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22208 and previous config saved to /var/cache/conftool/dbconfig/20220309-134552-marostegui.json
  • 13:42 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22207 and previous config saved to /var/cache/conftool/dbconfig/20220309-134235-marostegui.json
  • 13:30 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22206 and previous config saved to /var/cache/conftool/dbconfig/20220309-133017-marostegui.json
  • 13:27 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22205 and previous config saved to /var/cache/conftool/dbconfig/20220309-132700-marostegui.json
  • 13:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22204 and previous config saved to /var/cache/conftool/dbconfig/20220309-131442-marostegui.json
  • 13:11 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22203 and previous config saved to /var/cache/conftool/dbconfig/20220309-131124-marostegui.json
  • 12:59 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22202 and previous config saved to /var/cache/conftool/dbconfig/20220309-125907-marostegui.json
  • 12:56 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: just a test
  • 12:56 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: just a test
  • 12:55 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22201 and previous config saved to /var/cache/conftool/dbconfig/20220309-125549-marostegui.json
  • 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1001.eqiad.wmnet with OS bullseye
  • 12:26 btullis@cumin2002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 12:25 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22200 and previous config saved to /var/cache/conftool/dbconfig/20220309-122536-marostegui.json
  • 12:25 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:24 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22199 and previous config saved to /var/cache/conftool/dbconfig/20220309-120554-marostegui.json
  • 11:50 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22198 and previous config saved to /var/cache/conftool/dbconfig/20220309-115019-marostegui.json
  • 11:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:43 awight: sketchy EU deployment complete.
  • 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:42 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Syntax highlighting color scheme update on all wikis except enwiki (T280024) (duration: 00m 50s)
  • 11:41 btullis@cumin2002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:37 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Bracket matching on all wikis except enwiki (T280023) (duration: 00m 49s)
  • 11:34 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22197 and previous config saved to /var/cache/conftool/dbconfig/20220309-113442-marostegui.json
  • 11:32 awight@deploy1002: Synchronized wmf-config/: Config: VE template expanded sidebar and inline descriptions on all wikis except enwiki (T286991) (duration: 00m 51s)
  • 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1001.eqiad.wmnet with reason: host reimage
  • 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1001.eqiad.wmnet with reason: host reimage
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:19 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22195 and previous config saved to /var/cache/conftool/dbconfig/20220309-111907-marostegui.json
  • 11:17 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: VE template back and delete button on all wikis except enwiki (T286990) (duration: 00m 50s)
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1001.eqiad.wmnet with OS bullseye
  • 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:11 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Template search improvements to all wikis except enwiki (T286990) (duration: 00m 51s)
  • 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1016.eqiad.wmnet
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:51 btullis@cumin2002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
  • 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
  • 10:39 btullis@cumin2002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 10:32 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22194 and previous config saved to /var/cache/conftool/dbconfig/20220309-103226-marostegui.json
  • 10:31 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22193 and previous config saved to /var/cache/conftool/dbconfig/20220309-103146-marostegui.json
  • 10:29 marostegui: dbmaint on s6@eqiad T272512
  • 10:29 marostegui: dbmaint on s3@eqiad T298295
  • 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 10:16 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22192 and previous config saved to /var/cache/conftool/dbconfig/20220309-101610-marostegui.json
  • 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 10:08 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: reenable DPL on nowikimedia (duration: 00m 51s)
  • 10:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22191 and previous config saved to /var/cache/conftool/dbconfig/20220309-100036-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2147', diff saved to https://phabricator.wikimedia.org/P22190 and previous config saved to /var/cache/conftool/dbconfig/20220309-094704-marostegui.json
  • 09:45 marostegui: dbmaint on s7@eqiad T298295
  • 09:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22189 and previous config saved to /var/cache/conftool/dbconfig/20220309-094501-marostegui.json
  • 09:31 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22188 and previous config saved to /var/cache/conftool/dbconfig/20220309-093119-marostegui.json
  • 09:30 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22187 and previous config saved to /var/cache/conftool/dbconfig/20220309-092731-marostegui.json
  • 09:26 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:26 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:23 marostegui: dbmaint on s2@eqiad T298295
  • 09:18 marostegui: dbmaint on s1@eqiad T298295
  • 09:16 marostegui: dbmaint on s4@eqiad T298295
  • 09:07 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22186 and previous config saved to /var/cache/conftool/dbconfig/20220309-090737-marostegui.json
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:53 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22184 and previous config saved to /var/cache/conftool/dbconfig/20220309-085201-marostegui.json
  • 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:49 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:46 XioNoX: Redirect one of Microsoft's range to codfw - T282861
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22183 and previous config saved to /var/cache/conftool/dbconfig/20220309-083626-marostegui.json
  • 08:21 marostegui: dbmaint on s3@eqiad T300380
  • 08:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22182 and previous config saved to /var/cache/conftool/dbconfig/20220309-082051-marostegui.json
  • 08:11 marostegui: dbmaint on s7@eqiad T300380
  • 08:03 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22181 and previous config saved to /var/cache/conftool/dbconfig/20220309-080307-marostegui.json
  • 08:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 40%: After schema change', diff saved to https://phabricator.wikimedia.org/P22180 and previous config saved to /var/cache/conftool/dbconfig/20220309-075704-root.json
  • 07:55 marostegui: dbmaint on s2@eqiad T300380
  • 07:49 marostegui: dbmaint on s8@eqiad T300380
  • 07:49 marostegui: dbmaint on s4@eqiad T300380
  • 07:42 marostegui: dbmaint on s1@eqiad T300380
  • 07:42 marostegui: dbmaint on s6@eqiad T300380
  • 07:42 marostegui: dbmaint on s5@eqiad T300380
  • 07:42 marostegui: dbmaint on s5 T300380
  • 07:42 marostegui: dbmaint on s6 T300380
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22179 and previous config saved to /var/cache/conftool/dbconfig/20220309-074200-root.json
  • 07:41 marostegui: dbmaint on s1 T300380
  • 07:41 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22178 and previous config saved to /var/cache/conftool/dbconfig/20220309-074107-root.json
  • 07:34 marostegui: dbmaint on s7@eqiad T300775
  • 07:33 marostegui: dbmaint on db1123 s3@eqiad T300600
  • 07:31 elukey: manually sync pcc facts following https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 15%: After schema change', diff saved to https://phabricator.wikimedia.org/P22177 and previous config saved to /var/cache/conftool/dbconfig/20220309-072656-root.json
  • 07:25 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22176 and previous config saved to /var/cache/conftool/dbconfig/20220309-072540-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P22175 and previous config saved to /var/cache/conftool/dbconfig/20220309-071153-root.json
  • 07:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22174 and previous config saved to /var/cache/conftool/dbconfig/20220309-071014-root.json
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1123.eqiad.wmnet with OS bullseye
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
  • 06:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22173 and previous config saved to /var/cache/conftool/dbconfig/20220309-065447-root.json
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1123.eqiad.wmnet with OS bullseye
  • 06:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22172 and previous config saved to /var/cache/conftool/dbconfig/20220309-062010-marostegui.json
  • 06:19 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22171 and previous config saved to /var/cache/conftool/dbconfig/20220309-014831-marostegui.json
  • 01:32 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22170 and previous config saved to /var/cache/conftool/dbconfig/20220309-013256-marostegui.json
  • 01:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22169 and previous config saved to /var/cache/conftool/dbconfig/20220309-011721-marostegui.json
  • 01:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22168 and previous config saved to /var/cache/conftool/dbconfig/20220309-010146-marostegui.json
  • 00:53 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22167 and previous config saved to /var/cache/conftool/dbconfig/20220309-005325-marostegui.json
  • 00:52 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22166 and previous config saved to /var/cache/conftool/dbconfig/20220309-005245-marostegui.json
  • 00:37 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22165 and previous config saved to /var/cache/conftool/dbconfig/20220309-003710-marostegui.json
  • 00:21 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22164 and previous config saved to /var/cache/conftool/dbconfig/20220309-002135-marostegui.json
  • 00:06 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22163 and previous config saved to /var/cache/conftool/dbconfig/20220309-000600-marostegui.json
  • 00:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22162 and previous config saved to /var/cache/conftool/dbconfig/20220309-000250-marostegui.json
  • 00:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 00:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22161 and previous config saved to /var/cache/conftool/dbconfig/20220309-000025-marostegui.json

2022-03-08

  • 23:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22160 and previous config saved to /var/cache/conftool/dbconfig/20220308-234450-marostegui.json
  • 23:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22159 and previous config saved to /var/cache/conftool/dbconfig/20220308-232915-marostegui.json
  • 23:13 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22158 and previous config saved to /var/cache/conftool/dbconfig/20220308-231340-marostegui.json
  • 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:10 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22157 and previous config saved to /var/cache/conftool/dbconfig/20220308-231028-marostegui.json
  • 23:09 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22156 and previous config saved to /var/cache/conftool/dbconfig/20220308-230949-marostegui.json
  • 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:54 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22155 and previous config saved to /var/cache/conftool/dbconfig/20220308-225413-marostegui.json
  • 22:38 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22153 and previous config saved to /var/cache/conftool/dbconfig/20220308-223838-marostegui.json
  • 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:24 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.25 refs T300201
  • 22:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22152 and previous config saved to /var/cache/conftool/dbconfig/20220308-222303-marostegui.json
  • 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22151 and previous config saved to /var/cache/conftool/dbconfig/20220308-222055-marostegui.json
  • 22:20 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 22:20 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22150 and previous config saved to /var/cache/conftool/dbconfig/20220308-222016-marostegui.json
  • 22:04 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22149 and previous config saved to /var/cache/conftool/dbconfig/20220308-220441-marostegui.json
  • 21:49 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22148 and previous config saved to /var/cache/conftool/dbconfig/20220308-214906-marostegui.json
  • 21:40 andrew@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1016.eqiad.wmnet
  • 21:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22147 and previous config saved to /var/cache/conftool/dbconfig/20220308-213331-marostegui.json
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:33 urbanecm: UTC early B&C window done
  • 21:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 21:30 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22146 and previous config saved to /var/cache/conftool/dbconfig/20220308-213024-marostegui.json
  • 21:29 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22145 and previous config saved to /var/cache/conftool/dbconfig/20220308-212939-marostegui.json
  • 21:28 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/includes/ApiDiscussionToolsEdit.php: cc5acc2: Fix handling of disabled mobileformat (T303262) (duration: 00m 49s)
  • 21:26 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: a5c6d06: Fix handling of disabled mobileformat (T303262) (duration: 00m 49s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22144 and previous config saved to /var/cache/conftool/dbconfig/20220308-211404-marostegui.json
  • 21:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3132fca: Enable DiscussionTools autotopicsub on MediaWiki.org (T302256) (duration: 00m 49s)
  • 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.22 (duration: 01m 28s)
  • 21:01 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.23 (duration: 01m 46s)
  • 20:59 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.25 refs T300201 (duration: 32m 13s)
  • 20:58 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22143 and previous config saved to /var/cache/conftool/dbconfig/20220308-205829-marostegui.json
  • 20:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22142 and previous config saved to /var/cache/conftool/dbconfig/20220308-204254-marostegui.json
  • 20:36 rzl: rzl@apt1001:~$ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy # T300324
  • 20:36 rzl: rzl@apt1001:~$ sudo -i reprepro copy stretch-wikimedia buster-wikimedia envoyproxy # T300324
  • 20:27 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.25 refs T300201
  • 20:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 19:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 19:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 19:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 19:43 XioNoX: push DHCP term to labs-in filters on eqiad cr
  • 19:42 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22139 and previous config saved to /var/cache/conftool/dbconfig/20220308-194159-marostegui.json
  • 19:41 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22138 and previous config saved to /var/cache/conftool/dbconfig/20220308-193930-marostegui.json
  • 19:36 cstone: updated donorwiki revision changed from 73de4731 to ca37a93e
  • 19:32 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.25 refs T300201
  • 19:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22137 and previous config saved to /var/cache/conftool/dbconfig/20220308-192354-marostegui.json
  • 19:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:08 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22136 and previous config saved to /var/cache/conftool/dbconfig/20220308-190818-marostegui.json
  • 18:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:53 ejegg: updated payments-wiki from 3dfac3b2 to ca37a93e
  • 18:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22135 and previous config saved to /var/cache/conftool/dbconfig/20220308-185242-marostegui.json
  • 18:50 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22134 and previous config saved to /var/cache/conftool/dbconfig/20220308-185033-marostegui.json
  • 18:49 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 18:49 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 18:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp5004.eqsin.wmnet
  • 18:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 18:48 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:48 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:47 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 18:47 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 18:47 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 18:47 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 18:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp1085.eqiad.wmnet
  • 18:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:35 bblack: cp10[3579] - restarting varnish-fe
  • 18:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1032.eqiad.wmnet with OS buster
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1012.eqiad.wmnet with OS stretch
  • 18:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 18:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 17:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS buster
  • 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1012.eqiad.wmnet with OS stretch
  • 17:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22133 and previous config saved to /var/cache/conftool/dbconfig/20220308-174838-marostegui.json
  • 17:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22132 and previous config saved to /var/cache/conftool/dbconfig/20220308-173302-marostegui.json
  • 17:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22131 and previous config saved to /var/cache/conftool/dbconfig/20220308-171728-marostegui.json
  • 17:07 jbond: deploy minor clean up of puppetmaster classes gerrit:769072
  • 17:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22130 and previous config saved to /var/cache/conftool/dbconfig/20220308-170153-marostegui.json
  • 17:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:58 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22129 and previous config saved to /var/cache/conftool/dbconfig/20220308-165843-marostegui.json
  • 16:58 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:58 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:58 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:57 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:54 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:54 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:54 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22128 and previous config saved to /var/cache/conftool/dbconfig/20220308-165436-marostegui.json
  • 16:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:54 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 16:53 inflatador: bking@deneb manually installed tox for T293862. moritzm will add puppet patch for this
  • 16:53 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 16:53 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:53 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:39 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22127 and previous config saved to /var/cache/conftool/dbconfig/20220308-163901-marostegui.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22126 and previous config saved to /var/cache/conftool/dbconfig/20220308-163835-root.json
  • 16:34 rzl: rzl@apt1001:~$ sudo -i reprepro -C main includedeb buster-wikimedia /home/rzl/envoyproxy_1.18.3-1_amd64.deb # reimporting from component/envoy-future into main, for T300324
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22125 and previous config saved to /var/cache/conftool/dbconfig/20220308-162331-root.json
  • 16:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22124 and previous config saved to /var/cache/conftool/dbconfig/20220308-162326-marostegui.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22123 and previous config saved to /var/cache/conftool/dbconfig/20220308-160815-root.json
  • 16:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22122 and previous config saved to /var/cache/conftool/dbconfig/20220308-160751-marostegui.json
  • 16:05 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22121 and previous config saved to /var/cache/conftool/dbconfig/20220308-160542-marostegui.json
  • 16:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 16:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 16:04 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 16:04 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22120 and previous config saved to /var/cache/conftool/dbconfig/20220308-160416-marostegui.json
  • 16:02 inflatador: bking@deneb manually installed openjdk-11-jdk for T293862. moritzm will add puppet patch for this
  • 15:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22119 and previous config saved to /var/cache/conftool/dbconfig/20220308-155312-root.json
  • 15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22118 and previous config saved to /var/cache/conftool/dbconfig/20220308-154841-marostegui.json
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P22117 and previous config saved to /var/cache/conftool/dbconfig/20220308-154507-marostegui.json
  • 15:42 XioNoX: update capirca hosts definitions
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22116 and previous config saved to /var/cache/conftool/dbconfig/20220308-154232-root.json
  • 15:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22115 and previous config saved to /var/cache/conftool/dbconfig/20220308-153306-marostegui.json
  • 15:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22114 and previous config saved to /var/cache/conftool/dbconfig/20220308-151731-marostegui.json
  • 15:14 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22113 and previous config saved to /var/cache/conftool/dbconfig/20220308-151446-marostegui.json
  • 15:14 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:14 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22112 and previous config saved to /var/cache/conftool/dbconfig/20220308-151406-marostegui.json
  • 14:58 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22111 and previous config saved to /var/cache/conftool/dbconfig/20220308-145831-marostegui.json
  • 14:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22110 and previous config saved to /var/cache/conftool/dbconfig/20220308-144256-marostegui.json
  • 14:33 urbanecm: UTC afternoon B&C window done
  • 14:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/DiscussionTools/includes/Notifications/DiscussionToolsEventTrait.php: 23939c7: Fix logic for finding the oldest comment in a bundle (T302014) (duration: 00m 50s)
  • 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:27 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22109 and previous config saved to /var/cache/conftool/dbconfig/20220308-142721-marostegui.json
  • 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:24 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22108 and previous config saved to /var/cache/conftool/dbconfig/20220308-142412-marostegui.json
  • 14:23 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22107 and previous config saved to /var/cache/conftool/dbconfig/20220308-142332-marostegui.json
  • 14:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22104 and previous config saved to /var/cache/conftool/dbconfig/20220308-140758-marostegui.json
  • 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:06 dcaro@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1003.wikimedia.org
  • 14:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 75465dd: fawiki: Add patrolmarks right to autopatrolled group (T303269) (duration: 00m 49s)
  • 13:56 aqu@deploy1002: Finished deploy [airflow-dags/analytics@d1c8ae0]: Fix wikidata_item_page_link destination table after tests (duration: 00m 07s)
  • 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics@d1c8ae0]: Fix wikidata_item_page_link destination table after tests
  • 13:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22103 and previous config saved to /var/cache/conftool/dbconfig/20220308-135223-marostegui.json
  • 13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 13:46 dcaro@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
  • 13:40 aqu@deploy1002: Finished deploy [airflow-dags/analytics@725f528]: Set wikidata/item_page_link/weekly start date in production (duration: 00m 07s)
  • 13:40 aqu@deploy1002: Started deploy [airflow-dags/analytics@725f528]: Set wikidata/item_page_link/weekly start date in production
  • 13:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 13:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22102 and previous config saved to /var/cache/conftool/dbconfig/20220308-133647-marostegui.json
  • 13:34 btullis@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:33 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22101 and previous config saved to /var/cache/conftool/dbconfig/20220308-133335-marostegui.json
  • 13:33 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:33 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22100 and previous config saved to /var/cache/conftool/dbconfig/20220308-133255-marostegui.json
  • 13:31 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 13:17 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
  • 13:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 13:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22099 and previous config saved to /var/cache/conftool/dbconfig/20220308-131720-marostegui.json
  • 13:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300775)', diff saved to https://phabricator.wikimedia.org/P22098 and previous config saved to /var/cache/conftool/dbconfig/20220308-131420-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22097 and previous config saved to /var/cache/conftool/dbconfig/20220308-131309-root.json
  • 13:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 13:07 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 13:07 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
  • 13:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22096 and previous config saved to /var/cache/conftool/dbconfig/20220308-130145-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22095 and previous config saved to /var/cache/conftool/dbconfig/20220308-125806-root.json
  • 12:57 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 12:56 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
  • 12:46 btullis@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:46 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22094 and previous config saved to /var/cache/conftool/dbconfig/20220308-124610-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22093 and previous config saved to /var/cache/conftool/dbconfig/20220308-124302-root.json
  • 12:42 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22092 and previous config saved to /var/cache/conftool/dbconfig/20220308-124257-marostegui.json
  • 12:42 btullis@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 12:42 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22091 and previous config saved to /var/cache/conftool/dbconfig/20220308-122752-root.json
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22090 and previous config saved to /var/cache/conftool/dbconfig/20220308-121443-marostegui.json
  • 12:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P22089 and previous config saved to /var/cache/conftool/dbconfig/20220308-115938-marostegui.json
  • 11:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:58 volans: uploaded spicerack_2.2.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:55 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use namespaced ApiFeatureUsageQueryEngineElastica T302907 (duration: 00m 49s)
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
  • 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:51 btullis@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 11:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1083.eqiad.wmnet with OS buster
  • 11:48 vgutierrez: pool cp1083 with HAProxy as TLS termination layer - T290005
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P22088 and previous config saved to /var/cache/conftool/dbconfig/20220308-114434-marostegui.json
  • 11:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22086 and previous config saved to /var/cache/conftool/dbconfig/20220308-113424-root.json
  • 11:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T300381)', diff saved to https://phabricator.wikimedia.org/P22085 and previous config saved to /var/cache/conftool/dbconfig/20220308-113110-marostegui.json
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22084 and previous config saved to /var/cache/conftool/dbconfig/20220308-113102-marostegui.json
  • 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22083 and previous config saved to /var/cache/conftool/dbconfig/20220308-112929-marostegui.json
  • 11:29 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22082 and previous config saved to /var/cache/conftool/dbconfig/20220308-112811-marostegui.json
  • 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22081 and previous config saved to /var/cache/conftool/dbconfig/20220308-112804-marostegui.json
  • 11:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
  • 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
  • 11:20 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22080 and previous config saved to /var/cache/conftool/dbconfig/20220308-111920-root.json
  • 11:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
  • 11:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22079 and previous config saved to /var/cache/conftool/dbconfig/20220308-111558-marostegui.json
  • 11:15 XioNoX: Cleanup transport-in filters for codfw/eqiad (CR747551)
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P22078 and previous config saved to /var/cache/conftool/dbconfig/20220308-111259-marostegui.json
  • 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
  • 11:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS buster
  • 11:10 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1083.eqiad.wmnet with OS buster
  • 11:09 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:08 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 11:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
  • 11:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:05 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22077 and previous config saved to /var/cache/conftool/dbconfig/20220308-110416-root.json
  • 11:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 11:03 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:02 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22076 and previous config saved to /var/cache/conftool/dbconfig/20220308-110053-marostegui.json
  • 10:59 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 10:59 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 10:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P22075 and previous config saved to /var/cache/conftool/dbconfig/20220308-105754-marostegui.json
  • 10:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 10:54 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
  • 10:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 10:52 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 10:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 10:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 10:51 btullis: btullis@datahubsearch1001:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22074 and previous config saved to /var/cache/conftool/dbconfig/20220308-104913-root.json
  • 10:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 10:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS buster
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22073 and previous config saved to /var/cache/conftool/dbconfig/20220308-104548-marostegui.json
  • 10:45 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 10:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22072 and previous config saved to /var/cache/conftool/dbconfig/20220308-104250-marostegui.json
  • 10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2035.codfw.wmnet with OS buster
  • 10:39 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 10:36 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 10:34 vgutierrez: pool cp2035 with HAProxy as TLS termination layer - T290005
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22070 and previous config saved to /var/cache/conftool/dbconfig/20220308-103409-root.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22069 and previous config saved to /var/cache/conftool/dbconfig/20220308-103251-marostegui.json
  • 10:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22068 and previous config saved to /var/cache/conftool/dbconfig/20220308-103243-marostegui.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22067 and previous config saved to /var/cache/conftool/dbconfig/20220308-103017-marostegui.json
  • 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 10:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 10:27 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
  • 10:27 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 10:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 10:22 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 10:19 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P22066 and previous config saved to /var/cache/conftool/dbconfig/20220308-101739-marostegui.json
  • 10:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
  • 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
  • 10:12 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
  • 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22065 and previous config saved to /var/cache/conftool/dbconfig/20220308-100559-marostegui.json
  • 10:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P22064 and previous config saved to /var/cache/conftool/dbconfig/20220308-100234-marostegui.json
  • 09:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2035.codfw.wmnet with OS buster
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22063 and previous config saved to /var/cache/conftool/dbconfig/20220308-095055-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22062 and previous config saved to /var/cache/conftool/dbconfig/20220308-094730-marostegui.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22061 and previous config saved to /var/cache/conftool/dbconfig/20220308-094613-marostegui.json
  • 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22060 and previous config saved to /var/cache/conftool/dbconfig/20220308-094605-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300775)', diff saved to https://phabricator.wikimedia.org/P22059 and previous config saved to /var/cache/conftool/dbconfig/20220308-094354-marostegui.json
  • 09:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22058 and previous config saved to /var/cache/conftool/dbconfig/20220308-094155-root.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22057 and previous config saved to /var/cache/conftool/dbconfig/20220308-093550-marostegui.json
  • 09:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2022.codfw.wmnet
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P22056 and previous config saved to /var/cache/conftool/dbconfig/20220308-093101-marostegui.json
  • 09:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2022.codfw.wmnet
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22055 and previous config saved to /var/cache/conftool/dbconfig/20220308-092651-root.json
  • 09:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2021.codfw.wmnet
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22054 and previous config saved to /var/cache/conftool/dbconfig/20220308-092045-marostegui.json
  • 09:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2021.codfw.wmnet
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P22053 and previous config saved to /var/cache/conftool/dbconfig/20220308-091556-marostegui.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22052 and previous config saved to /var/cache/conftool/dbconfig/20220308-091147-root.json
  • 09:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2020.codfw.wmnet
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22051 and previous config saved to /var/cache/conftool/dbconfig/20220308-090531-marostegui.json
  • 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2020.codfw.wmnet
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2019.codfw.wmnet
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22050 and previous config saved to /var/cache/conftool/dbconfig/20220308-090051-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22049 and previous config saved to /var/cache/conftool/dbconfig/20220308-085934-marostegui.json
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22048 and previous config saved to /var/cache/conftool/dbconfig/20220308-085921-marostegui.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22047 and previous config saved to /var/cache/conftool/dbconfig/20220308-085644-root.json
  • 08:54 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2019.codfw.wmnet
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P22046 and previous config saved to /var/cache/conftool/dbconfig/20220308-084416-marostegui.json
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22045 and previous config saved to /var/cache/conftool/dbconfig/20220308-084148-marostegui.json
  • 08:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2018.codfw.wmnet
  • 08:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2018.codfw.wmnet
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P22044 and previous config saved to /var/cache/conftool/dbconfig/20220308-082912-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22043 and previous config saved to /var/cache/conftool/dbconfig/20220308-082643-marostegui.json
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:14 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Add image experiment for fa/fr/pt/trwiki (T302828) (duration: 00m 49s)
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22042 and previous config saved to /var/cache/conftool/dbconfig/20220308-081407-marostegui.json
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22041 and previous config saved to /var/cache/conftool/dbconfig/20220308-081138-marostegui.json
  • 08:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1004.eqiad.wmnet
  • 08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage1004.eqiad.wmnet
  • 08:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22040 and previous config saved to /var/cache/conftool/dbconfig/20220308-075634-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22039 and previous config saved to /var/cache/conftool/dbconfig/20220308-075345-marostegui.json
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22038 and previous config saved to /var/cache/conftool/dbconfig/20220308-075338-marostegui.json
  • 07:53 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22037 and previous config saved to /var/cache/conftool/dbconfig/20220308-074136-marostegui.json
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P22036 and previous config saved to /var/cache/conftool/dbconfig/20220308-073833-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P22035 and previous config saved to /var/cache/conftool/dbconfig/20220308-072329-marostegui.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22034 and previous config saved to /var/cache/conftool/dbconfig/20220308-071724-marostegui.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22033 and previous config saved to /var/cache/conftool/dbconfig/20220308-070824-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22032 and previous config saved to /var/cache/conftool/dbconfig/20220308-070728-marostegui.json
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22031 and previous config saved to /var/cache/conftool/dbconfig/20220308-070721-marostegui.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22030 and previous config saved to /var/cache/conftool/dbconfig/20220308-070219-marostegui.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P22029 and previous config saved to /var/cache/conftool/dbconfig/20220308-065216-marostegui.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22028 and previous config saved to /var/cache/conftool/dbconfig/20220308-064714-marostegui.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P22027 and previous config saved to /var/cache/conftool/dbconfig/20220308-063711-marostegui.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22026 and previous config saved to /var/cache/conftool/dbconfig/20220308-063210-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22025 and previous config saved to /var/cache/conftool/dbconfig/20220308-062206-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22024 and previous config saved to /var/cache/conftool/dbconfig/20220308-062100-marostegui.json
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300775)', diff saved to https://phabricator.wikimedia.org/P22023 and previous config saved to /var/cache/conftool/dbconfig/20220308-061842-marostegui.json
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22022 and previous config saved to /var/cache/conftool/dbconfig/20220308-061700-marostegui.json
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22021 and previous config saved to /var/cache/conftool/dbconfig/20220308-061609-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22020 and previous config saved to /var/cache/conftool/dbconfig/20220308-060106-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22019 and previous config saved to /var/cache/conftool/dbconfig/20220308-054602-root.json
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:57 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@1c598f5]: (no justification provided) (duration: 00m 04s)
  • 01:57 ebysans@deploy1002: Started deploy [airflow-dags/analytics@1c598f5]: (no justification provided)
  • 01:32 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@1c598f5]: (no justification provided) (duration: 00m 08s)
  • 01:31 ebysans@deploy1002: Started deploy [airflow-dags/analytics@1c598f5]: (no justification provided)
  • 01:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@21af07c]: (no justification provided) (duration: 00m 07s)
  • 01:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics@21af07c]: (no justification provided)
  • 01:11 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c47e886]: (no justification provided) (duration: 00m 04s)
  • 01:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c47e886]: (no justification provided)
  • 01:07 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c47e886]: (no justification provided) (duration: 00m 08s)
  • 01:07 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c47e886]: (no justification provided)
  • 00:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c8a753b]: (no justification provided) (duration: 00m 07s)
  • 00:34 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c8a753b]: (no justification provided)
  • 00:08 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b5f7840]: (no justification provided) (duration: 00m 08s)
  • 00:08 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b5f7840]: (no justification provided)

2022-03-07

  • 23:50 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx2001.wikimedia.org with reason: reboot
  • 23:50 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mx2001.wikimedia.org with reason: reboot
  • 23:49 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx1001.wikimedia.org with reason: reboot
  • 23:49 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mx1001.wikimedia.org with reason: reboot
  • 23:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
  • 23:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
  • 22:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 21:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 21:38 urbanecm: UTC late B&C window done
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: eac551c: Fix language alert regression (T302018) (duration: 00m 50s)
  • 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:22 eileen: config aa7dcd88 -> 16fa8e1c
  • 20:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
  • 20:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 20:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 19:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22016 and previous config saved to /var/cache/conftool/dbconfig/20220307-181310-marostegui.json
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22015 and previous config saved to /var/cache/conftool/dbconfig/20220307-175805-marostegui.json
  • 17:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
  • 17:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
  • 17:47 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestage2002.codfw.wmnet
  • 17:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
  • 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22014 and previous config saved to /var/cache/conftool/dbconfig/20220307-174300-marostegui.json
  • 17:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
  • 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1004.wikimedia.org with OS bullseye
  • 17:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1022.eqiad.wmnet
  • 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22013 and previous config saved to /var/cache/conftool/dbconfig/20220307-172755-marostegui.json
  • 17:24 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1022.eqiad.wmnet
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22012 and previous config saved to /var/cache/conftool/dbconfig/20220307-172134-marostegui.json
  • 17:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22011 and previous config saved to /var/cache/conftool/dbconfig/20220307-172126-marostegui.json
  • 17:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see T303043
  • 17:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see T303043
  • 17:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
  • 17:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS buster
  • 17:07 vgutierrez: pool cp3058 with HAProxy as TLS termination layer - T290005
  • 17:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22010 and previous config saved to /var/cache/conftool/dbconfig/20220307-170622-marostegui.json
  • 17:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 16:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 16:58 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1004.wikimedia.org with OS bullseye
  • 16:52 vgutierrez: depool cp5004 - T303043
  • 16:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22009 and previous config saved to /var/cache/conftool/dbconfig/20220307-165117-marostegui.json
  • 16:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 16:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 16:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 16:45 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 16:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 16:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 16:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
  • 16:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5010.eqsin.wmnet with OS buster
  • 16:41 vgutierrez: pool cp5010 with HAProxy as TLS termination layer - T290005
  • 16:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 16:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22008 and previous config saved to /var/cache/conftool/dbconfig/20220307-163612-marostegui.json
  • 16:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 16:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
  • 16:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
  • 16:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22007 and previous config saved to /var/cache/conftool/dbconfig/20220307-162821-marostegui.json
  • 16:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 16:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
  • 16:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 16:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 16:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22006 and previous config saved to /var/cache/conftool/dbconfig/20220307-162157-marostegui.json
  • 16:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
  • 16:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
  • 16:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS buster
  • 16:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
  • 16:17 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 16:16 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 16:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
  • 16:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
  • 16:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
  • 16:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
  • 16:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 16:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22005 and previous config saved to /var/cache/conftool/dbconfig/20220307-160650-marostegui.json
  • 16:06 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 16:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
  • 16:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
  • 16:04 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 16:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 16:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 16:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 16:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 15:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 15:58 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 15:56 jayme: eqiad: kubectl -n istio-system delete po istiod-69d679d8b5-hm64j - T303184
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22004 and previous config saved to /var/cache/conftool/dbconfig/20220307-155146-marostegui.json
  • 15:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5010.eqsin.wmnet with OS buster
  • 15:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see T303183
  • 15:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see T303183
  • 15:38 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1085.eqiad.wmnet with OS buster
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22003 and previous config saved to /var/cache/conftool/dbconfig/20220307-153641-marostegui.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22002 and previous config saved to /var/cache/conftool/dbconfig/20220307-153357-marostegui.json
  • 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P22001 and previous config saved to /var/cache/conftool/dbconfig/20220307-153343-marostegui.json
  • 15:20 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@7642d65]: (no justification provided) (duration: 00m 07s)
  • 15:20 ntsako@deploy1002: Started deploy [airflow-dags/analytics@7642d65]: (no justification provided)
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22000 and previous config saved to /var/cache/conftool/dbconfig/20220307-151929-root.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21999 and previous config saved to /var/cache/conftool/dbconfig/20220307-151839-marostegui.json
  • 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:09 ntsako@deploy1002: Finished deploy [airflow-dags/analytics_test@7642d65]: (no justification provided) (duration: 00m 09s)
  • 15:09 ntsako@deploy1002: Started deploy [airflow-dags/analytics_test@7642d65]: (no justification provided)
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21998 and previous config saved to /var/cache/conftool/dbconfig/20220307-150426-root.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:03 vgutierrez: pool cp4030 with HAProxy as TLS termination layer - T290005
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21997 and previous config saved to /var/cache/conftool/dbconfig/20220307-150334-marostegui.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
  • 15:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4030.ulsfo.wmnet with OS buster
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 14:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
  • 14:56 vgutierrez: depool cp1085
  • 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21996 and previous config saved to /var/cache/conftool/dbconfig/20220307-144922-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P21995 and previous config saved to /var/cache/conftool/dbconfig/20220307-144829-marostegui.json
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:45 vgutierrez: pool cp1085 with HAProxy as TLS termination layer - T290005
  • 14:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 14:37 urbanecm@deploy1002: Synchronized static/images/project-logos/: f50c474: Revert "Change temporary logo for slwiki" (T302661; 2/2) (duration: 00m 48s)
  • 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:36 urbanecm@deploy1002: Synchronized wmf-config/logos.php: f50c474: Revert "Change temporary logo for slwiki" (T302661; 1/2) (duration: 00m 49s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:35 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@46d88a2]: (no justification provided) (duration: 00m 04s)
  • 14:35 ntsako@deploy1002: Started deploy [airflow-dags/analytics@46d88a2]: (no justification provided)
  • 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 14:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
  • 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21994 and previous config saved to /var/cache/conftool/dbconfig/20220307-143419-root.json
  • 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21993 and previous config saved to /var/cache/conftool/dbconfig/20220307-143229-ladsgroup.json
  • 14:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
  • 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 14:30 moritzm: rebooting etherpad1003 (running etherpad1003) for kernel update
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
  • 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
  • 14:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
  • 14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21992 and previous config saved to /var/cache/conftool/dbconfig/20220307-141915-root.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21991 and previous config saved to /var/cache/conftool/dbconfig/20220307-141911-root.json
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 64b1284: Enable reply tool by default on enwiki (T296645) (duration: 00m 49s)
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21990 and previous config saved to /var/cache/conftool/dbconfig/20220307-141724-ladsgroup.json
  • 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8f20ec9: fawiki: Disable creating community books and remove "Create a book" link from sidebar (T303173) (duration: 00m 49s)
  • 14:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4030.ulsfo.wmnet with OS buster
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
  • 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
  • 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
  • 14:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1085.eqiad.wmnet with OS buster
  • 14:08 urbanecm@deploy1002: Synchronized logos/config.yaml: 8619f59: etwikiquote: Update logo (T302683; 3/3) (duration: 00m 49s)
  • 14:07 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 8619f59: etwikiquote: Update logo (T302683; 2/3) (duration: 00m 49s)
  • 14:07 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/etwikiquote.png (T302683)
  • 14:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: 8619f59: etwikiquote: Update logo (T302683; 1/3) (duration: 00m 50s)
  • 14:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
  • 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21989 and previous config saved to /var/cache/conftool/dbconfig/20220307-140408-root.json
  • 14:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS buster
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21988 and previous config saved to /var/cache/conftool/dbconfig/20220307-140219-ladsgroup.json
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
  • 14:00 kormat: removing cumin2001 grants from all db sections T276589
  • 14:00 vgutierrez: pool cp2037 with HAProxy as TLS termination layer - T290005
  • 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
  • 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21987 and previous config saved to /var/cache/conftool/dbconfig/20220307-135614-ladsgroup.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21986 and previous config saved to /var/cache/conftool/dbconfig/20220307-134904-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P21985 and previous config saved to /var/cache/conftool/dbconfig/20220307-134848-marostegui.json
  • 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21984 and previous config saved to /var/cache/conftool/dbconfig/20220307-134840-marostegui.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21983 and previous config saved to /var/cache/conftool/dbconfig/20220307-134715-ladsgroup.json
  • 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 07m 17s)
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21982 and previous config saved to /var/cache/conftool/dbconfig/20220307-134109-ladsgroup.json
  • 13:39 aqu@deploy1002: Started deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:39 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 00m 08s)
  • 13:39 aqu@deploy1002: Started deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:37 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 25m 04s)
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21981 and previous config saved to /var/cache/conftool/dbconfig/20220307-133400-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21980 and previous config saved to /var/cache/conftool/dbconfig/20220307-133335-marostegui.json
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21979 and previous config saved to /var/cache/conftool/dbconfig/20220307-132605-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1142.eqiad.wmnet with OS bullseye
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21978 and previous config saved to /var/cache/conftool/dbconfig/20220307-131857-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21977 and previous config saved to /var/cache/conftool/dbconfig/20220307-131830-marostegui.json
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 (s5,s6)', diff saved to https://phabricator.wikimedia.org/P21976 and previous config saved to /var/cache/conftool/dbconfig/20220307-131606-marostegui.json
  • 13:12 aqu@deploy1002: Started deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21975 and previous config saved to /var/cache/conftool/dbconfig/20220307-131100-ladsgroup.json
  • 13:09 aqu_: About to deploy analytics/refinery - Migrate wikidata/item_page_link/weekly from Oozie to Airflow
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21974 and previous config saved to /var/cache/conftool/dbconfig/20220307-130520-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21973 and previous config saved to /var/cache/conftool/dbconfig/20220307-130512-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21972 and previous config saved to /var/cache/conftool/dbconfig/20220307-130326-marostegui.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21971 and previous config saved to /var/cache/conftool/dbconfig/20220307-125540-marostegui.json
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21970 and previous config saved to /var/cache/conftool/dbconfig/20220307-125532-marostegui.json
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1142.eqiad.wmnet with OS bullseye
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21969 and previous config saved to /var/cache/conftool/dbconfig/20220307-125007-ladsgroup.json
  • 12:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly (duration: 00m 07s)
  • 12:49 aqu@deploy1002: Started deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21968 and previous config saved to /var/cache/conftool/dbconfig/20220307-124815-ladsgroup.json
  • 12:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21967 and previous config saved to /var/cache/conftool/dbconfig/20220307-124028-marostegui.json
  • 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
  • 12:37 XioNoX: restart cr1-drmrs for software upgrade
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21966 and previous config saved to /var/cache/conftool/dbconfig/20220307-123503-ladsgroup.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21965 and previous config saved to /var/cache/conftool/dbconfig/20220307-122523-marostegui.json
  • 12:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS buster
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21964 and previous config saved to /var/cache/conftool/dbconfig/20220307-121958-ladsgroup.json
  • 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS buster
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21963 and previous config saved to /var/cache/conftool/dbconfig/20220307-121443-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:13 vgutierrez: pool cp3060 with HAProxy as TLS termination layer - T290005
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300775)', diff saved to https://phabricator.wikimedia.org/P21962 and previous config saved to /var/cache/conftool/dbconfig/20220307-121122-marostegui.json
  • 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21961 and previous config saved to /var/cache/conftool/dbconfig/20220307-121018-marostegui.json
  • 12:10 XioNoX: reboot cr2-drmrs for software upgrade
  • 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21960 and previous config saved to /var/cache/conftool/dbconfig/20220307-120821-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21959 and previous config saved to /var/cache/conftool/dbconfig/20220307-120722-ladsgroup.json
  • 12:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21958 and previous config saved to /var/cache/conftool/dbconfig/20220307-120532-marostegui.json
  • 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5016.eqsin.wmnet with OS buster
  • 12:03 vgutierrez: pool cp5016 with HAProxy as TLS termination layer - T290005
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 11:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21957 and previous config saved to /var/cache/conftool/dbconfig/20220307-115337-marostegui.json
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21956 and previous config saved to /var/cache/conftool/dbconfig/20220307-115316-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21955 and previous config saved to /var/cache/conftool/dbconfig/20220307-115217-ladsgroup.json
  • 11:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 11:45 XioNoX: remove MTU1400 on drmrs GTT links
  • 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21954 and previous config saved to /var/cache/conftool/dbconfig/20220307-113833-marostegui.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21953 and previous config saved to /var/cache/conftool/dbconfig/20220307-113811-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21952 and previous config saved to /var/cache/conftool/dbconfig/20220307-113712-ladsgroup.json
  • 11:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
  • 11:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 11:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21951 and previous config saved to /var/cache/conftool/dbconfig/20220307-112328-marostegui.json
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21950 and previous config saved to /var/cache/conftool/dbconfig/20220307-112307-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21949 and previous config saved to /var/cache/conftool/dbconfig/20220307-112207-ladsgroup.json
  • 11:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS buster
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21948 and previous config saved to /var/cache/conftool/dbconfig/20220307-111834-root.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21947 and previous config saved to /var/cache/conftool/dbconfig/20220307-111816-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21946 and previous config saved to /var/cache/conftool/dbconfig/20220307-111809-ladsgroup.json
  • 11:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4036.ulsfo.wmnet with OS buster
  • 11:12 vgutierrez: pool cp4036 with HAProxy as TLS termination layer - T290005
  • 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5016.eqsin.wmnet with OS buster
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21945 and previous config saved to /var/cache/conftool/dbconfig/20220307-110823-marostegui.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21944 and previous config saved to /var/cache/conftool/dbconfig/20220307-110330-root.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21943 and previous config saved to /var/cache/conftool/dbconfig/20220307-110304-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1143.eqiad.wmnet with OS bullseye
  • 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1084.eqiad.wmnet with OS buster
  • 10:59 vgutierrez: pool cp1084 with HAProxy as TLS termination layer - T290005
  • 10:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
  • 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21942 and previous config saved to /var/cache/conftool/dbconfig/20220307-104906-marostegui.json
  • 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21941 and previous config saved to /var/cache/conftool/dbconfig/20220307-104826-root.json
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21940 and previous config saved to /var/cache/conftool/dbconfig/20220307-104759-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
  • 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 10:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
  • 10:34 jayme: (re)started ferm on kubernetes1001
  • 10:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21939 and previous config saved to /var/cache/conftool/dbconfig/20220307-103323-root.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21938 and previous config saved to /var/cache/conftool/dbconfig/20220307-103253-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1143.eqiad.wmnet with OS bullseye
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21937 and previous config saved to /var/cache/conftool/dbconfig/20220307-102737-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21936 and previous config saved to /var/cache/conftool/dbconfig/20220307-102730-ladsgroup.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P21935 and previous config saved to /var/cache/conftool/dbconfig/20220307-102209-marostegui.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21934 and previous config saved to /var/cache/conftool/dbconfig/20220307-102158-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21933 and previous config saved to /var/cache/conftool/dbconfig/20220307-102129-ladsgroup.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21932 and previous config saved to /var/cache/conftool/dbconfig/20220307-102054-marostegui.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P21931 and previous config saved to /var/cache/conftool/dbconfig/20220307-101824-marostegui.json
  • 10:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1084.eqiad.wmnet with OS buster
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21930 and previous config saved to /var/cache/conftool/dbconfig/20220307-101657-root.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21929 and previous config saved to /var/cache/conftool/dbconfig/20220307-101225-ladsgroup.json
  • 10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS buster
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21928 and previous config saved to /var/cache/conftool/dbconfig/20220307-100624-ladsgroup.json
  • 10:04 vgutierrez: pool cp2036 with HAProxy as TLS termination layer - T290005
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21927 and previous config saved to /var/cache/conftool/dbconfig/20220307-100153-root.json
  • 10:00 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21926 and previous config saved to /var/cache/conftool/dbconfig/20220307-095720-ladsgroup.json
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21925 and previous config saved to /var/cache/conftool/dbconfig/20220307-095120-ladsgroup.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21924 and previous config saved to /var/cache/conftool/dbconfig/20220307-095111-root.json
  • 09:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21923 and previous config saved to /var/cache/conftool/dbconfig/20220307-094649-root.json
  • 09:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21922 and previous config saved to /var/cache/conftool/dbconfig/20220307-094216-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21921 and previous config saved to /var/cache/conftool/dbconfig/20220307-093701-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21920 and previous config saved to /var/cache/conftool/dbconfig/20220307-093653-ladsgroup.json
  • 09:36 jynus: updated non-A wikipedia.org DNS records T302617
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21919 and previous config saved to /var/cache/conftool/dbconfig/20220307-093615-ladsgroup.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21918 and previous config saved to /var/cache/conftool/dbconfig/20220307-093607-root.json
  • 09:35 jynus: updated non-A wikipedia.org DNS records
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21917 and previous config saved to /var/cache/conftool/dbconfig/20220307-093146-root.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21916 and previous config saved to /var/cache/conftool/dbconfig/20220307-093032-ladsgroup.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P21915 and previous config saved to /var/cache/conftool/dbconfig/20220307-093013-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21914 and previous config saved to /var/cache/conftool/dbconfig/20220307-092924-root.json
  • 09:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS buster
  • 09:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 04s)
  • 09:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21913 and previous config saved to /var/cache/conftool/dbconfig/20220307-092148-ladsgroup.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 60%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21912 and previous config saved to /var/cache/conftool/dbconfig/20220307-092103-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21911 and previous config saved to /var/cache/conftool/dbconfig/20220307-092034-marostegui.json
  • 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21910 and previous config saved to /var/cache/conftool/dbconfig/20220307-091527-ladsgroup.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21909 and previous config saved to /var/cache/conftool/dbconfig/20220307-091421-root.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21908 and previous config saved to /var/cache/conftool/dbconfig/20220307-090644-ladsgroup.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21907 and previous config saved to /var/cache/conftool/dbconfig/20220307-090600-root.json
  • 09:01 dcausse: restarting blazegraph on wdqs1013 (jvm stuck for 6 hours)
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21906 and previous config saved to /var/cache/conftool/dbconfig/20220307-090021-ladsgroup.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21905 and previous config saved to /var/cache/conftool/dbconfig/20220307-085917-root.json
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21904 and previous config saved to /var/cache/conftool/dbconfig/20220307-085139-ladsgroup.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 40%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21903 and previous config saved to /var/cache/conftool/dbconfig/20220307-085056-root.json
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:46 elukey: `kafka configs --alter --entity-type topics --entity-name udp_localhost-info --add-config retention.bytes=300000000000` on kafka-logging to reduce the size of the biggest topic partitions
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21902 and previous config saved to /var/cache/conftool/dbconfig/20220307-084641-ladsgroup.json
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21901 and previous config saved to /var/cache/conftool/dbconfig/20220307-084516-ladsgroup.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21900 and previous config saved to /var/cache/conftool/dbconfig/20220307-084413-root.json
  • 08:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e3f70f6: enwiki: Deploy Growth features to 100% of users (T302846) (duration: 00m 50s)
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P21899 and previous config saved to /var/cache/conftool/dbconfig/20220307-084235-marostegui.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21898 and previous config saved to /var/cache/conftool/dbconfig/20220307-084219-root.json
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21897 and previous config saved to /var/cache/conftool/dbconfig/20220307-083948-ladsgroup.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21896 and previous config saved to /var/cache/conftool/dbconfig/20220307-083553-root.json
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21895 and previous config saved to /var/cache/conftool/dbconfig/20220307-082716-root.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21894 and previous config saved to /var/cache/conftool/dbconfig/20220307-082443-ladsgroup.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 20%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21893 and previous config saved to /var/cache/conftool/dbconfig/20220307-082049-root.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21892 and previous config saved to /var/cache/conftool/dbconfig/20220307-081212-root.json
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21891 and previous config saved to /var/cache/conftool/dbconfig/20220307-080938-ladsgroup.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21890 and previous config saved to /var/cache/conftool/dbconfig/20220307-080545-root.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1144.eqiad.wmnet with OS bullseye
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21889 and previous config saved to /var/cache/conftool/dbconfig/20220307-075708-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P21888 and previous config saved to /var/cache/conftool/dbconfig/20220307-075523-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21887 and previous config saved to /var/cache/conftool/dbconfig/20220307-075504-root.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21886 and previous config saved to /var/cache/conftool/dbconfig/20220307-075433-ladsgroup.json
  • 07:53 marostegui: dbmaint on db1181 s7@eqiad T276150
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P21885 and previous config saved to /var/cache/conftool/dbconfig/20220307-075120-marostegui.json
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21884 and previous config saved to /var/cache/conftool/dbconfig/20220307-074923-ladsgroup.json
  • 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21883 and previous config saved to /var/cache/conftool/dbconfig/20220307-074909-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
  • 07:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21882 and previous config saved to /var/cache/conftool/dbconfig/20220307-074001-root.json
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21881 and previous config saved to /var/cache/conftool/dbconfig/20220307-073405-ladsgroup.json
  • 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1144.eqiad.wmnet with OS bullseye
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21880 and previous config saved to /var/cache/conftool/dbconfig/20220307-072624-ladsgroup.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21879 and previous config saved to /var/cache/conftool/dbconfig/20220307-072457-root.json
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21878 and previous config saved to /var/cache/conftool/dbconfig/20220307-072453-ladsgroup.json
  • 07:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 07:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21877 and previous config saved to /var/cache/conftool/dbconfig/20220307-071900-ladsgroup.json
  • 07:15 elukey: `elukey@ml-staging-ctrl2002:~$ sudo systemctl reset-failed ifup@ens13.service`
  • 07:14 elukey: kill tmux sessions of user 'zpapierski' on wdqs[1004,2002,2003] (puppet broken, offboarded user)
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21876 and previous config saved to /var/cache/conftool/dbconfig/20220307-071227-ladsgroup.json
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21875 and previous config saved to /var/cache/conftool/dbconfig/20220307-070953-root.json
  • 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:06 marostegui: dbmaint on db1179 s3@eqiad T302222
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P21874 and previous config saved to /var/cache/conftool/dbconfig/20220307-070537-marostegui.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21873 and previous config saved to /var/cache/conftool/dbconfig/20220307-070355-ladsgroup.json
  • 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21872 and previous config saved to /var/cache/conftool/dbconfig/20220307-065839-ladsgroup.json
  • 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21871 and previous config saved to /var/cache/conftool/dbconfig/20220307-065832-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21870 and previous config saved to /var/cache/conftool/dbconfig/20220307-065722-ladsgroup.json
  • 06:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:52 urbanecm: Reset authentication throttle for 217.23.37.10 via resetAuthenticationThrottle.php (T302973)
  • 06:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 2e9fdd4: 867bb7b: Add throttle rules (T302973; T303002) (duration: 00m 49s)
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21869 and previous config saved to /var/cache/conftool/dbconfig/20220307-064327-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21868 and previous config saved to /var/cache/conftool/dbconfig/20220307-064217-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21867 and previous config saved to /var/cache/conftool/dbconfig/20220307-062823-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21866 and previous config saved to /var/cache/conftool/dbconfig/20220307-062713-ladsgroup.json
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21865 and previous config saved to /var/cache/conftool/dbconfig/20220307-061318-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21864 and previous config saved to /var/cache/conftool/dbconfig/20220307-060819-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21863 and previous config saved to /var/cache/conftool/dbconfig/20220307-060811-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21862 and previous config saved to /var/cache/conftool/dbconfig/20220307-055307-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1147.eqiad.wmnet with OS bullseye
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21861 and previous config saved to /var/cache/conftool/dbconfig/20220307-053802-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
  • 05:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21860 and previous config saved to /var/cache/conftool/dbconfig/20220307-052257-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1147.eqiad.wmnet with OS bullseye
  • 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21859 and previous config saved to /var/cache/conftool/dbconfig/20220307-051807-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21858 and previous config saved to /var/cache/conftool/dbconfig/20220307-051537-ladsgroup.json
  • 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance

2022-03-04

  • 17:59 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 17:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 mforns@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 07s)
  • 17:46 mforns@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
  • 17:39 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@19520c1]: (no justification provided) (duration: 00m 08s)
  • 17:39 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@19520c1]: (no justification provided)
  • 17:09 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 08s)
  • 17:09 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
  • 16:35 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 10s)
  • 16:13 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21856 and previous config saved to /var/cache/conftool/dbconfig/20220304-160629-ladsgroup.json
  • 16:03 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 03s)
  • 16:03 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 15:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1086.eqiad.wmnet with OS buster
  • 15:58 vgutierrez: pool cp1086 with HAProxy as TLS termination layer - T290005
  • 15:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS buster
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21854 and previous config saved to /var/cache/conftool/dbconfig/20220304-155124-ladsgroup.json
  • 15:51 vgutierrez: pool cp2038 with HAProxy as TLS termination layer - T290005
  • 15:49 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
  • 15:49 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 15:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 15:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 15:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21852 and previous config saved to /var/cache/conftool/dbconfig/20220304-153619-ladsgroup.json
  • 15:34 XioNoX: blackhole IPs - T303055
  • 15:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
  • 15:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1086.eqiad.wmnet with OS buster
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21851 and previous config saved to /var/cache/conftool/dbconfig/20220304-152114-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21850 and previous config saved to /var/cache/conftool/dbconfig/20220304-152007-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21849 and previous config saved to /var/cache/conftool/dbconfig/20220304-151937-ladsgroup.json
  • 15:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS buster
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21848 and previous config saved to /var/cache/conftool/dbconfig/20220304-150433-ladsgroup.json
  • 14:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad.service on elastic1049 to resolve CirrusSearchJVMGCOldPoolFlatlined alert
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21847 and previous config saved to /var/cache/conftool/dbconfig/20220304-144926-ladsgroup.json
  • 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS buster
  • 14:43 vgutierrez: pool cp3059 with HAProxy as TLS termination layer - T290005
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21846 and previous config saved to /var/cache/conftool/dbconfig/20220304-143421-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21845 and previous config saved to /var/cache/conftool/dbconfig/20220304-143214-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21844 and previous config saved to /var/cache/conftool/dbconfig/20220304-143206-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21842 and previous config saved to /var/cache/conftool/dbconfig/20220304-141701-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21841 and previous config saved to /var/cache/conftool/dbconfig/20220304-140156-ladsgroup.json
  • 13:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1302-1306].eqiad.wmnet
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21840 and previous config saved to /var/cache/conftool/dbconfig/20220304-134651-ladsgroup.json
  • 13:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21839 and previous config saved to /var/cache/conftool/dbconfig/20220304-134443-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21838 and previous config saved to /var/cache/conftool/dbconfig/20220304-134436-ladsgroup.json
  • 13:38 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21837 and previous config saved to /var/cache/conftool/dbconfig/20220304-132931-ladsgroup.json
  • 13:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1302-1306].eqiad.wmnet
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21836 and previous config saved to /var/cache/conftool/dbconfig/20220304-131426-ladsgroup.json
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21835 and previous config saved to /var/cache/conftool/dbconfig/20220304-125921-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21834 and previous config saved to /var/cache/conftool/dbconfig/20220304-125714-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21833 and previous config saved to /var/cache/conftool/dbconfig/20220304-125706-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21832 and previous config saved to /var/cache/conftool/dbconfig/20220304-124201-ladsgroup.json
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21831 and previous config saved to /var/cache/conftool/dbconfig/20220304-122656-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21830 and previous config saved to /var/cache/conftool/dbconfig/20220304-121152-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21829 and previous config saved to /var/cache/conftool/dbconfig/20220304-120944-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21828 and previous config saved to /var/cache/conftool/dbconfig/20220304-120937-ladsgroup.json
  • 12:04 jbond: enable SameSite=Strict on idp
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21827 and previous config saved to /var/cache/conftool/dbconfig/20220304-115432-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21826 and previous config saved to /var/cache/conftool/dbconfig/20220304-113927-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21825 and previous config saved to /var/cache/conftool/dbconfig/20220304-112422-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21824 and previous config saved to /var/cache/conftool/dbconfig/20220304-112214-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21823 and previous config saved to /var/cache/conftool/dbconfig/20220304-112207-ladsgroup.json
  • 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4024.ulsfo.wmnet with OS buster
  • 11:09 vgutierrez: pool cp4024 with HAProxy as TLS termination layer - T290005
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21822 and previous config saved to /var/cache/conftool/dbconfig/20220304-110702-ladsgroup.json
  • 10:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
  • 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21821 and previous config saved to /var/cache/conftool/dbconfig/20220304-105157-ladsgroup.json
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS buster
  • 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4024.ulsfo.wmnet with OS buster
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21820 and previous config saved to /var/cache/conftool/dbconfig/20220304-103652-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21819 and previous config saved to /var/cache/conftool/dbconfig/20220304-103444-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300992)', diff saved to https://phabricator.wikimedia.org/P21818 and previous config saved to /var/cache/conftool/dbconfig/20220304-103437-ladsgroup.json
  • 10:29 vgutierrez: pool cp5004 with HAProxy as TLS termination layer - T290005
  • 10:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5004.eqsin.wmnet with OS buster
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21817 and previous config saved to /var/cache/conftool/dbconfig/20220304-101932-ladsgroup.json
  • 10:08 aqu@deploy1002: Finished deploy [airflow-dags/analytics@1c8384f]: AF //tion default args (duration: 00m 07s)
  • 10:08 aqu@deploy1002: Started deploy [airflow-dags/analytics@1c8384f]: AF //tion default args
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21816 and previous config saved to /var/cache/conftool/dbconfig/20220304-100427-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300992)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20220304-094918-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T300992)', diff saved to https://phabricator.wikimedia.org/P21815 and previous config saved to /var/cache/conftool/dbconfig/20220304-094710-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21814 and previous config saved to /var/cache/conftool/dbconfig/20220304-094702-ladsgroup.json
  • 09:43 vgutierrez: restart varnish on cp3056
  • 09:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
  • 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
  • 09:37 vgutierrez: restart varnish on cp3058
  • 09:33 vgutierrez: restart varnish on cp3060
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21813 and previous config saved to /var/cache/conftool/dbconfig/20220304-093157-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21812 and previous config saved to /var/cache/conftool/dbconfig/20220304-091652-ladsgroup.json
  • 09:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5004.eqsin.wmnet with OS buster
  • 09:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1005-1006].eqiad.wmnet
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21811 and previous config saved to /var/cache/conftool/dbconfig/20220304-090147-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21810 and previous config saved to /var/cache/conftool/dbconfig/20220304-085939-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21809 and previous config saved to /var/cache/conftool/dbconfig/20220304-085932-ladsgroup.json
  • 08:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21808 and previous config saved to /var/cache/conftool/dbconfig/20220304-084427-ladsgroup.json
  • 08:34 akosiaris: T303027 depool mw130[2-6]. Old jobrunners/videoscalers, being decommissioned
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=mw130[2-6].eqiad.wmnet
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21807 and previous config saved to /var/cache/conftool/dbconfig/20220304-082922-ladsgroup.json
  • 08:23 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1005-1006].eqiad.wmnet
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21806 and previous config saved to /var/cache/conftool/dbconfig/20220304-081417-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21805 and previous config saved to /var/cache/conftool/dbconfig/20220304-081210-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:27 XioNoX: push pfw policies - T303003
  • 01:35 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 01:34 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 01:34 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 01:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 01:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 01:32 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 01:32 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 01:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 01:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 01:29 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 01:29 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 01:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 01:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 01:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 01:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 01:23 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 01:23 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 01:22 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply

2022-03-03

  • 21:35 brennen: end of UTC late backport & config window / training
  • 21:30 brennen@deploy1002: Finished scap: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956) (duration: 01m 33s)
  • 21:28 brennen@deploy1002: Started scap: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)
  • 21:28 brennen@deploy1002: Synchronized multiversion/MWRealm.php: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956) (duration: 00m 48s)
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:35 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.24 refs T300200
  • 19:32 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to all wikis
  • 19:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: Backport: Unset data-toc in SkinVector (T302461) (duration: 00m 49s)
  • 19:23 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/MinervaNeue/resources/skins.minerva.base.styles/userMenu.less: Backport: Remove user navigation min width and width (T302753) (duration: 00m 51s)
  • 19:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 18:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 18:50 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 18:39 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 18:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:29 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:11 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-puppet-dashboard (duration: 09m 12s)
  • 18:02 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 18:02 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-puppet-dashboard
  • 17:59 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:58 krinkle@deploy1002: Synchronized wmf-config/: Idf7b21159423 (duration: 00m 51s)
  • 17:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:48 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:47 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21802 and previous config saved to /var/cache/conftool/dbconfig/20220303-173630-ladsgroup.json
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21801 and previous config saved to /var/cache/conftool/dbconfig/20220303-172125-ladsgroup.json
  • 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21800 and previous config saved to /var/cache/conftool/dbconfig/20220303-170621-ladsgroup.json
  • 17:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 17:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 16:53 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 16:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21799 and previous config saved to /var/cache/conftool/dbconfig/20220303-165116-ladsgroup.json
  • 16:30 godog: roll-restart logstash to pick up config changes - T291946
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1148.eqiad.wmnet with OS bullseye
  • 16:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1148.eqiad.wmnet with reason: host reimage
  • 15:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1148.eqiad.wmnet with reason: host reimage
  • 15:53 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:47 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1148.eqiad.wmnet with OS bullseye
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21798 and previous config saved to /var/cache/conftool/dbconfig/20220303-152242-ladsgroup.json
  • 15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 15:21 moritzm: restarting FPM/Apache on mw job runners to pick up expat security updates
  • 15:08 mutante: T296022 - phabricator - disabled git cloning over ssh for 'stewardscripts' repo - stewards have been asked via mailing list
  • 14:48 godog: force a puppet run on cp6011 to unblock icinga and disable puppet again, cc bblack
  • 14:48 Lucas_WMDE: UTC afternoon backport window done
  • 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport: GLAM event: Update landing page content (T301097) (full sync because of i18n change) (duration: 09m 45s)
  • 14:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport: GLAM event: Update landing page content (T301097) (full sync because of i18n change)
  • 14:26 XioNoX: merge Icinga: use parent switch shortname
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 14:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 14:04 volans: upgraded spicerack to v2.1.0 on cumin1001/cumin2002
  • 14:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21794 and previous config saved to /var/cache/conftool/dbconfig/20220303-135737-ladsgroup.json
  • 13:54 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:54 akosiaris: switch changeprop, changeprop-jobqueue to use rdb1011. T281217
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 13:53 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 13:51 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 13:45 akosiaris: roll restart ores uwsgi and celery for rdb1005 decommissioning. T281217
  • 13:44 akosiaris@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21793 and previous config saved to /var/cache/conftool/dbconfig/20220303-134232-ladsgroup.json
  • 13:20 moritzm: restarting FPM/Apache on mw app servers to pick up expat security updates
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21791 and previous config saved to /var/cache/conftool/dbconfig/20220303-131223-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1149.eqiad.wmnet with OS bullseye
  • 12:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1149.eqiad.wmnet with reason: host reimage
  • 12:47 hashar: Upgrading Quibble on CI Jenkins jobs from 1.3.0 to 1.4.3 https://gerrit.wikimedia.org/r/c/integration/config/+/767749/
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1149.eqiad.wmnet with reason: host reimage
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1149.eqiad.wmnet with OS bullseye
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21790 and previous config saved to /var/cache/conftool/dbconfig/20220303-123030-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 11:49 volans: uploaded spicerack_2.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21789 and previous config saved to /var/cache/conftool/dbconfig/20220303-113304-kormat.json
  • 11:18 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21788 and previous config saved to /var/cache/conftool/dbconfig/20220303-111801-kormat.json
  • 11:02 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21787 and previous config saved to /var/cache/conftool/dbconfig/20220303-110257-kormat.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21786 and previous config saved to /var/cache/conftool/dbconfig/20220303-110224-ladsgroup.json
  • 11:02 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling db1126 to full weight', diff saved to https://phabricator.wikimedia.org/P21785 and previous config saved to /var/cache/conftool/dbconfig/20220303-110220-kormat.json
  • 10:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: Change getConnectionRef to return with getLazyConnectionRef (T255493) (duration: 00m 50s)
  • 10:50 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.24/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: Change getConnectionRef to return with getLazyConnectionRef (T255493) (duration: 00m 51s)
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21784 and previous config saved to /var/cache/conftool/dbconfig/20220303-104713-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21783 and previous config saved to /var/cache/conftool/dbconfig/20220303-103659-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21782 and previous config saved to /var/cache/conftool/dbconfig/20220303-103209-ladsgroup.json
  • 10:30 XioNoX: repool ulsfo
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21781 and previous config saved to /var/cache/conftool/dbconfig/20220303-102154-ladsgroup.json
  • 10:18 elukey: kubectl cordon kubernetes200[1-4] to avoid scheduling pods on nodes that will be decommed during the next weeks - T302208
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21780 and previous config saved to /var/cache/conftool/dbconfig/20220303-101704-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1160.eqiad.wmnet with OS bullseye
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21779 and previous config saved to /var/cache/conftool/dbconfig/20220303-100649-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21778 and previous config saved to /var/cache/conftool/dbconfig/20220303-095145-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
  • 09:37 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@1c8384f]: AF //tion default args (duration: 00m 09s)
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1160.eqiad.wmnet with OS bullseye
  • 09:37 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@1c8384f]: AF //tion default args
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21777 and previous config saved to /var/cache/conftool/dbconfig/20220303-093306-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2073 (T302950)', diff saved to https://phabricator.wikimedia.org/P21775 and previous config saved to /var/cache/conftool/dbconfig/20220303-091340-ladsgroup.json
  • 09:12 moritzm: restarting FPM/Apache on mw API servers to pick up expat security updates
  • 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2073.codfw.wmnet with OS bullseye
  • 09:01 moritzm: restarting superset on an-tool1010 to pick up expat security updates
  • 08:52 taavi: UTC morning deploys done
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21774 and previous config saved to /var/cache/conftool/dbconfig/20220303-085125-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21773 and previous config saved to /var/cache/conftool/dbconfig/20220303-085118-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: host reimage
  • 08:48 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GLAM event: Update wgGECampaigns and wgGECampaignTopics (T301029) (duration: 00m 51s)
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: host reimage
  • 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21772 and previous config saved to /var/cache/conftool/dbconfig/20220303-083613-ladsgroup.json
  • 08:34 moritzm: installing expat security updates
  • 08:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2073.codfw.wmnet with OS bullseye
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2073 (T302950)', diff saved to https://phabricator.wikimedia.org/P21771 and previous config saved to /var/cache/conftool/dbconfig/20220303-082842-ladsgroup.json
  • 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2090 (T302950)', diff saved to https://phabricator.wikimedia.org/P21770 and previous config saved to /var/cache/conftool/dbconfig/20220303-082656-ladsgroup.json
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21768 and previous config saved to /var/cache/conftool/dbconfig/20220303-082108-ladsgroup.json
  • 08:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4034.ulsfo.wmnet with OS buster
  • 08:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add centralauth-suppress to steward and wmf-supportsafety at metawiki (T302675) (duration: 00m 50s)
  • 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2090.codfw.wmnet with OS bullseye
  • 08:13 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: fawiki: Remove the Book namespace (T302957) (duration: 00m 51s)
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21767 and previous config saved to /var/cache/conftool/dbconfig/20220303-080603-ladsgroup.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: host reimage
  • 07:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: host reimage
  • 07:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4034.ulsfo.wmnet with reason: host reimage
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21766 and previous config saved to /var/cache/conftool/dbconfig/20220303-075534-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4034.ulsfo.wmnet with reason: host reimage
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2090.codfw.wmnet with OS bullseye
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2090 (T302950)', diff saved to https://phabricator.wikimedia.org/P21765 and previous config saved to /var/cache/conftool/dbconfig/20220303-074209-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21764 and previous config saved to /var/cache/conftool/dbconfig/20220303-073920-ladsgroup.json
  • 07:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21763 and previous config saved to /var/cache/conftool/dbconfig/20220303-072415-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T302950)', diff saved to https://phabricator.wikimedia.org/P21762 and previous config saved to /var/cache/conftool/dbconfig/20220303-071800-ladsgroup.json
  • 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2106.codfw.wmnet with OS bullseye
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21761 and previous config saved to /var/cache/conftool/dbconfig/20220303-070910-ladsgroup.json
  • 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2106.codfw.wmnet with reason: host reimage
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21760 and previous config saved to /var/cache/conftool/dbconfig/20220303-065405-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2106.codfw.wmnet with reason: host reimage
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21759 and previous config saved to /var/cache/conftool/dbconfig/20220303-064945-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21758 and previous config saved to /var/cache/conftool/dbconfig/20220303-064937-ladsgroup.json
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2106.codfw.wmnet with OS bullseye
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T302950)', diff saved to https://phabricator.wikimedia.org/P21757 and previous config saved to /var/cache/conftool/dbconfig/20220303-063514-ladsgroup.json
  • 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21756 and previous config saved to /var/cache/conftool/dbconfig/20220303-063433-ladsgroup.json
  • 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T302950)', diff saved to https://phabricator.wikimedia.org/P21755 and previous config saved to /var/cache/conftool/dbconfig/20220303-063350-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2119.codfw.wmnet with OS bullseye
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21754 and previous config saved to /var/cache/conftool/dbconfig/20220303-061928-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2119.codfw.wmnet with reason: host reimage
  • 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2119.codfw.wmnet with reason: host reimage
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21753 and previous config saved to /var/cache/conftool/dbconfig/20220303-060423-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21752 and previous config saved to /var/cache/conftool/dbconfig/20220303-060006-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21751 and previous config saved to /var/cache/conftool/dbconfig/20220303-055959-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2119.codfw.wmnet with OS bullseye
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T302950)', diff saved to https://phabricator.wikimedia.org/P21750 and previous config saved to /var/cache/conftool/dbconfig/20220303-054657-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21749 and previous config saved to /var/cache/conftool/dbconfig/20220303-054454-ladsgroup.json
  • 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T302950)', diff saved to https://phabricator.wikimedia.org/P21748 and previous config saved to /var/cache/conftool/dbconfig/20220303-053324-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21747 and previous config saved to /var/cache/conftool/dbconfig/20220303-052949-ladsgroup.json
  • 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2136.codfw.wmnet with OS bullseye
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21746 and previous config saved to /var/cache/conftool/dbconfig/20220303-051444-ladsgroup.json
  • 04:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
  • 04:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21745 and previous config saved to /var/cache/conftool/dbconfig/20220303-044933-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21744 and previous config saved to /var/cache/conftool/dbconfig/20220303-044926-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2136.codfw.wmnet with OS bullseye
  • 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T302950)', diff saved to https://phabricator.wikimedia.org/P21743 and previous config saved to /var/cache/conftool/dbconfig/20220303-043942-ladsgroup.json
  • 04:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 04:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T302950)', diff saved to https://phabricator.wikimedia.org/P21742 and previous config saved to /var/cache/conftool/dbconfig/20220303-043759-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21741 and previous config saved to /var/cache/conftool/dbconfig/20220303-043421-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2140.codfw.wmnet with OS bullseye
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21740 and previous config saved to /var/cache/conftool/dbconfig/20220303-041916-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
  • 04:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
  • 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21739 and previous config saved to /var/cache/conftool/dbconfig/20220303-040412-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21738 and previous config saved to /var/cache/conftool/dbconfig/20220303-035954-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2140.codfw.wmnet with OS bullseye
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T302950)', diff saved to https://phabricator.wikimedia.org/P21737 and previous config saved to /var/cache/conftool/dbconfig/20220303-035328-ladsgroup.json
  • 03:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 03:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21736 and previous config saved to /var/cache/conftool/dbconfig/20220303-035134-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2147.codfw.wmnet with OS bullseye
  • 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21735 and previous config saved to /var/cache/conftool/dbconfig/20220303-033628-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
  • 03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21734 and previous config saved to /var/cache/conftool/dbconfig/20220303-032123-ladsgroup.json
  • 03:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2147.codfw.wmnet with OS bullseye
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21733 and previous config saved to /var/cache/conftool/dbconfig/20220303-030618-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21732 and previous config saved to /var/cache/conftool/dbconfig/20220303-030518-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21731 and previous config saved to /var/cache/conftool/dbconfig/20220303-025500-ladsgroup.json
  • 02:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 02:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 01:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
  • 01:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
  • 00:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dumpsdata1007.eqiad.wmnet
  • 00:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:25 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 00:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts dumpsdata1007.eqiad.wmnet

2022-03-02

  • 23:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 23:25 ryankemper: T276198 Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from T276198"'`
  • 23:21 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:21 ryankemper: T276198 https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly
  • 23:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 22:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 22:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 22:55 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:54 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:54 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 22:52 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 22:52 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 22:51 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 22:51 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 22:50 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 22:50 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:49 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 22:49 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 22:48 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 22:48 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 22:47 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 22:47 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 22:46 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 22:46 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 22:45 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 22:45 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 22:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 22:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 22:41 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 22:21 ryankemper: T276198 Downtimed `elastic1052` for 2 hours while troubleshooting
  • 22:16 ryankemper: T276198 Testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on `elastic1052`; elasticsearch service fails to start. It's expecting to find `/etc/tmpfiles.d/elasticsearch-production-search-psi-eqiad.conf` but the actual filename is `elasticsearch-production-search-psi-eqiad-conf.conf`. Not sure why that trailing `-conf` is there in the filename. It doesn't look like something `systemd::tmpfile` is doing.
  • 22:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:59 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Linter/includes/Hooks.php: Backport: Hooks.php: Check for non-array $tags (T302918) (duration: 00m 50s)
  • 21:53 ryankemper: T276198 Disabled puppet across all of elastic*, cloudelastic*, and relforge* to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on a single elastic host
  • 21:44 mutante: rolling out scap 4.4.2 on 'all' T302919
  • 21:36 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:19 dancy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wmf-config: Undeploy the fawiki test survey from production (T300291) (duration: 00m 50s)
  • 21:13 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing scap 4.4.2
  • 21:05 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:00 mutante: deploy1002 - upgraded scap to 4.4.2-1 T302919
  • 20:48 mutante: running test-deploy to devcluster (restbase) to test new scap version, successful and then rolled back, as the docs say T302919
  • 20:48 dzahn@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 41s)
  • 20:47 dzahn@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 20:44 mutante: tested 'scap pull' still worked on mwdebug1001; rolling out scap 4.4.2 to A:restbase-canary (T302919)
  • 20:38 mutante: rolling out scap 4.4.2 to A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary (T302919)
  • 20:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:57 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 19:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 19:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:46 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/ApiFeatureUsage: Backport: Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica (T302907) (duration: 00m 50s)
  • 19:45 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:36 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:33 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:30 mutante: stopped icinga-wm
  • 19:14 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24 refs T300200 (duration: 00m 50s)
  • 19:13 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24 refs T300200
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21729 and previous config saved to /var/cache/conftool/dbconfig/20220302-191323-ladsgroup.json
  • 19:10 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group1
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21728 and previous config saved to /var/cache/conftool/dbconfig/20220302-185819-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21727 and previous config saved to /var/cache/conftool/dbconfig/20220302-184314-ladsgroup.json
  • 18:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21726 and previous config saved to /var/cache/conftool/dbconfig/20220302-182809-ladsgroup.json
  • 18:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21725 and previous config saved to /var/cache/conftool/dbconfig/20220302-182153-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21724 and previous config saved to /var/cache/conftool/dbconfig/20220302-182145-ladsgroup.json
  • 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 18:14 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21723 and previous config saved to /var/cache/conftool/dbconfig/20220302-180640-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21722 and previous config saved to /var/cache/conftool/dbconfig/20220302-175136-ladsgroup.json
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21721 and previous config saved to /var/cache/conftool/dbconfig/20220302-173631-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21720 and previous config saved to /var/cache/conftool/dbconfig/20220302-173112-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21719 and previous config saved to /var/cache/conftool/dbconfig/20220302-173104-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21718 and previous config saved to /var/cache/conftool/dbconfig/20220302-171559-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21717 and previous config saved to /var/cache/conftool/dbconfig/20220302-170055-ladsgroup.json
  • 16:51 vgutierrez: pool cp3061 running HAProxy as TLS termination layer - T290005 T271421
  • 16:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS buster
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21716 and previous config saved to /var/cache/conftool/dbconfig/20220302-164550-ladsgroup.json
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21715 and previous config saved to /var/cache/conftool/dbconfig/20220302-163329-ladsgroup.json
  • 16:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21714 and previous config saved to /var/cache/conftool/dbconfig/20220302-163322-ladsgroup.json
  • 16:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 16:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21713 and previous config saved to /var/cache/conftool/dbconfig/20220302-161817-ladsgroup.json
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21711 and previous config saved to /var/cache/conftool/dbconfig/20220302-160312-ladsgroup.json
  • 15:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS buster
  • 15:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5014.eqsin.wmnet with OS buster
  • 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21710 and previous config saved to /var/cache/conftool/dbconfig/20220302-154807-ladsgroup.json
  • 15:47 vgutierrez: pool cp5014 running HAProxy as TLS termination layer - T290005 T271421
  • 15:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 15:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21709 and previous config saved to /var/cache/conftool/dbconfig/20220302-154039-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21708 and previous config saved to /var/cache/conftool/dbconfig/20220302-154026-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21707 and previous config saved to /var/cache/conftool/dbconfig/20220302-152519-ladsgroup.json
  • 15:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
  • 15:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21706 and previous config saved to /var/cache/conftool/dbconfig/20220302-151015-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21705 and previous config saved to /var/cache/conftool/dbconfig/20220302-145510-ladsgroup.json
  • 14:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5014.eqsin.wmnet with OS buster
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21704 and previous config saved to /var/cache/conftool/dbconfig/20220302-145054-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21703 and previous config saved to /var/cache/conftool/dbconfig/20220302-145046-ladsgroup.json
  • 14:41 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:38 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21702 and previous config saved to /var/cache/conftool/dbconfig/20220302-143541-ladsgroup.json
  • 14:34 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4034.ulsfo.wmnet with OS buster
  • 14:27 moritzm: rebalance VMs in Ganeti row A after adding new servers (and decommissioning old ones)
  • 14:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/FlaggedRevs/modules/ext.flaggedRevs.review/review.js: Backport: ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop (duration: 00m 52s)
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21701 and previous config saved to /var/cache/conftool/dbconfig/20220302-142037-ladsgroup.json
  • 14:13 mmandere: pool cp6013
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21700 and previous config saved to /var/cache/conftool/dbconfig/20220302-140532-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21699 and previous config saved to /var/cache/conftool/dbconfig/20220302-140112-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21698 and previous config saved to /var/cache/conftool/dbconfig/20220302-140105-ladsgroup.json
  • 13:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21697 and previous config saved to /var/cache/conftool/dbconfig/20220302-134600-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21696 and previous config saved to /var/cache/conftool/dbconfig/20220302-133055-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21695 and previous config saved to /var/cache/conftool/dbconfig/20220302-131550-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21694 and previous config saved to /var/cache/conftool/dbconfig/20220302-131032-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21693 and previous config saved to /var/cache/conftool/dbconfig/20220302-131024-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21692 and previous config saved to /var/cache/conftool/dbconfig/20220302-125519-ladsgroup.json
  • 12:47 reedy@deploy1002: Finished scap: Fix MassMessage translations T302840 (duration: 01m 50s)
  • 12:45 reedy@deploy1002: Started scap: Fix MassMessage translations T302840
  • 12:43 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21690 and previous config saved to /var/cache/conftool/dbconfig/20220302-124014-ladsgroup.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21689 and previous config saved to /var/cache/conftool/dbconfig/20220302-122510-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21688 and previous config saved to /var/cache/conftool/dbconfig/20220302-122049-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21687 and previous config saved to /var/cache/conftool/dbconfig/20220302-121754-ladsgroup.json
  • 12:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21686 and previous config saved to /var/cache/conftool/dbconfig/20220302-120250-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21685 and previous config saved to /var/cache/conftool/dbconfig/20220302-114745-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21684 and previous config saved to /var/cache/conftool/dbconfig/20220302-113240-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21683 and previous config saved to /var/cache/conftool/dbconfig/20220302-112824-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21682 and previous config saved to /var/cache/conftool/dbconfig/20220302-112347-ladsgroup.json
  • 11:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e" (duration: 01m 29s)
  • 11:22 mbsantos: rollback maps eqiad to a previous working state to mitigate geoshape errors
  • 11:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e"
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21681 and previous config saved to /var/cache/conftool/dbconfig/20220302-110842-ladsgroup.json
  • 11:05 moritzm: installing expat security updates
  • 10:56 moritzm: restarting apache2 and mailman3-web on lists.wikimedia.org for expat security update
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21680 and previous config saved to /var/cache/conftool/dbconfig/20220302-105336-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21678 and previous config saved to /var/cache/conftool/dbconfig/20220302-103832-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21677 and previous config saved to /var/cache/conftool/dbconfig/20220302-103407-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 10:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 10:15 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 10:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging" (duration: 01m 45s)
  • 10:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 10:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging"
  • 10:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging" (duration: 01m 36s)
  • 10:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging"
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21676 and previous config saved to /var/cache/conftool/dbconfig/20220302-100903-ladsgroup.json
  • 10:04 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2002.codfw.wmnet
  • 09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:55 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21675 and previous config saved to /var/cache/conftool/dbconfig/20220302-095358-ladsgroup.json
  • 09:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging (duration: 04m 26s)
  • 09:49 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 09:49 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2002.codfw.wmnet
  • 09:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2001.codfw.wmnet
  • 09:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging
  • 09:46 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging (duration: 02m 13s)
  • 09:44 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging
  • 09:39 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21674 and previous config saved to /var/cache/conftool/dbconfig/20220302-093853-ladsgroup.json
  • 09:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 09:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2001.codfw.wmnet
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21673 and previous config saved to /var/cache/conftool/dbconfig/20220302-093027-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21672 and previous config saved to /var/cache/conftool/dbconfig/20220302-092348-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21671 and previous config saved to /var/cache/conftool/dbconfig/20220302-092128-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21670 and previous config saved to /var/cache/conftool/dbconfig/20220302-092120-ladsgroup.json
  • 09:16 mmandere: rolling restart of varnishkafka-* on cp6*
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21669 and previous config saved to /var/cache/conftool/dbconfig/20220302-091523-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21668 and previous config saved to /var/cache/conftool/dbconfig/20220302-090615-ladsgroup.json
  • 09:05 XioNoX: push Capirca managed labs-in firewall filter to eqiad routers
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21667 and previous config saved to /var/cache/conftool/dbconfig/20220302-090018-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21666 and previous config saved to /var/cache/conftool/dbconfig/20220302-085111-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21665 and previous config saved to /var/cache/conftool/dbconfig/20220302-084513-ladsgroup.json
  • 08:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1167.eqiad.wmnet with OS bullseye
  • 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21664 and previous config saved to /var/cache/conftool/dbconfig/20220302-083606-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21663 and previous config saved to /var/cache/conftool/dbconfig/20220302-083345-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21662 and previous config saved to /var/cache/conftool/dbconfig/20220302-083338-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
  • 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21661 and previous config saved to /var/cache/conftool/dbconfig/20220302-081832-ladsgroup.json
  • 08:09 godog: test thanos 0.24.0 on thanos-fe2001 to check if https://github.com/thanos-io/thanos/issues/4531 is fixed
  • 08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1167.eqiad.wmnet with OS bullseye
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21660 and previous config saved to /var/cache/conftool/dbconfig/20220302-080327-ladsgroup.json
  • 08:02 Amir1: killing all entity dumpers of wikidata in snapshot1008 (T300255)
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21659 and previous config saved to /var/cache/conftool/dbconfig/20220302-074822-ladsgroup.json
  • 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21658 and previous config saved to /var/cache/conftool/dbconfig/20220302-074602-ladsgroup.json
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21657 and previous config saved to /var/cache/conftool/dbconfig/20220302-074210-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21656 and previous config saved to /var/cache/conftool/dbconfig/20220302-073610-ladsgroup.json
  • 07:35 _joe_: filling request patterns in etcd
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21655 and previous config saved to /var/cache/conftool/dbconfig/20220302-072105-ladsgroup.json
  • 07:09 _joe_: installing scap 4.4.1 everywhere T302464
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21654 and previous config saved to /var/cache/conftool/dbconfig/20220302-070601-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21653 and previous config saved to /var/cache/conftool/dbconfig/20220302-065056-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21652 and previous config saved to /var/cache/conftool/dbconfig/20220302-063933-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21651 and previous config saved to /var/cache/conftool/dbconfig/20220302-062428-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21650 and previous config saved to /var/cache/conftool/dbconfig/20220302-060924-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21649 and previous config saved to /var/cache/conftool/dbconfig/20220302-055419-ladsgroup.json
  • 05:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1101.eqiad.wmnet with OS bullseye
  • 05:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1101.eqiad.wmnet with OS bullseye
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21648 and previous config saved to /var/cache/conftool/dbconfig/20220302-052033-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21647 and previous config saved to /var/cache/conftool/dbconfig/20220302-051947-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21646 and previous config saved to /var/cache/conftool/dbconfig/20220302-051853-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21645 and previous config saved to /var/cache/conftool/dbconfig/20220302-050526-ladsgroup.json
  • 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21644 and previous config saved to /var/cache/conftool/dbconfig/20220302-050442-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21643 and previous config saved to /var/cache/conftool/dbconfig/20220302-045021-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21642 and previous config saved to /var/cache/conftool/dbconfig/20220302-044938-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21641 and previous config saved to /var/cache/conftool/dbconfig/20220302-043516-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21640 and previous config saved to /var/cache/conftool/dbconfig/20220302-043433-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21639 and previous config saved to /var/cache/conftool/dbconfig/20220302-043313-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21638 and previous config saved to /var/cache/conftool/dbconfig/20220302-043229-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21637 and previous config saved to /var/cache/conftool/dbconfig/20220302-042012-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21636 and previous config saved to /var/cache/conftool/dbconfig/20220302-041725-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS bullseye
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21635 and previous config saved to /var/cache/conftool/dbconfig/20220302-040220-ladsgroup.json
  • 04:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
  • 03:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS bullseye
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21634 and previous config saved to /var/cache/conftool/dbconfig/20220302-034715-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21633 and previous config saved to /var/cache/conftool/dbconfig/20220302-034502-ladsgroup.json
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21632 and previous config saved to /var/cache/conftool/dbconfig/20220302-034454-ladsgroup.json
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 03:43 ejegg: updated CiviCRM from e9f0eff5 to cb0605ed
  • 02:13 ejegg: Fundraising CiviCRM updated from 2874d623 to e9f0eff5
  • 00:15 topranks: Re-enabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
  • 00:07 topranks: disabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.

2022-03-01

  • 22:51 inflatador: T276198 reenabled puppet on elastic1052.eqiad.wmnet
  • 22:37 inflatador: T276198 rebooting elastic1052.eqiad.wmnet to test failure condition
  • 22:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
  • 22:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
  • 22:32 inflatador: T276198 disabling puppet on elastic1052.eqiad.wmnet to test failure condition (rebooting shortly)
  • 21:53 dancy@deploy1002: Finished scap: Resync to try to clear alerts (duration: 12m 08s)
  • 21:41 dancy@deploy1002: Started scap: Resync to try to clear alerts
  • 21:36 dancy@deploy1002: Started scap: Resync to try to clear alerts
  • 20:36 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.24 refs T300200
  • 20:33 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group0; note this may briefly trigger some version alerts
  • 20:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/includes: Backport: Revert "preferences: Use a faster and simpler form descriptor when validating" (T302643) (duration: 00m 55s)
  • 20:05 mutante: alert1001 - re-enabled puppet
  • 20:05 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.24 refs T300200 (duration: 53m 17s)
  • 19:45 mutante: alert1001 - disable puppet, systemctl stop ircecho - to stop bot spam, caused somehow by new scap version breaking "mw versions mismatch" alerting - affects labtestwiki,testwiki,testwikidatawiki
  • 19:38 mutante: mw1449 - scap pull
  • 19:36 mutante: mw1414 - scap pull
  • 19:11 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.24 refs T300200
  • 19:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2008.codfw.wmnet
  • 19:01 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 18:57 brennen: 1.38.0-wmf.24 train (T300200): there's currently a single blocker at T302643; staging to testwikis and holding there until backport's available
  • 18:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2008.codfw.wmnet
  • 18:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
  • 18:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21626 and previous config saved to /var/cache/conftool/dbconfig/20220301-180216-ladsgroup.json
  • 17:52 cwhite: completed grafana upgrade in eqiad T282863
  • 17:50 herron: re-enabling puppet and ircecho on alert1001
  • 17:47 cwhite: upgrade grafana in eqiad T282863
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21625 and previous config saved to /var/cache/conftool/dbconfig/20220301-174711-ladsgroup.json
  • 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21624 and previous config saved to /var/cache/conftool/dbconfig/20220301-173206-ladsgroup.json
  • 17:24 dancy@deploy1002: Finished scap: testing container image build (duration: 28m 39s)
  • 17:17 herron: stopped ircecho on alert1001 due to systemd unit alert shower
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21622 and previous config saved to /var/cache/conftool/dbconfig/20220301-171701-ladsgroup.json
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21621 and previous config saved to /var/cache/conftool/dbconfig/20220301-171441-ladsgroup.json
  • 17:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 17:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:55 dancy@deploy1002: Started scap: testing container image build
  • 16:24 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 03s)
  • 16:23 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
  • 16:12 moritzm: restarting apache on logstash nodes to pick up expat update
  • 16:11 elukey@deploy1002: Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195 (duration: 36m 13s)
  • 16:05 moritzm: restarting nginx on wcqs* nodes to pick up expat update
  • 15:35 elukey@deploy1002: Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195
  • 15:21 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 07s)
  • 15:21 ntsako@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
  • 15:06 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2003.codfw.wmnet
  • 14:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 elukey: elukey@deploy1002:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the node)
  • 14:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:51 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 14:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2002.codfw.wmnet
  • 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:38 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:36 vgutierrez: pool cp1087 running HAProxy as TLS termination layer - T290005 T271421
  • 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS buster
  • 14:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 14:32 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
  • 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2001.codfw.wmnet
  • 14:19 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:19 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:14 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 14:09 moritzm: restarting nginx on wdqs* nodes to pick up expat update
  • 14:03 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
  • 14:03 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:57 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:53 mmandere: restart purged on cp60[15-16]
  • 13:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:48 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:48 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 13:48 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
  • 13:48 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:44 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
  • 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:43 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:40 kormat: Deploying wmfmariadbpy 0.9 T302796
  • 13:40 kormat: uploaded wmfmariadbpy 0.9 to apt.wm.o T302796
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 13:32 moritzm: restarting nginx on registry* nodes to pick up expat update
  • 13:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster
  • 13:15 XioNoX: restart cr1-drmrs for software upgrade
  • 13:03 moritzm: restarting FPM/Apache on parsoid hosts to pick up expat update
  • 12:50 vgutierrez: pool cp3062 running HAProxy as TLS termination layer - T290005 T271421
  • 12:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster
  • 12:39 moritzm: installing expat security updates
  • 12:34 mmandere: restart purged on cp60[12-14]
  • 12:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s)
  • 12:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker
  • 12:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)
  • 12:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker
  • 12:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)
  • 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration
  • 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)
  • 12:09 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration
  • 11:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:36 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 11:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 11:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet
  • 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 11:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 11:27 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 11:27 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:21 _joe_: restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843
  • 11:18 _joe_: also removed the ipvsadm entry for apaches:80 T244843
  • 11:17 jayme: rolled back linkrecommendation staging helm release to revision 12 - T302744
  • 11:17 _joe_: restarting pybal on lvs1020 T244843
  • 11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 11:11 _joe_: restarted pybal on lvs2009, T244843
  • 11:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 11:07 _joe_: restarted pybal on lvs2010, T244843
  • 11:02 mmandere: restart purged on cp60[09,10,11]
  • 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:47 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts
  • 10:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts
  • 10:39 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts
  • 10:31 mmandere: restart purged on cp600[6-8]
  • 10:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:05 vgutierrez: pool cp2039 running HAProxy as TLS termination layer - T290005 T271421
  • 09:48 elukey: elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)
  • 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster
  • 09:33 _joe_: restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted
  • 09:31 _joe_: restart pybal on lvs1020
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts
  • 09:25 elukey: restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka)
  • 09:25 _joe_: manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore
  • 09:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts
  • 09:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts
  • 09:22 _joe_: restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820
  • 09:20 _joe_: restarted pybal on lvs2010
  • 09:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
  • 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
  • 09:06 elukey: restart purged on cp6005
  • 08:57 elukey: restart purged on cp6004
  • 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster
  • 08:27 urbanecm: UTC morning B&C window done
  • 08:25 elukey: restart purged on cp6003
  • 08:16 moritzm: drain instances off ganeti2008 for eventual decom
  • 08:08 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: d149208: Use service-proxy to connect to linkrecommendation (T302719) (duration: 00m 49s)
  • 07:59 elukey: restart purged on cp6002
  • 06:58 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test (duration: 00m 17s)
  • 06:57 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test
  • 06:56 elukey: restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to)
  • 06:46 _joe_: uploaded scap 4.4.1 to {stretch,buster,bullseye} T302464
  • 06:46 _joe_: uploaded scap 4.4.1 to {stretch,buster,bullseye}
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json
  • 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json
  • 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json
  • 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 00:17 mutante: 15.wikipedia.org on k8s (staging) deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' | grep grandpa => "“Wikipedia is like an all-knowing grandpa.”" | T300171

Archives

See Server Admin Log/Archives.