You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance)
imported>Stashbot
(ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance)
 
(115 intermediate revisions by the same user not shown)
Line 1: Line 1:
== 2022-06-03 ==
== 2022-09-30 ==
* 22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00
* 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
* 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
* 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
* 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
* 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 13:51 moritzm: installing puppetdb-test2001 [[phab:T318931|T318931]]
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and


== 2022-06-02 ==
== 2022-09-22 ==
* 23:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 23:56 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host clouddumps1001.wikimedia.org with OS bullseye
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 23:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
* 23:42 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 23:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:27 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: Add db-mainstash g 752807 (duration: 03m 24s)
* 23:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:22 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1001.wikimedia.org with OS bullseye
* 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 22:50 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host clouddumps1001.wikimedia.org with OS bullseye
* 22:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 22:33 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host clouddumps1001.wikimedia.org with OS bullseye
* 22:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 22:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1001.wikimedia.org with OS bullseye
* 22:27 ejegg: updated payments-wiki from {{Gerrit|4e9470de}} to {{Gerrit|8c6208c2}}
* 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29363 and previous config saved to /var/cache/conftool/dbconfig/20220602-222306-ladsgroup.json
* 22:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 22:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 22:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 22:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1001.wikimedia.org with OS bullseye
* 22:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 22:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host clouddumps1001.wikimedia.org with OS bullseye
* 21:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:25 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Emergency deploy: [[gerrit:802637{{!}}Stop writing to cuc_actor on all wikis (T233004 T309737)]] (duration: 03m 15s)
* 21:23 dancy@deploy1002: backport aborted:  (duration: 00m 05s)
* 21:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 andrew@cumin1001: END (PASS) - Cookbook sre
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:55 brennen: end of utc late backport & config window
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 20:36 brennen@deploy1002: backport aborted:  (duration: 02m 16s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 18:38 dancy@deploy1002: Started scap: testing
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:39 dancy@deploy1002: Sync cancelled.
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
 
== 2022-09-21 ==
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 tgr_: UTC late deploys done
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:04 samtar@deploy1002
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:07 moritzm: disabling puppet in codfw and the edges temporarily
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:15 joal@deploy1002: Finished deploy [airflow-dags/analytics@19b943d]: (no justification provided) (duration: 00m 09s)
* 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
* 12:15 joal@deploy1002: Started deploy [airflow-dags/analytics@19b943d]: (no justification provided)
* 13:01 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29343 and previous config saved to /var/cache/conftool/dbconfig/20220602-120320-marostegui.json
* 12:48 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
* 11:44 moritzm: installing python-pip bugfix updates from bullseye point release
* 12:47 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1007.eqiad.wmnet with OS bullseye
* 11:40 moritzm: installing tasksel updates from bullseye point release
* 12:33 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host datahubsearch1003.eqiad.wmnet
* 11:31 hashar: Restarted Gerrit on replica gerrit2001
* 12:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1005,parse1005.mgmt
* 11:23 moritzm: installing sysvinit-utils bugfix updates from last bullseye point release
* 12:31 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1005,parse1005.mgmt
* 11:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
* 11:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
* 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:20 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 18 hosts with reason: Downtime pending inclusion in production
* 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:20 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 18 hosts with reason: Downtime pending inclusion in production
* 10:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
* 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:16 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1005.eqiad.wmnet
* 10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:14 claime: depooled wtp1037.eqiad.wmnet from parsoid cluster [[phab:T312638|T312638]]
* 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
* 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:10 tstarling@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db[2142-2144].codfw.wmnet
* 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:10 tstarling@cumin1001: START - Cookbook sre.hosts.remove-downtime for db[2142-2144].codfw.wmnet
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1004.mgmt
* 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:10 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1004.mgmt
* 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:10 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
* 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
* 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[1001-1004].eqiad.wmnet
* 08:54 joal@deploy1002: Finished deploy [airflow-dags/analytics@19cd054]: (no justification provided) (duration: 00m 09s)
* 11:56 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[1001-1004].eqiad.wmnet
* 08:54 joal@deploy1002: Started deploy [airflow-dags/analytics@19cd054]: (no justification provided)
* 11:55 TimStarling: on db2142: rejecting inbound mysql traffic [[phab:T316847|T316847]]
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29340 and previous config saved to /var/cache/conftool/dbconfig/20220602-085357-marostegui.json
* 11:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
* 08:32 jayme: imported scap 4.8.1 to stretch-/buster-/bullseye-wikimedia - [[phab:T309116|T309116]]
* 11:53 claime: pooled parse1004.eqiad.wmnet (php 7.4 only) in parsoid cluster [[phab:T312638|T312638]]
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29339 and previous config saved to /var/cache/conftool/dbconfig/20220602-082700-marostegui.json
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1004.eqiad.wmnet
* 08:03 joal@deploy1002: Finished deploy [analytics/refinery@ef68481] (hadoop-test): Additional analytics weekly train TEST [analytics/refinery@ef68481] (duration: 07m 33s)
* 11:52 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1004.eqiad.wmnet
* 07:55 joal@deploy1002: Started deploy [analytics/refinery@ef68481] (hadoop-test): Additional analytics weekly train TEST [analytics/refinery@ef68481]
* 11:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
* 07:54 joal@deploy1002: Finished deploy [analytics/refinery@ef68481] (thin): Additional analytics weekly train THIN [analytics/refinery@ef68481] (duration: 00m 08s)
* 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33784 and previous config saved to /var/cache/conftool/dbconfig/20220905-114352-ladsgroup.json
* 07:54 joal@deploy1002: Started deploy [analytics/refinery@ef68481] (thin): Additional analytics weekly train THIN [analytics/refinery@ef68481]
* 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 07:51 joal@deploy1002: Finished deploy [analytics/refinery@ef68481]: Additional analytics weekly train [analytics/refinery@ef68481] (duration: 24m 33s)
* 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 07:26 joal@deploy1002: Started deploy [analytics/refinery@ef68481]: Additional analytics weekly train [analytics/refinery@ef68481]
* 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-eqiad:xe-4/1/3
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29338 and previous config saved to /var/cache/conftool/dbconfig/20220602-071547-marostegui.json
* 11:41 jnuche@deploy1002: Installation of scap version "4.16.0" completed for 584 hosts
* 07:05 moritzm: installing systemd bugfix updates from last bullseye point release, also includes a minor security fix in systemd-tmpfiles
* 11:41 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-eqiad:xe-4/1/3
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29337 and previous config saved to /var/cache/conftool/dbconfig/20220602-065203-marostegui.json
* 11:40 jnuche@deploy1002: Installing scap version "4.16.0" for 584 hosts
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29336 and previous config saved to /var/cache/conftool/dbconfig/20220602-063710-marostegui.json
* 11:37 TimStarling: on db2142: dropping inbound mysql traffic [[phab:T316847|T316847]]
* 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1181 [[phab:T309617|T309617]]', diff saved to https://phabricator.wikimedia.org/P29335 and previous config saved to /var/cache/conftool/dbconfig/20220602-061039-ladsgroup.json
* 11:36 claime: Set wtp103[4-5].eqiad.wmnet inactive pending decommission https://phabricator.wikimedia.org/T317025
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1136 to s7 primary and set section read-write [[phab:T309617|T309617]]', diff saved to https://phabricator.wikimedia.org/P29334 and previous config saved to /var/cache/conftool/dbconfig/20220602-060053-ladsgroup.json
* 11:34 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1035.eqiad.wmnet
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - [[phab:T309617|T309617]]', diff saved to https://phabricator.wikimedia.org/P29333 and previous config saved to /var/cache/conftool/dbconfig/20220602-060016-ladsgroup.json
* 11:34 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1034.eqiad.wmnet
* 06:00 Amir1: Starting s7 eqiad failover from db1181 to db1136 - [[phab:T309617|T309617]]
* 11:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1034-1036].eqiad.wmnet with reason: Downtiming replaced wtp servers
* 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29332 and previous config saved to /var/cache/conftool/dbconfig/20220602-055500-ladsgroup.json
* 11:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1034-1036].eqiad.wmnet with reason: Downtiming replaced wtp servers
* 05:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 11:30 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1004.eqiad.wmnet
* 05:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 11:29 TimStarling: on db2142: set master_delay=30 and restarted replication [[phab:T316847|T316847]]
* 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29331 and previous config saved to /var/cache/conftool/dbconfig/20220602-055452-ladsgroup.json
* 11:27 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1003.eqiad.wmnet
* 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29330 and previous config saved to /var/cache/conftool/dbconfig/20220602-053947-ladsgroup.json
* 11:27 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1003.eqiad.wmnet
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1137 in x1 with minimal weight to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29329 and previous config saved to /var/cache/conftool/dbconfig/20220602-053340-marostegui.json
* 11:24 claime: depooled wtp1036.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
* 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29328 and previous config saved to /var/cache/conftool/dbconfig/20220602-052442-ladsgroup.json
* 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 05:15 ryankemper: [[phab:T309720|T309720]] Finished manual rolling restart of `cloudelastic` cluster to get new S3 plugin operational
* 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088 (s1 and s2) [[phab:T309485|T309485]]', diff saved to https://phabricator.wikimedia.org/P29327 and previous config saved to /var/cache/conftool/dbconfig/20220602-051451-marostegui.json
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33783 and previous config saved to /var/cache/conftool/dbconfig/20220905-112308-ladsgroup.json
* 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29326 and previous config saved to /var/cache/conftool/dbconfig/20220602-050937-ladsgroup.json
* 11:18 TimStarling: on db2142: stopped mariadb replication
* 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1136 with weight 0 [[phab:T309617|T309617]]', diff saved to https://phabricator.wikimedia.org/P29325 and previous config saved to /var/cache/conftool/dbconfig/20220602-050559-ladsgroup.json
* 11:16 claime: pooled parse1003.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
* 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s7 [[phab:T309617|T309617]]
* 11:16 tstarling@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2142-2144].codfw.wmnet with reason: [[phab:T316847|T316847]] x2 failure test
* 05:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s7 [[phab:T309617|T309617]]
* 11:15 tstarling@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2142-2144].codfw.wmnet with reason: [[phab:T316847|T316847]] x2 failure test
* 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 11:15 cgoubert@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P33782 and previous config saved to /var/cache/conftool/dbconfig/20220905-110801-ladsgroup.json
* 02:10 krinkle@deploy1002: Synchronized docroot/noc/: {{Gerrit|Ic0e134c61d6}} (duration: 03m 20s)
* 11:04 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet
* 02:04 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ic0e134c61d6}} (duration: 03m 02s)
* 10:55 Emperor: set thanos ring replicas to 3.90 [[phab:T311690|T311690]]
* 01:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P33781 and previous config saved to /var/cache/conftool/dbconfig/20220905-105255-ladsgroup.json
* 01:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33780 and previous config saved to /var/cache/conftool/dbconfig/20220905-103749-ladsgroup.json
* 01:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1015.eqiad.wmnet
* 01:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 01:38 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Id9b34b755230}} no-op (duration: 03m 12s)
* 10:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1015.eqiad.wmnet
* 01:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1014.eqiad.wmnet
* 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1014.eqiad.wmnet
* 01:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1013.eqiad.wmnet
* 01:15 krinkle@deploy1002: Synchronized src/Profiler.php: {{Gerrit|I257b41a45}} (duration: 03m 15s)
* 10:13 XioNoX: upgrade python-pynetbox to 6.6 on netbox frontends - [[phab:T310745|T310745]]
* 01:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:11 hnowlan@deploy1002: Finished deploy [restbase/deploy@79b3cd2]: Add guwwiktionary and bjnwiktionary [[phab:T309058|T309058]] [[phab:T312216|T312216]] (duration: 15m 05s)
* 01:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1013.eqiad.wmnet
* 01:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:56 hnowlan@deploy1002: Started deploy [restbase/deploy@79b3cd2]: Add guwwiktionary and bjnwiktionary [[phab:T309058|T309058]] [[phab:T312216|T312216]]
* 01:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:47 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 01:09 krinkle@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: {{Gerrit|Iebd29aaa}} (duration: 02m 57s)
* 09:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
* 01:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1012.eqiad.wmnet
* 01:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:37 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 01:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
* 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 01:05 krinkle@deploy1002: Synchronized src/Profiler.php: {{Gerrit|I93b3e43d32}} (duration: 03m 16s)
* 09:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1012.eqiad.wmnet
* 00:50 krinkle@deploy1002: Synchronized wmf-config/MetaContactPages.php: {{Gerrit|Ief1368fd959f428}} (duration: 02m 56s)
* 09:25 btullis: deployed calico to dse-k8s cluster [[phab:T310174|T310174]]
* 00:46 krinkle@deploy1002: Synchronized php-1.39.0-wmf.14/extensions/WikimediaMessages/: {{Gerrit|I5a700cd3648}} (duration: 03m 01s)
* 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 00:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33779 and previous config saved to /var/cache/conftool/dbconfig/20220905-092338-ladsgroup.json
* 00:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:23 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 00:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1010.eqiad.wmnet
* 00:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:17 XioNoX: Squid: permit production networks instead of aggregate_networks - [[phab:T265864|T265864]]
* 09:17 moritzm: installing flac security updates
* 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1010.eqiad.wmnet
* 09:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1008.eqiad.wmnet
* 09:05 hnowlan@deploy1002: Finished deploy [restbase/deploy@a571f9a]: Add pcmwiki [[phab:T310880|T310880]] (duration: 01m 06s)
* 09:04 hnowlan@deploy1002: Started deploy [restbase/deploy@a571f9a]: Add pcmwiki [[phab:T310880|T310880]]
* 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1008.eqiad.wmnet
* 09:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1006.eqiad.wmnet
* 08:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1006.eqiad.wmnet
* 08:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
* 08:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 08:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:829562{{!}}Stop writing to old templatelinks fields in s7 (T312865)]] (duration: 03m 51s)
* 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
* 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:01 XioNoX: rename Telia to Arelion in Netbox
* 07:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:32 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:829556{{!}}Make English Wikipedia read new on templatelinks migration (T306673)]] (duration: 03m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|739920ceb09358a2ea89d82494522876fffd2621}}: Fix missing logo for mniwiktionary and frwikiquote ([[phab:T317004|T317004]]) (duration: 03m 36s)
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|ff2e1082d8b3fe0ba93cd37a1b516dece84a834b}}: Upload missing logo for mniwiktionary and frwikiquote ([[phab:T317004|T317004]]) (duration: 03m 50s)
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:19 moritzm: installing ghostscript security updates
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:823678{{!}}Move 10% of traffic to php 7.4 (T271736)]] (duration: 03m 50s)
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-eqiad:xe-4/1/3
* 06:28 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-eqiad:xe-4/1/3
* 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 06:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 02:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
* 02:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
* 02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33778 and previous config saved to /var/cache/conftool/dbconfig/20220905-024602-ladsgroup.json
* 00:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33777 and previous config saved to /var/cache/conftool/dbconfig/20220905-003619-ladsgroup.json
* 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33776 and previous config saved to /var/cache/conftool/dbconfig/20220905-002112-ladsgroup.json
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33775 and previous config saved to /var/cache/conftool/dbconfig/20220905-000606-ladsgroup.json
 
== 2022-09-04 ==
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33774 and previous config saved to /var/cache/conftool/dbconfig/20220904-235100-ladsgroup.json
* 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33773 and previous config saved to /var/cache/conftool/dbconfig/20220904-225044-ladsgroup.json
* 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33772 and previous config saved to /var/cache/conftool/dbconfig/20220904-225016-ladsgroup.json
* 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33771 and previous config saved to /var/cache/conftool/dbconfig/20220904-223510-ladsgroup.json
* 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33770 and previous config saved to /var/cache/conftool/dbconfig/20220904-222004-ladsgroup.json
* 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33769 and previous config saved to /var/cache/conftool/dbconfig/20220904-220457-ladsgroup.json
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33767 and previous config saved to /var/cache/conftool/dbconfig/20220904-155059-ladsgroup.json
* 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33766 and previous config saved to /var/cache/conftool/dbconfig/20220904-155027-ladsgroup.json
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33765 and previous config saved to /var/cache/conftool/dbconfig/20220904-153521-ladsgroup.json
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33764 and previous config saved to /var/cache/conftool/dbconfig/20220904-152015-ladsgroup.json
* 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33763 and previous config saved to /var/cache/conftool/dbconfig/20220904-150508-ladsgroup.json
* 12:51 elukey: reset-fail ifup@ens13.service on idp2002
* 12:50 elukey: reset-fail ifup@ens13.service on netflow4002
* 12:49 elukey: pkill remaining processes of user effeietsanders on stat1008 to unblock puppet - [[phab:T314846|T314846]]
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33762 and previous config saved to /var/cache/conftool/dbconfig/20220904-103427-ladsgroup.json
* 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
* 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33761 and previous config saved to /var/cache/conftool/dbconfig/20220904-103405-ladsgroup.json
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33760 and previous config saved to /var/cache/conftool/dbconfig/20220904-083341-ladsgroup.json
* 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
 
== 2022-09-03 ==
* 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33759 and previous config saved to /var/cache/conftool/dbconfig/20220903-235001-ladsgroup.json
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33758 and previous config saved to /var/cache/conftool/dbconfig/20220903-233455-ladsgroup.json
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33757 and previous config saved to /var/cache/conftool/dbconfig/20220903-231949-ladsgroup.json
* 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33756 and previous config saved to /var/cache/conftool/dbconfig/20220903-230443-ladsgroup.json
* 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33755 and previous config saved to /var/cache/conftool/dbconfig/20220903-220427-ladsgroup.json
* 22:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 22:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 22:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 22:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33754 and previous config saved to /var/cache/conftool/dbconfig/20220903-220326-ladsgroup.json
* 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P33753 and previous config saved to /var/cache/conftool/dbconfig/20220903-214820-ladsgroup.json
* 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P33752 and previous config saved to /var/cache/conftool/dbconfig/20220903-213314-ladsgroup.json
* 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33751 and previous config saved to /var/cache/conftool/dbconfig/20220903-211808-ladsgroup.json
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33750 and previous config saved to /var/cache/conftool/dbconfig/20220903-180104-ladsgroup.json
* 18:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
* 18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
* 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33749 and previous config saved to /var/cache/conftool/dbconfig/20220903-180042-ladsgroup.json
* 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33748 and previous config saved to /var/cache/conftool/dbconfig/20220903-151224-ladsgroup.json
* 15:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 15:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33747 and previous config saved to /var/cache/conftool/dbconfig/20220903-015524-ladsgroup.json
* 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
* 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
* 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33746 and previous config saved to /var/cache/conftool/dbconfig/20220903-015502-ladsgroup.json


== 2022-06-01 ==
== 2022-09-02 ==
* 22:13 ryankemper: [[phab:T309720|T309720]] Downtimed cloudelastic until Monday while we perform maintenance across the next couple days (will manually lift downtime later)
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:33 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to enable S3 plugin - bking@cumin1001 - [[phab:T309720|T309720]]
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:33 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to enable S3 plugin - bking@cumin1001 - [[phab:T309720|T309720]]
* 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:10 ebernhardson: restart wdqs-blazegraph on wdqs1007 to resolve BlazegraphFreeAllocatorsDecreasingRapidly
* 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:58 dancy@deploy1002: Sync cancelled.
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:56 dancy@deploy1002: dancy: testing [[phab:T299648|T299648]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:51 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5a8e7586bcb0933c96e8294e389c31270edb134e}}: Revert "Start writing to cuc_actor everywhere except s4 and s8" ([[phab:T233004|T233004]]) (duration: 00m 32s)
* 18:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 andrew@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 18:47 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:36 andrew@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 18:40 dancy@deploy1002: Started scap: testing [[phab:T299648|T299648]]
* 20:36 andrew@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1203.eqiad.wmnet with OS bullseye
* 20:35 andrew@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 17:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 cjming: end of UTC late backport window
* 17:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 andrew@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 andrew@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1203.eqiad.wmnet with reason: host reimage
* 20:26 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800278{{!}}Start writing to cuc_actor everywhere except s4 and s8 (T233004)]] (duration: 03m 01s)
* 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1203.eqiad.wmnet with reason: host reimage
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1203.eqiad.wmnet with OS bullseye
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS bullseye
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1201.eqiad.wmnet with OS bullseye
* 16:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
* 16:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
* 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: host reimage
* 16:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1201.eqiad.wmnet with reason: host reimage
* 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS bullseye
* 16:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1200.eqiad.wmnet with OS bullseye
* 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1203']
* 16:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1201.eqiad.wmnet with OS bullseye
* 16:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1203']
* 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1199.eqiad.wmnet with OS bullseye
* 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1202']
* 16:23 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1200.eqiad.wmnet with reason: host reimage
* 16:19 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1200.eqiad.wmnet with reason: host reimage
* 16:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1203']
* 16:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1202']
* 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1203']
* 16:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1202']
* 16:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
* 16:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1200.eqiad.wmnet with OS bullseye
* 16:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1201']
* 16:03 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1198.eqiad.wmnet with OS bullseye
* 16:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1202']
* 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:57 jayme: repool kubemaster2002
* 15:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
* 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1199.eqiad.wmnet with OS bullseye
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1200']
* 15:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
* 15:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1197.eqiad.wmnet with OS bullseye
* 15:45 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
* 15:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1200']
* 15:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1200']
* 15:39 jayme: depool kubemaster2002
* 15:37 jayme: repooled kubemaster1001
* 15:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1201']
* 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
* 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1197.eqiad.wmnet with reason: host reimage
* 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1198.eqiad.wmnet with OS bullseye
* 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1201']
* 15:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1200']
* 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1196.eqiad.wmnet with OS bullseye
* 15:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1197.eqiad.wmnet with reason: host reimage
* 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1198']
* 15:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
* 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1199']
* 15:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
* 15:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1197.eqiad.wmnet with OS bullseye
* 15:16 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: host reimage
* 15:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1199']
* 15:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1199']
* 15:13 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1196.eqiad.wmnet with reason: host reimage
* 15:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1198']
* 15:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1199']
* 15:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
* 15:04 jayme: depooled kubemaster1001
* 15:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1198']
* 15:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
* 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1198']
* 15:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1197']
* 15:00 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1196.eqiad.wmnet with OS bullseye
* 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003.wikimedia.org
* 14:58 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
* 14:55 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 14:53 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
* 14:51 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003.wikimedia.org
* 14:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003.wikimedia.org
* 14:49 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1197']
* 14:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1196']
* 14:46 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 14:42 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003.wikimedia.org
* 14:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
* 14:39 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
* 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1196']
* 14:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
* 14:21 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db1196']
* 14:18 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:16 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db1197']
* 14:15 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 14:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003
* 14:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
* 14:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
* 13:57 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1196']
* 13:57 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
* 13:31 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1196']
* 13:31 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
* 13:15 jayme: repooled kubemaster1002
* 13:10 jayme: redepooled kubemaster1002
* 13:00 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 12:56 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MMandere out of all services on: 1235 hosts
* 10:35 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MMandere out of all services on: 1235 hosts
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MMandere out of all services on: 779 hosts
* 10:34 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MMandere out of all services on: 779 hosts
* 10:08 jayme: depooled kubemaster1002 for tests
* 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33743 and previous config saved to /var/cache/conftool/dbconfig/20220902-092704-ladsgroup.json
* 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
* 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
* 08:41 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:37 fnegri@cumin1001: START - Cookbook sre.dns.netbox
* 08:36 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
* 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
* 08:26 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
* 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
* 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
* 08:13 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp2002.wikimedia.org
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
* 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
* 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
* 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
* 07:17 dcausse: restarting blazegraph on wdqs1016 (BlazegraphFreeAllocatorsDecreasingRapidly)
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 to clone db1107 [[phab:T316870|T316870]]', diff saved to https://phabricator.wikimedia.org/P33739 and previous config saved to /var/cache/conftool/dbconfig/20220902-054405-root.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2149 [[phab:T316494|T316494]] ', diff saved to https://phabricator.wikimedia.org/P33738 and previous config saved to /var/cache/conftool/dbconfig/20220902-052841-marostegui.json
 
== 2022-09-01 ==
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:50 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:828616{{!}}Remove Vector grid config (T313559)]], [[gerrit:829043{{!}}Disable sticky header edit experiment for idwiki, viwki (T315264)]] (duration: 05m 44s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:44 thcipriani@deploy1002: thcipriani and cjming and bwang: Backport for [[gerrit:828616{{!}}Remove Vector grid config (T313559)]], [[gerrit:829043{{!}}Disable sticky header edit experiment for idwiki, viwki (T315264)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:44 thcipriani@deploy1002: Started scap: Backport for [[gerrit:828616{{!}}Remove Vector grid config (T313559)]], [[gerrit:829043{{!}}Disable sticky header edit experiment for idwiki, viwki (T315264)]]
* 20:41 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:824787{{!}}cirrus: Handle transition to elasticsearch 7.10]] (duration: 16m 56s)
* 20:40 ryankemper: [[phab:T300943|T300943]] New hosts are in service and were pooled like so: `sudo confctl select name=elastic20[73-86].* set/weight=10:pooled=yes` (in retrospect that syntax seems to have selected too many hosts, but the final state of pybal is correct per https://config-master.wikimedia.org/pybal/codfw/search)
* 20:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1203.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1202.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:37 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic20[73-86].*
* 20:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: [[phab:T300943|T300943]]
* 20:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: [[phab:T300943|T300943]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 andrew@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 20:24 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for [[gerrit:824787{{!}}cirrus: Handle transition to elasticsearch 7.10]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:21 andrew@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 20:24 thcipriani@deploy1002: Started scap: Backport for [[gerrit:824787{{!}}cirrus: Handle transition to elasticsearch 7.10]]
* 20:21 andrew@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 20:20 thcipriani@deploy1002: backport aborted:  (duration: 03m 09s)
* 20:20 andrew@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 20:20 thcipriani@deploy1002: backport aborted:  (duration: 02m 57s)
* 20:19 andrew@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 20:20 thcipriani@deploy1002: sync-world aborted: Backport for [[gerrit:829061{{!}}Revert "Deploy Research Incentive Survey to idwiki"]] (duration: 01m 23s)
* 20:19 andrew@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 20:20 thcipriani@deploy1002: thcipriani and trainbranchbot: Backport for [[gerrit:829061{{!}}Revert "Deploy Research Incentive Survey to idwiki"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:15 andrew@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 20:19 thcipriani@deploy1002: Started scap: Backport for [[gerrit:829061{{!}}Revert "Deploy Research Incentive Survey to idwiki"]]
* 20:15 andrew@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 20:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1203.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29323 and previous config saved to /var/cache/conftool/dbconfig/20220601-201402-ladsgroup.json
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:14 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1202.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:09 SandraEbele: Successfully deployed refinery using scap, then deployed onto hdfs.
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:42 ebysans@deploy1002: Finished deploy [analytics/refinery@13f791b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@13f791b] (duration: 07m 06s)
* 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1201.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:35 ebysans@deploy1002: Started deploy [analytics/refinery@13f791b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@13f791b]
* 20:13 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1200.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:35 ebysans@deploy1002: Finished deploy [analytics/refinery@13f791b] (thin): Regular analytics weekly train THIN [analytics/refinery@13f791b] (duration: 00m 07s)
* 20:13 thcipriani@deploy1002: thcipriani and dani: Backport for [[gerrit:828614{{!}}Deploy Research Incentive Survey to idwiki (T316466)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:35 ebysans@deploy1002: Started deploy [analytics/refinery@13f791b] (thin): Regular analytics weekly train THIN [analytics/refinery@13f791b]
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:19 ebysans@deploy1002: Finished deploy [analytics/refinery@13f791b]: Regular analytics weekly train [analytics/refinery@13f791b] (duration: 23m 12s)
* 20:06 thcipriani@deploy1002: Started scap: Backport for [[gerrit:828614{{!}}Deploy Research Incentive Survey to idwiki (T316466)]]
* 18:56 ebysans@deploy1002: Started deploy [analytics/refinery@13f791b]: Regular analytics weekly train [analytics/refinery@13f791b]
* 19:58 mutante: otrs1001 - sudo systemctl reset-failed - [[phab:T316903|T316903]]
* 18:52 SandraEbele: About to deploy analytics/refinery (weekly deployment train)
* 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1201.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:14 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.14  refs [[phab:T308067|T308067]] (duration: 03m 02s)
* 19:46 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1200.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1199.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.14  refs [[phab:T308067|T308067]]
* 19:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1198.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1199.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:17 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1198.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1197.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:05 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.14/extensions/Wikibase/client: Backport: [[gerrit:802114{{!}}Don't call saveSettings in EchoNotificationsHandlers::doLocalUserCreated (T306636)]] (duration: 03m 11s)
* 19:16 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1196.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1197.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:53 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1196.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:52 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1047.eqiad.wmnet with OS bullseye
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1203
* 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1047.eqiad.wmnet with reason: host reimage
* 18:51 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1203
* 15:40 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1047.eqiad.wmnet with reason: host reimage
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1202
* 15:24 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1047.eqiad.wmnet with OS bullseye
* 18:51 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1202
* 15:02 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.14/extensions/Thanks/includes/Hooks.php: Backport: [[gerrit:802110{{!}}Don't call saveOptions in Hooks::onAccountCreated (T306636)]] (duration: 03m 10s)
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1201
* 15:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1201
* 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1200
* 15:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1200
* 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1199
* 14:55 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.13/extensions/Thanks/includes/Hooks.php: Backport: [[gerrit:802109{{!}}Don't call saveOptions in Hooks::onAccountCreated (T306636)]] (duration: 03m 10s)
* 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1199
* 14:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1198
* 14:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1198
* 14:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1197
* 14:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:49 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1197
* 14:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1046.eqiad.wmnet with OS bullseye
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1196
* 14:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1046.eqiad.wmnet with reason: host reimage
* 18:49 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1196
* 14:28 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1046.eqiad.wmnet with reason: host reimage
* 18:48 pt1979@cumin1001: START - Cookbook sre.dns.netbox
* 14:25 aikochou@deploy1002: Finished deploy [ores/deploy@3d541df]: Deploy revscoring 2.11.4 to ORES - [[phab:T309536|T309536]] (duration: 45m 07s)
* 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:10 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1046.eqiad.wmnet with OS bullseye
* 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:40 aikochou@deploy1002: Started deploy [ores/deploy@3d541df]: Deploy revscoring 2.11.4 to ORES - [[phab:T309536|T309536]]
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 18:42 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.27  refs [[phab:T314188|T314188]]
* 13:32 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 17:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 12:41 moritzm: installing ruby-nokogiri security updates
* 17:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T309617|T309617]])', diff saved to https://phabricator.wikimedia.org/P29320 and previous config saved to /var/cache/conftool/dbconfig/20220601-122426-ladsgroup.json
* 17:33 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P29318 and previous config saved to /var/cache/conftool/dbconfig/20220601-120921-ladsgroup.json
* 17:32 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P29317 and previous config saved to /var/cache/conftool/dbconfig/20220601-115416-ladsgroup.json
* 17:32 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1137 in x1 with minimal weight to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29315 and previous config saved to /var/cache/conftool/dbconfig/20220601-114418-marostegui.json
* 17:31 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T309617|T309617]])', diff saved to https://phabricator.wikimedia.org/P29314 and previous config saved to /var/cache/conftool/dbconfig/20220601-113911-ladsgroup.json
* 17:26 herron: restarted rsyslog on centrallog2002
* 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T309617|T309617]])', diff saved to https://phabricator.wikimedia.org/P29313 and previous config saved to /var/cache/conftool/dbconfig/20220601-113017-ladsgroup.json
* 16:29 topranks: Brining Lumen Tranport CCT {{Gerrit|442550294}} (cr1-codfw to cr4-ulsfo) back into service following successful hot-cut to lower-latency path with carrier
* 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 16:17 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase103[1-3].eqiad.wmnet
* 11:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 15:55 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase103[1-3].eqiad.wmnet
* 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:21 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.13/extensions/PageTriage/includes/Hooks.php: Backport: [[gerrit:802107{{!}}Don't call saveOptions in LocalUserCreated (T306636)]] (duration: 03m 16s)
* 15:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:21 moritzm: installing usb.ids update from Bullseye 11.4 point release
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1137 in x1 with minimal weight to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29312 and previous config saved to /var/cache/conftool/dbconfig/20220601-111805-marostegui.json
* 15:19 moritzm: updating docker.io on ml-serve* to bugfix release from Bullseye 11.4 point release
* 11:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1045.eqiad.wmnet with OS bullseye
* 14:54 topranks: Draining traffic from Lumen Tranport CCT {{Gerrit|442550294}} (cr1-codfw to cr4-ulsfo) ahead of hot-cut to lower-latency path with carrier
* 11:15 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.14/extensions/PageTriage/includes/Hooks.php: Backport: [[gerrit:802106{{!}}Don't call saveOptions in LocalUserCreated (T306636)]] (duration: 03m 01s)
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:07 moritzm: installing net-snmp security updates on Buster
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
* 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:01 marostegui: test [[phab:T316744|T316744]]
* 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:01 marostegui: test [[phab:T316744|T316744]]
* 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:00 marostegui: Failover m5 from db1107 to db1183 - [[phab:T316744|T316744]]
* 10:54 XioNoX: upgrade fastnetmon to 1.2.1 in eqsin - [[phab:T271228|T271228]]
* 13:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
* 10:51 XioNoX: upgrade fastnetmon to 1.2.1 in esams - [[phab:T271228|T271228]]
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
* 10:49 XioNoX: upgrade fastnetmon to 1.2.1 in eqiad - [[phab:T271228|T271228]]
* 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
* 10:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1045.eqiad.wmnet with reason: host reimage
* 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netbox1002.eqiad.wmnet
* 10:45 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1045.eqiad.wmnet with reason: host reimage
* 13:43 moritzm: rebooting netbox1002 (running netbox.wikimedia.org)
* 10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1045.eqiad.wmnet with OS bullseye
* 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox1002.eqiad.wmnet
* 10:13 moritzm: installing openldap security updates
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2002.codfw.wmnet
* 10:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1044.eqiad.wmnet with OS bullseye
* 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox2002.codfw.wmnet
* 09:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1044.eqiad.wmnet with reason: host reimage
* 13:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1107,1117,1183].eqiad.wmnet with reason: switchover m5 [[phab:T316744|T316744]]
* 09:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1107,1117,1183].eqiad.wmnet with reason: switchover m5 [[phab:T316744|T316744]]
* 09:36 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1044.eqiad.wmnet with OS bullseye
* 13:19 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1043.eqiad.wmnet with OS bullseye
* 13:19 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1137 in x1 with minimal weight to test 10.6.8 [[phab:T309679|T309679]] ', diff saved to https://phabricator.wikimedia.org/P29307 and previous config saved to /var/cache/conftool/dbconfig/20220601-085620-marostegui.json
* 13:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:49 moritzm: installing idp1002 [[phab:T308214|T308214]]
* 13:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1043.eqiad.wmnet with reason: host reimage
* 13:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:45 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1043.eqiad.wmnet with reason: host reimage
* 13:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:39 elukey: powercycle an-worker1094 - OEM event registered in `racadm getsel`, host frozen
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:30 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1043.eqiad.wmnet with OS bullseye
* 13:09 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:823677{{!}}Move 5% of traffic to php 7.4 (T271736)]] (duration: 03m 45s)
* 08:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1043.eqiad.wmnet with OS bullseye
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:20 moritzm: installing openssl security updates
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:19 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1043.eqiad.wmnet with OS bullseye
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:00 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to x1 master', diff saved to https://phabricator.wikimedia.org/P29306 and previous config saved to /var/cache/conftool/dbconfig/20220601-081130-marostegui.json
* 13:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for migration to 10.6 [[phab:T309679|T309679]]', diff saved to https://phabricator.wikimedia.org/P29305 and previous config saved to /var/cache/conftool/dbconfig/20220601-081044-root.json
* 12:59 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:00 moritzm: installing idp2002 [[phab:T308214|T308214]]
* 12:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 07:34 moritzm: installing libxml2 security updates
* 12:56 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T60674|T60674]])', diff saved to https://phabricator.wikimedia.org/P29301 and previous config saved to /var/cache/conftool/dbconfig/20220601-031406-ladsgroup.json
* 12:29 herron: restarted thanos-query on thanos-fe1001
* 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P29300 and previous config saved to /var/cache/conftool/dbconfig/20220601-025901-ladsgroup.json
* 12:20 cdanis@cumin2002: dbctl commit (dc=all): '[[phab:T316482|T316482]] remove replicas from x2', diff saved to https://phabricator.wikimedia.org/P33736 and previous config saved to /var/cache/conftool/dbconfig/20220901-122026-cdanis.json
* 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P29299 and previous config saved to /var/cache/conftool/dbconfig/20220601-024356-ladsgroup.json
* 12:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve-ctrl1001.eqiad.wmnet
* 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T60674|T60674]])', diff saved to https://phabricator.wikimedia.org/P29298 and previous config saved to /var/cache/conftool/dbconfig/20220601-022851-ladsgroup.json
* 12:13 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-serve-ctrl1001.eqiad.wmnet
* 02:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T60674|T60674]])', diff saved to https://phabricator.wikimedia.org/P29297 and previous config saved to /var/cache/conftool/dbconfig/20220601-020339-ladsgroup.json
* 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 12:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33735 and previous config saved to /var/cache/conftool/dbconfig/20220901-121252-ladsgroup.json
* 01:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 12:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Reboot to pick up kernel 5.10.136 ([[phab:T316185|T316185]])
* 01:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 12:05 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Reboot to pick up kernel 5.10.136 ([[phab:T316185|T316185]])
* 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T60674|T60674]])', diff saved to https://phabricator.wikimedia.org/P29296 and previous config saved to /var/cache/conftool/dbconfig/20220601-014053-ladsgroup.json
* 12:03 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P29295 and previous config saved to /var/cache/conftool/dbconfig/20220601-012548-ladsgroup.json
* 11:59 moritzm: rebalance row B after completed Bullseye updates [[phab:T311686|T311686]]
* 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P29294 and previous config saved to /var/cache/conftool/dbconfig/20220601-011043-ladsgroup.json
* 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33734 and previous config saved to /var/cache/conftool/dbconfig/20220901-115746-ladsgroup.json
* 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T60674|T60674]])', diff saved to https://phabricator.wikimedia.org/P29293 and previous config saved to /var/cache/conftool/dbconfig/20220601-005537-ladsgroup.json
* 11:48 cdanis: root@apt1001:/home/cdanis/build-area# reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia conftool_2.2.2-1_amd64.changes
* 00:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T309311|T309311]])', diff saved to https://phabricator.wikimedia.org/P29292 and previous config saved to /var/cache/conftool/dbconfig/20220601-002739-ladsgroup.json
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33733 and previous config saved to /var/cache/conftool/dbconfig/20220901-114239-ladsgroup.json
* 00:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T60674|T60674]])', diff saved to https://phabricator.wikimedia.org/P29291 and previous config saved to /var/cache/conftool/dbconfig/20220601-002701-ladsgroup.json
* 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33732 and previous config saved to /var/cache/conftool/dbconfig/20220901-112733-ladsgroup.json
* 00:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 11:04 claime: depooled wtp1035.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
* 00:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2002.codfw.wmnet
* 00:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29290 and previous config saved to /var/cache/conftool/dbconfig/20220601-001234-ladsgroup.json
* 10:58 claime: pooled parse1002.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 10:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 10:56 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T60674|T60674]])', diff saved to https://phabricator.wikimedia.org/P29289 and previous config saved to /var/cache/conftool/dbconfig/20220601-000448-ladsgroup.json
* 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki2002.codfw.wmnet
* 10:43 claime: depooled wtp1034.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
* 10:43 claime: pooled parse1001.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
* 10:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1001.eqiad.wmnet
* 10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1001.eqiad.wmnet
* 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
* 10:36 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1002.eqiad.wmnet
* 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 10:29 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
* 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 10:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:13 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1013 backt to pc3 master (duration: 03m 43s)
* 10:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:58 cgoubert@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:828786{{!}}Update wgLinterSubmitterWhitelist (T312638)]] (duration: 03m 37s)
* 09:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:32 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc3 master (duration: 03m 34s)
* 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2015.codfw.wmnet to cluster codfw and group D
* 08:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Readding downtime removed by reimage
* 08:17 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Readding downtime removed by reimage
* 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2015.codfw.wmnet to cluster codfw and group D
* 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
* 07:56 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 1% of traffic to php 7.4 (duration: 03m 42s)
* 07:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
* 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2015.codfw.wmnet with OS bullseye
* 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2015.codfw.wmnet with reason: host reimage
* 07:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2015.codfw.wmnet with reason: host reimage
* 06:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2015.codfw.wmnet with OS bullseye
* 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:25 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Reverting to no php 7.4 traffic (duration: 03m 44s)
* 06:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:10 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 1% of users to php 7.4 (duration: 03m 55s)
* 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1136 [[phab:T316111|T316111]]', diff saved to https://phabricator.wikimedia.org/P33729 and previous config saved to /var/cache/conftool/dbconfig/20220901-060923-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write [[phab:T316111|T316111]]', diff saved to https://phabricator.wikimedia.org/P33728 and previous config saved to /var/cache/conftool/dbconfig/20220901-060128-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - [[phab:T316111|T316111]]', diff saved to https://phabricator.wikimedia.org/P33727 and previous config saved to /var/cache/conftool/dbconfig/20220901-060100-ladsgroup.json
* 06:00 Amir1: Starting s7 eqiad failover from db1136 to db1181 - [[phab:T316111|T316111]]
* 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1181 with weight 0 [[phab:T316111|T316111]]', diff saved to https://phabricator.wikimedia.org/P33726 and previous config saved to /var/cache/conftool/dbconfig/20220901-051701-ladsgroup.json
* 05:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T316111|T316111]]
* 05:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T316111|T316111]]
* 01:20 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase201[3-8].codfw.wmnet: Restart to apply new certificates ([[phab:T316697|T316697]]) - eevans@cumin1001
* 00:21 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase201[3-8].codfw.wmnet: Restart to apply new certificates ([[phab:T316697|T316697]]) - eevans@cumin1001


==Archives==
==Archives ==
See [[Server Admin Log/Archives]].
See [[Server Admin Log/Archives]].
<noinclude>
<noinclude>

Latest revision as of 23:26, 30 September 2022

2022-09-30

  • 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T314041)', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
  • 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T314041)', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
  • 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
  • 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
  • 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
  • 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T314041)', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
  • 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T314041)', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
  • 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T314041)', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
  • 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 13:51 moritzm: installing puppetdb-test2001 T318931
  • 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and previous config saved to /var/cache/conftool/dbconfig/20220930-123123-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35228 and previous config saved to /var/cache/conftool/dbconfig/20220930-121618-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35227 and previous config saved to /var/cache/conftool/dbconfig/20220930-120113-root.json
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetdb-test2001.codfw.wmnet
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35226 and previous config saved to /var/cache/conftool/dbconfig/20220930-114605-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35225 and previous config saved to /var/cache/conftool/dbconfig/20220930-113101-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P35224 and previous config saved to /var/cache/conftool/dbconfig/20220930-112307-root.json
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetdb-test2001.codfw.wmnet on all recursors
  • 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache puppetdb-test2001.codfw.wmnet on all recursors
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb-test2001.codfw.wmnet
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T314041)', diff saved to https://phabricator.wikimedia.org/P35223 and previous config saved to /var/cache/conftool/dbconfig/20220930-104004-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T314041)', diff saved to https://phabricator.wikimedia.org/P35222 and previous config saved to /var/cache/conftool/dbconfig/20220930-103943-ladsgroup.json
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35221 and previous config saved to /var/cache/conftool/dbconfig/20220930-102436-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35220 and previous config saved to /var/cache/conftool/dbconfig/20220930-100930-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T314041)', diff saved to https://phabricator.wikimedia.org/P35219 and previous config saved to /var/cache/conftool/dbconfig/20220930-095423-ladsgroup.json
  • 09:42 moritzm: installing Linux 5.10.140 updates on Bullseye hosts (released via 11.5 point release), just rollout of the package, no reboots involved
  • 07:37 XioNoX: add RPKI ROAs for 185.71.138.0/24 and 2001:67c:930::/48
  • 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
  • 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
  • 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52320
  • 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 52320
  • 07:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
  • 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35218 and previous config saved to /var/cache/conftool/dbconfig/20220930-070454-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35217 and previous config saved to /var/cache/conftool/dbconfig/20220930-065844-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35216 and previous config saved to /var/cache/conftool/dbconfig/20220930-064949-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35215 and previous config saved to /var/cache/conftool/dbconfig/20220930-064339-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35214 and previous config saved to /var/cache/conftool/dbconfig/20220930-063444-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35213 and previous config saved to /var/cache/conftool/dbconfig/20220930-062834-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35212 and previous config saved to /var/cache/conftool/dbconfig/20220930-061939-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35211 and previous config saved to /var/cache/conftool/dbconfig/20220930-061329-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35210 and previous config saved to /var/cache/conftool/dbconfig/20220930-060434-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35209 and previous config saved to /var/cache/conftool/dbconfig/20220930-055824-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35208 and previous config saved to /var/cache/conftool/dbconfig/20220930-054929-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35207 and previous config saved to /var/cache/conftool/dbconfig/20220930-054319-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35206 and previous config saved to /var/cache/conftool/dbconfig/20220930-053424-root.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35204 and previous config saved to /var/cache/conftool/dbconfig/20220930-052814-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35203 and previous config saved to /var/cache/conftool/dbconfig/20220930-051919-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35202 and previous config saved to /var/cache/conftool/dbconfig/20220930-051309-root.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P35201 and previous config saved to /var/cache/conftool/dbconfig/20220930-051206-root.json
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P35200 and previous config saved to /var/cache/conftool/dbconfig/20220930-050533-root.json
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T314041)', diff saved to https://phabricator.wikimedia.org/P35199 and previous config saved to /var/cache/conftool/dbconfig/20220930-041937-ladsgroup.json
  • 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T314041)', diff saved to https://phabricator.wikimedia.org/P35198 and previous config saved to /var/cache/conftool/dbconfig/20220930-041916-ladsgroup.json
  • 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35197 and previous config saved to /var/cache/conftool/dbconfig/20220930-040409-ladsgroup.json
  • 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35196 and previous config saved to /var/cache/conftool/dbconfig/20220930-034903-ladsgroup.json
  • 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T314041)', diff saved to https://phabricator.wikimedia.org/P35195 and previous config saved to /var/cache/conftool/dbconfig/20220930-033356-ladsgroup.json
  • 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
  • 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye

2022-09-29

  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T314041)', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T314041)', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T314041)', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:43 sukhe: alert1001: restart icinga
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:26 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:18 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:18 ejegg: payments-wiki upgraded from 839d6dde to aeee9676
  • 21:14 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:14 brennen: end of utc late backport and config window
  • 21:14 brennen@deploy1002: Finished scap: Backport for cirrus: Don't configure cloud clusters for private wikis (duration: 08m 22s)
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:06 brennen@deploy1002: brennen and ebernhardson: Backport for cirrus: Don't configure cloud clusters for private wikis synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:05 brennen@deploy1002: Started scap: Backport for cirrus: Don't configure cloud clusters for private wikis
  • 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:59 ryankemper: T313431 Repooled `elastic[2073-2074,2080-2081,2083,2086].codfw.wmnet`. Codfw's all on 5 masters now and cluster is back to green.
  • 20:58 brennen@deploy1002: Sync cancelled.
  • 20:58 brennen@deploy1002: brennen and trainbranchbot: Backport for Revert "cirrus: Don't configure cloud clusters for private wikis" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:58 ryankemper: T313431 Updated cross-cluster seed conf with new masters; should resolve the settings check alerts
  • 20:58 brennen@deploy1002: Started scap: Backport for Revert "cirrus: Don't configure cloud clusters for private wikis"
  • 20:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4027.ulsfo.wmnet
  • 20:57 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:52 brennen@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.3" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.gcoIZ0BTKW"' returned non-zero exit status 255. (duration: 00m 00s)
  • 20:52 brennen@deploy1002: Started scap: Backport for cirrus: Don't configure cloud clusters for private wikis
  • 20:49 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:46 brennen@deploy1002: Sync cancelled.
  • 20:45 brennen@deploy1002: brennen and trainbranchbot: Backport for Revert "Add Nepalese Wikipedia tagline" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:45 brennen@deploy1002: Started scap: Backport for Revert "Add Nepalese Wikipedia tagline"
  • 20:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 20:42 brennen@deploy1002: Sync cancelled.
  • 20:41 brennen@deploy1002: brennen and jdlrobson: Backport for Add Nepalese Wikipedia tagline (T318737) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:41 ryankemper: T313431 Restarting elasticsearch_7* services on `elastic2080` to pick up new master-eligible status
  • 20:41 brennen@deploy1002: Started scap: Backport for Add Nepalese Wikipedia tagline (T318737)
  • 20:38 brennen@deploy1002: Finished scap: Backport for Enable desktop improvements on nowikimedia (T318344) (duration: 08m 03s)
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:35 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
  • 20:35 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cp4027.ulsfo.wmnet
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:33 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
  • 20:30 brennen@deploy1002: brennen and jdlrobson: Backport for Enable desktop improvements on nowikimedia (T318344) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:30 brennen@deploy1002: Started scap: Backport for Enable desktop improvements on nowikimedia (T318344)
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 brennen@deploy1002: Finished scap: Backport for Web team config cleanup (T316568) (duration: 08m 05s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 hoo: Ran foreachwikiindblist wikidataclient-test extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php
  • 20:17 ejegg: payments-wiki upgraded from 0456850e to 839d6dde (with cache prefix altered for moved classes)
  • 20:17 ryankemper: T313431 Restarting elasticsearch_7* services on `elastic2086` to pick up new master-eligible status
  • 20:17 brennen@deploy1002: brennen and jdlrobson: Backport for Web team config cleanup (T316568) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:17 brennen@deploy1002: Started scap: Backport for Web team config cleanup (T316568)
  • 20:04 ejegg: payments-wiki rolled back from 839d6dde to 0456850e
  • 19:56 ejegg: payments-wiki upgraded from 0456850e to 839d6dde
  • 19:55 ryankemper: T313431 Restarting elasticsearch_7* services on `elastic208[1,3]` to pick up new master-eligible status
  • 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 19:33 ryankemper: T313431 Restarting elasticsearch_7* services on `elastic207[3,4]` to pick up new master-eligible status
  • 19:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: T313431
  • 19:29 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: T313431
  • 19:09 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4021.ulsfo.wmnet
  • 19:09 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:04 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4021.ulsfo.wmnet
  • 18:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.3 refs T314192
  • 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:06 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T314041)', diff saved to https://phabricator.wikimedia.org/P35188 and previous config saved to /var/cache/conftool/dbconfig/20220929-162812-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T314041)', diff saved to https://phabricator.wikimedia.org/P35187 and previous config saved to /var/cache/conftool/dbconfig/20220929-162750-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35186 and previous config saved to /var/cache/conftool/dbconfig/20220929-161244-ladsgroup.json
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35185 and previous config saved to /var/cache/conftool/dbconfig/20220929-155737-ladsgroup.json
  • 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Configure `mul` Wikibase language code on Beta wikis (beta-only, prod noop) (duration: 03m 41s)
  • 15:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T314041)', diff saved to https://phabricator.wikimedia.org/P35184 and previous config saved to /var/cache/conftool/dbconfig/20220929-154231-ladsgroup.json
  • 15:35 dancy@deploy1002: Installation of scap version "4.25.0" completed for 561 hosts
  • 15:35 dancy@deploy1002: Installing scap version "4.25.0" for 561 hosts
  • 14:30 moritzm: installing glib2.0 security updates
  • 14:29 moritzm: uploaded glib2.0 2.50.3-2+deb9u3+wmf1 to apt.wikimedia.org/stretch-wikimedia
  • 14:17 moritzm: rolling restart of apache2 in mw/eqiad to pick up Expat security updates
  • 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11164
  • 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11164
  • 13:54 claime: Enabled puppet for C:memcache hosts following merge C:memcached Fix memcached bootstrap
  • 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 32934
  • 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35179 and previous config saved to /var/cache/conftool/dbconfig/20220929-134844-root.json
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:46 claime: Disabling puppet for C:memcache hosts to merge C:memcached Fix memcached bootstrap
  • 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:41 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 13:41 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Wikibase: Set UnconnectedPage page prop format for test wikis (duration: 06m 13s)
  • 13:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8966
  • 13:39 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8966
  • 13:35 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and hoo: Backport for Wikibase: Set UnconnectedPage page prop format for test wikis synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:34 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Wikibase: Set UnconnectedPage page prop format for test wikis
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35178 and previous config saved to /var/cache/conftool/dbconfig/20220929-133339-root.json
  • 13:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Stop mobile visual enhancements from rolling out to jawiki (T318871) (duration: 05m 36s)
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kemayo: Backport for Stop mobile visual enhancements from rolling out to jawiki (T318871) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:27 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Stop mobile visual enhancements from rolling out to jawiki (T318871)
  • 13:26 moritzm: restartting Apache on lists
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384) (duration: 05m 23s)
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35176 and previous config saved to /var/cache/conftool/dbconfig/20220929-131834-root.json
  • 13:15 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:15 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35175 and previous config saved to /var/cache/conftool/dbconfig/20220929-131507-root.json
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 moritzm: rolling restart of apache2 in mw/codfw to pick up Expat security updates
  • 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: votewiki: Change wgLanguageCode to zh for Sep 2022 admins election (T318147) (duration: 03m 40s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35174 and previous config saved to /var/cache/conftool/dbconfig/20220929-130329-root.json
  • 13:01 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3 refs T314192 (duration: 04m 04s)
  • 13:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35173 and previous config saved to /var/cache/conftool/dbconfig/20220929-130003-root.json
  • 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:57 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3 refs T314192
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35172 and previous config saved to /var/cache/conftool/dbconfig/20220929-124824-root.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35171 and previous config saved to /var/cache/conftool/dbconfig/20220929-124458-root.json
  • 12:44 ladsgroup@deploy1002: Finished scap: Backport for Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904) (duration: 09m 05s)
  • 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:35 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 12:34 ladsgroup@deploy1002: Started scap: Backport for Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35169 and previous config saved to /var/cache/conftool/dbconfig/20220929-123319-root.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35168 and previous config saved to /var/cache/conftool/dbconfig/20220929-122953-root.json
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35167 and previous config saved to /var/cache/conftool/dbconfig/20220929-121814-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35166 and previous config saved to /var/cache/conftool/dbconfig/20220929-121448-root.json
  • 12:10 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3292
  • 12:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3292
  • 12:04 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35165 and previous config saved to /var/cache/conftool/dbconfig/20220929-120309-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35164 and previous config saved to /var/cache/conftool/dbconfig/20220929-115943-root.json
  • 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
  • 11:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P35163 and previous config saved to /var/cache/conftool/dbconfig/20220929-115612-root.json
  • 11:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 209453
  • 11:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 209453
  • 11:51 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15695
  • 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15695
  • 11:45 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 42
  • 11:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
  • 11:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3856
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35162 and previous config saved to /var/cache/conftool/dbconfig/20220929-114438-root.json
  • 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T314041)', diff saved to https://phabricator.wikimedia.org/P35161 and previous config saved to /var/cache/conftool/dbconfig/20220929-114431-ladsgroup.json
  • 11:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3856
  • 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42
  • 11:41 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:40 ladsgroup@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
  • 11:39 ladsgroup@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62955
  • 11:38 ladsgroup@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:38 ladsgroup@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62955
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35160 and previous config saved to /var/cache/conftool/dbconfig/20220929-112933-root.json
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35159 and previous config saved to /var/cache/conftool/dbconfig/20220929-112925-ladsgroup.json
  • 11:16 XioNoX: re-pool cr2-eqord - T295690
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35158 and previous config saved to /var/cache/conftool/dbconfig/20220929-111418-ladsgroup.json
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T318892', diff saved to https://phabricator.wikimedia.org/P35157 and previous config saved to /var/cache/conftool/dbconfig/20220929-111217-root.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 codfw primary T318892', diff saved to https://phabricator.wikimedia.org/P35156 and previous config saved to /var/cache/conftool/dbconfig/20220929-111127-root.json
  • 11:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - T318892
  • 11:06 XioNoX: restart cr2-eqord for upgrade - T295690
  • 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 11:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 11:01 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T314041)', diff saved to https://phabricator.wikimedia.org/P35155 and previous config saved to /var/cache/conftool/dbconfig/20220929-105912-ladsgroup.json
  • 10:53 XioNoX: drain cr2-eqord - T295690
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T318892', diff saved to https://phabricator.wikimedia.org/P35154 and previous config saved to /var/cache/conftool/dbconfig/20220929-105206-root.json
  • 10:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s8 T318892
  • 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s8 T318892
  • 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T318892
  • 10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
  • 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T318892
  • 10:50 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
  • 10:40 XioNoX: repool cr2-eqiad - T295690
  • 10:36 moritzm: installing poppler security updates
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T314041)', diff saved to https://phabricator.wikimedia.org/P35153 and previous config saved to /var/cache/conftool/dbconfig/20220929-100849-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T314041)', diff saved to https://phabricator.wikimedia.org/P35152 and previous config saved to /var/cache/conftool/dbconfig/20220929-100828-ladsgroup.json
  • 10:07 XioNoX: second (and longest) cr2-eqiad RE switchover - T295690
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35150 and previous config saved to /var/cache/conftool/dbconfig/20220929-095321-ladsgroup.json
  • 09:45 moritzm: restarting superset to pick up expat security update
  • 09:43 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:42 XioNoX: first cr2-eqiad RE switchover - T295690
  • 09:41 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:38 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35149 and previous config saved to /var/cache/conftool/dbconfig/20220929-093815-ladsgroup.json
  • 09:36 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:34 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:33 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:33 XioNoX: drain cr2-eqiad - T295690
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
  • 09:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
  • 09:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:26 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2098.codfw.wmnet with OS bullseye
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T314041)', diff saved to https://phabricator.wikimedia.org/P35148 and previous config saved to /var/cache/conftool/dbconfig/20220929-092308-ladsgroup.json
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:16 XioNoX: repool cr1-eqiad - T295690
  • 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.3"
  • 09:07 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
  • 09:04 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
  • 08:52 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2098.codfw.wmnet with OS bullseye
  • 08:43 XioNoX: second cr1-eqiad RE switchover - T295690
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35146 and previous config saved to /var/cache/conftool/dbconfig/20220929-082757-root.json
  • 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:15 XioNoX: first cr1-eqiad RE switchover (for NVM firmware) - T295690
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35145 and previous config saved to /var/cache/conftool/dbconfig/20220929-081252-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35144 and previous config saved to /var/cache/conftool/dbconfig/20220929-080340-root.json
  • 07:57 XioNoX: drain traffic away from cr1-eqiad - T295690
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35143 and previous config saved to /var/cache/conftool/dbconfig/20220929-075747-root.json
  • 07:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
  • 07:49 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35142 and previous config saved to /var/cache/conftool/dbconfig/20220929-074835-root.json
  • 07:45 moritzm: installing expat security updates
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35141 and previous config saved to /var/cache/conftool/dbconfig/20220929-074242-root.json
  • 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 18106
  • 07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 18106
  • 07:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38040
  • 07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38040
  • 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35280
  • 07:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35280
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35140 and previous config saved to /var/cache/conftool/dbconfig/20220929-073330-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35139 and previous config saved to /var/cache/conftool/dbconfig/20220929-072745-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35138 and previous config saved to /var/cache/conftool/dbconfig/20220929-072737-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35137 and previous config saved to /var/cache/conftool/dbconfig/20220929-071825-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35136 and previous config saved to /var/cache/conftool/dbconfig/20220929-071240-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35135 and previous config saved to /var/cache/conftool/dbconfig/20220929-071232-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35134 and previous config saved to /var/cache/conftool/dbconfig/20220929-070320-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35133 and previous config saved to /var/cache/conftool/dbconfig/20220929-065736-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35132 and previous config saved to /var/cache/conftool/dbconfig/20220929-065727-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35131 and previous config saved to /var/cache/conftool/dbconfig/20220929-064815-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35130 and previous config saved to /var/cache/conftool/dbconfig/20220929-064231-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35129 and previous config saved to /var/cache/conftool/dbconfig/20220929-064222-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P35128 and previous config saved to /var/cache/conftool/dbconfig/20220929-063508-root.json
  • 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35127 and previous config saved to /var/cache/conftool/dbconfig/20220929-063310-root.json
  • 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35126 and previous config saved to /var/cache/conftool/dbconfig/20220929-062726-root.json
  • 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35125 and previous config saved to /var/cache/conftool/dbconfig/20220929-061805-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35124 and previous config saved to /var/cache/conftool/dbconfig/20220929-061221-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T318888', diff saved to https://phabricator.wikimedia.org/P35123 and previous config saved to /var/cache/conftool/dbconfig/20220929-060532-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary and set section read-write T318888', diff saved to https://phabricator.wikimedia.org/P35122 and previous config saved to /var/cache/conftool/dbconfig/20220929-060425-root.json
  • 06:03 marostegui: Starting s7 codfw failover from db2121 to db2118 - T318888
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35121 and previous config saved to /var/cache/conftool/dbconfig/20220929-055716-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2118 from API T318888', diff saved to https://phabricator.wikimedia.org/P35120 and previous config saved to /var/cache/conftool/dbconfig/20220929-054542-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 T318888', diff saved to https://phabricator.wikimedia.org/P35119 and previous config saved to /var/cache/conftool/dbconfig/20220929-054509-root.json
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T318888
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T318888
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35118 and previous config saved to /var/cache/conftool/dbconfig/20220929-054211-root.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 from API T318886', diff saved to https://phabricator.wikimedia.org/P35117 and previous config saved to /var/cache/conftool/dbconfig/20220929-053951-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 T318886', diff saved to https://phabricator.wikimedia.org/P35116 and previous config saved to /var/cache/conftool/dbconfig/20220929-053407-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary and set section read-write T318886', diff saved to https://phabricator.wikimedia.org/P35115 and previous config saved to /var/cache/conftool/dbconfig/20220929-053302-root.json
  • 05:32 marostegui: Starting s4 codfw failover from db2110 to db2140 - T318886
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T314041)', diff saved to https://phabricator.wikimedia.org/P35114 and previous config saved to /var/cache/conftool/dbconfig/20220929-052805-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T314041)', diff saved to https://phabricator.wikimedia.org/P35113 and previous config saved to /var/cache/conftool/dbconfig/20220929-052743-ladsgroup.json
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35112 and previous config saved to /var/cache/conftool/dbconfig/20220929-051237-ladsgroup.json
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T318886
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T318886', diff saved to https://phabricator.wikimedia.org/P35111 and previous config saved to /var/cache/conftool/dbconfig/20220929-051114-root.json
  • 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T318886
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35110 and previous config saved to /var/cache/conftool/dbconfig/20220929-045730-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T314041)', diff saved to https://phabricator.wikimedia.org/P35109 and previous config saved to /var/cache/conftool/dbconfig/20220929-044224-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T314041)', diff saved to https://phabricator.wikimedia.org/P35108 and previous config saved to /var/cache/conftool/dbconfig/20220929-035724-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35107 and previous config saved to /var/cache/conftool/dbconfig/20220929-035647-ladsgroup.json
  • 03:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35106 and previous config saved to /var/cache/conftool/dbconfig/20220929-034140-ladsgroup.json
  • 03:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
  • 03:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35105 and previous config saved to /var/cache/conftool/dbconfig/20220929-032634-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35104 and previous config saved to /var/cache/conftool/dbconfig/20220929-031127-ladsgroup.json
  • 02:29 ejegg: updated fundraising CiviCRM from f3461a44 to 5e1738a1
  • 02:20 ejegg: updated fundraising python tools from dd494413 to 14d60435
  • 01:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2037.codfw.wmnet with OS buster
  • 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage

2022-09-28

  • 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
  • 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
  • 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T314041)', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 22:20 ejegg: updated fundraising CiviCRM from d31c19a0 to f3461a44
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
  • 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35100 and previous config saved to /var/cache/conftool/dbconfig/20220928-212131-ladsgroup.json
  • 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35099 and previous config saved to /var/cache/conftool/dbconfig/20220928-210624-ladsgroup.json
  • 21:06 volans: installed spicerack 4.0.0-1+deb11u1 on cumin1001
  • 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35098 and previous config saved to /var/cache/conftool/dbconfig/20220928-205117-ladsgroup.json
  • 20:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12200
  • 20:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12200
  • 20:39 TheresNoTime: closing UTC late backport window
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 samtar@deploy1002: Finished scap: Backport for [config]: Deploy GDI survey Wave 3 (T318156) (duration: 06m 19s)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:18 samtar@deploy1002: samtar and essexigyan: Backport for [config]: Deploy GDI survey Wave 3 (T318156) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:18 samtar@deploy1002: Started scap: Backport for [config]: Deploy GDI survey Wave 3 (T318156)
  • 20:11 samtar@deploy1002: Sync cancelled.
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:04 samtar@deploy1002: samtar and dani: Backport for Deploy Research Incentive survey on arwiki (T318328) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:04 samtar@deploy1002: Started scap: Backport for Deploy Research Incentive survey on arwiki (T318328)
  • 19:24 ejegg: updated fundraising CiviCRM from 916a8b08 to d31c19a0
  • 19:08 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:22 volans: installed spicerack 4.0.0-1+deb11u1 on cumin2002
  • 18:22 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3f23a1b]: (no justification provided) (duration: 00m 11s)
  • 18:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@3f23a1b]: (no justification provided)
  • 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3 refs T314192 (duration: 03m 38s)
  • 18:07 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3 refs T314192
  • 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19653
  • 17:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19653
  • 17:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32098
  • 17:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32098
  • 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4181
  • 17:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4181
  • 17:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T314041)', diff saved to https://phabricator.wikimedia.org/P35097 and previous config saved to /var/cache/conftool/dbconfig/20220928-171848-ladsgroup.json
  • 17:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35096 and previous config saved to /var/cache/conftool/dbconfig/20220928-170342-ladsgroup.json
  • 16:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35095 and previous config saved to /var/cache/conftool/dbconfig/20220928-164835-ladsgroup.json
  • 16:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 16:36 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@f89d689]: (no justification provided) (duration: 00m 12s)
  • 16:36 nokafor@deploy1002: Started deploy [airflow-dags/analytics@f89d689]: (no justification provided)
  • 16:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T314041)', diff saved to https://phabricator.wikimedia.org/P35093 and previous config saved to /var/cache/conftool/dbconfig/20220928-163329-ladsgroup.json
  • 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
  • 16:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
  • 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4775
  • 16:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4775
  • 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2635
  • 16:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2635
  • 16:15 volans: uploaded spicerack_4.0.0 to apt.wikimedia.org bullseye-wikimedia
  • 15:57 dancy@deploy1002: Installation of scap version "4.24.0" completed for 561 hosts
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
  • 15:57 dancy@deploy1002: Installing scap version "4.24.0" for 561 hosts
  • 15:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40217
  • 15:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40217
  • 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
  • 15:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
  • 15:51 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0646be1]: (no justification provided) (duration: 00m 10s)
  • 15:51 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0646be1]: (no justification provided)
  • 15:47 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
  • 15:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS buster
  • 15:26 moritzm: installing libgoogle-gson-java security updates on bullseye
  • 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4922
  • 15:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4922
  • 15:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714
  • 15:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 15:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714
  • 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19108
  • 15:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19108
  • 15:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 15:09 moritzm: installing twisted security updates
  • 15:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8674
  • 15:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8674
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35092 and previous config saved to /var/cache/conftool/dbconfig/20220928-150230-ladsgroup.json
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T314041)', diff saved to https://phabricator.wikimedia.org/P35091 and previous config saved to /var/cache/conftool/dbconfig/20220928-150158-ladsgroup.json
  • 15:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 15:00 SandraEbele: deploying Airflow for hdfsarchiver operator fix
  • 15:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@aa7984f]: (no justification provided) (duration: 00m 14s)
  • 15:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@aa7984f]: (no justification provided)
  • 14:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1005.eqiad.wmnet with OS bullseye
  • 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.wikimedia.org
  • 14:53 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394354
  • 14:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 394354
  • 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 393950
  • 14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 393950
  • 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262589
  • 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 262589
  • 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209453
  • 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 209453
  • 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
  • 14:48 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 199524
  • 14:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 65517
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 65517
  • 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62955
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62955
  • 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 57695
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 57695
  • 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53334
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35090 and previous config saved to /var/cache/conftool/dbconfig/20220928-144651-ladsgroup.json
  • 14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 53334
  • 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
  • 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52320
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46450
  • 14:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
  • 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46450
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
  • 14:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
  • 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
  • 14:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
  • 14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
  • 14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36351
  • 14:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35280
  • 14:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
  • 14:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35280
  • 14:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32934
  • 14:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32934
  • 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32787
  • 14:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32787
  • 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
  • 14:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
  • 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29791
  • 14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 29791
  • 14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744
  • 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26744
  • 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
  • 14:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22987
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35089 and previous config saved to /var/cache/conftool/dbconfig/20220928-143145-ladsgroup.json
  • 14:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22987
  • 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22773
  • 14:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22773
  • 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22616
  • 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22616
  • 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21949
  • 14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 14:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1005.eqiad.wmnet with OS bullseye
  • 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21949
  • 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21928
  • 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21928
  • 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20115
  • 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20115
  • 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19653
  • 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19653
  • 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19151
  • 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19151
  • 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19108
  • 14:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 14:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19108
  • 14:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18106
  • 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18106
  • 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16735
  • 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16735
  • 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
  • 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
  • 14:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15695
  • 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15695
  • 14:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
  • 14:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
  • 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14630
  • 14:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14630
  • 14:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14361
  • 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14361
  • 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13760
  • 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13760
  • 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13489
  • 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13489
  • 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13335
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T314041)', diff saved to https://phabricator.wikimedia.org/P35088 and previous config saved to /var/cache/conftool/dbconfig/20220928-141638-ladsgroup.json
  • 14:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13335
  • 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
  • 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
  • 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12041
  • 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12041
  • 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11164
  • 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11164
  • 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11039
  • 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11039
  • 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
  • 14:12 volans: added python3-gjson v0.0.5 to apt.w.o (bullseye only)
  • 14:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10310
  • 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
  • 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 14:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
  • 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8781
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35087 and previous config saved to /var/cache/conftool/dbconfig/20220928-141007-root.json
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35086 and previous config saved to /var/cache/conftool/dbconfig/20220928-141001-root.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35085 and previous config saved to /var/cache/conftool/dbconfig/20220928-140956-root.json
  • 14:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8781
  • 14:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35084 and previous config saved to /var/cache/conftool/dbconfig/20220928-140950-root.json
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 14:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
  • 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8359
  • 14:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudrabbit1003.wikimedia.org
  • 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8359
  • 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8075
  • 14:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-eqiad
  • 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8075
  • 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7843
  • 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7843
  • 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7795
  • 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7795
  • 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7784
  • 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7784
  • 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7713
  • 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7713
  • 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7195
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7195
  • 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6762
  • 14:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6762
  • 14:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6614
  • 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6614
  • 14:02 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
  • 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6128
  • 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6128
  • 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6079
  • 14:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6079
  • 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5650
  • 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5650
  • 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5400
  • 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5400
  • 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4922
  • 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4922
  • 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4826
  • 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4826
  • 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4775
  • 13:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4775
  • 13:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
  • 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4637
  • 13:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4230
  • 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4230
  • 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4181
  • 13:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4181
  • 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35083 and previous config saved to /var/cache/conftool/dbconfig/20220928-135502-root.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35082 and previous config saved to /var/cache/conftool/dbconfig/20220928-135456-root.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35081 and previous config saved to /var/cache/conftool/dbconfig/20220928-135451-root.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35080 and previous config saved to /var/cache/conftool/dbconfig/20220928-135445-root.json
  • 13:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
  • 13:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3300
  • 13:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:52 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 13:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3300
  • 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3292
  • 13:50 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3292
  • 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2906
  • 13:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
  • 13:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2906
  • 13:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2647
  • 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2647
  • 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2635
  • 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2635
  • 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2603
  • 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2603
  • 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1273
  • 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1273
  • 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 812
  • 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 812
  • 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 714
  • 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35079 and previous config saved to /var/cache/conftool/dbconfig/20220928-133957-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35078 and previous config saved to /var/cache/conftool/dbconfig/20220928-133951-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35077 and previous config saved to /var/cache/conftool/dbconfig/20220928-133946-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35076 and previous config saved to /var/cache/conftool/dbconfig/20220928-133940-root.json
  • 13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=1) rolling restart_daemons on A:thanos-fe-codfw
  • 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 577
  • 13:32 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 577
  • 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
  • 13:31 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35075 and previous config saved to /var/cache/conftool/dbconfig/20220928-132452-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35074 and previous config saved to /var/cache/conftool/dbconfig/20220928-132446-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35073 and previous config saved to /var/cache/conftool/dbconfig/20220928-132442-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35072 and previous config saved to /var/cache/conftool/dbconfig/20220928-132435-root.json
  • 13:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:15 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35071 and previous config saved to /var/cache/conftool/dbconfig/20220928-130947-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35070 and previous config saved to /var/cache/conftool/dbconfig/20220928-130941-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35069 and previous config saved to /var/cache/conftool/dbconfig/20220928-130937-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35068 and previous config saved to /var/cache/conftool/dbconfig/20220928-130930-root.json
  • 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35067 and previous config saved to /var/cache/conftool/dbconfig/20220928-125442-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35066 and previous config saved to /var/cache/conftool/dbconfig/20220928-125436-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35065 and previous config saved to /var/cache/conftool/dbconfig/20220928-125432-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35064 and previous config saved to /var/cache/conftool/dbconfig/20220928-125425-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35063 and previous config saved to /var/cache/conftool/dbconfig/20220928-123937-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35062 and previous config saved to /var/cache/conftool/dbconfig/20220928-123932-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35061 and previous config saved to /var/cache/conftool/dbconfig/20220928-123927-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35060 and previous config saved to /var/cache/conftool/dbconfig/20220928-123920-root.json
  • 12:34 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35058 and previous config saved to /var/cache/conftool/dbconfig/20220928-122432-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35057 and previous config saved to /var/cache/conftool/dbconfig/20220928-122427-root.json
  • 12:24 gehel: copying wmf-elasticsearh-search-plugins from bullseye to buster (`reprepro -C thirdparty/elastic710 copy buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35056 and previous config saved to /var/cache/conftool/dbconfig/20220928-122422-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35055 and previous config saved to /var/cache/conftool/dbconfig/20220928-122421-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35054 and previous config saved to /var/cache/conftool/dbconfig/20220928-122415-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35053 and previous config saved to /var/cache/conftool/dbconfig/20220928-122414-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35052 and previous config saved to /var/cache/conftool/dbconfig/20220928-122411-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35051 and previous config saved to /var/cache/conftool/dbconfig/20220928-122403-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35050 and previous config saved to /var/cache/conftool/dbconfig/20220928-122356-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35049 and previous config saved to /var/cache/conftool/dbconfig/20220928-122350-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35048 and previous config saved to /var/cache/conftool/dbconfig/20220928-122346-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P35047 and previous config saved to /var/cache/conftool/dbconfig/20220928-122321-root.json
  • 12:22 gehel: above reprepro copy failed, elastic710 component does not exist yet
  • 12:21 XioNoX: re-enable Init7 in knams
  • 12:21 gehel: copying wmf-elasticsearh-search-plugins from bullseye to buster (`reprepro -C elastic710 buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 db2146 db2122 es2022 for mariadb upgrade T318128', diff saved to https://phabricator.wikimedia.org/P35046 and previous config saved to /var/cache/conftool/dbconfig/20220928-121912-root.json
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 12:09 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35045 and previous config saved to /var/cache/conftool/dbconfig/20220928-120916-root.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35044 and previous config saved to /var/cache/conftool/dbconfig/20220928-120909-root.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35043 and previous config saved to /var/cache/conftool/dbconfig/20220928-120906-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35042 and previous config saved to /var/cache/conftool/dbconfig/20220928-120858-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35041 and previous config saved to /var/cache/conftool/dbconfig/20220928-120852-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35040 and previous config saved to /var/cache/conftool/dbconfig/20220928-120845-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35039 and previous config saved to /var/cache/conftool/dbconfig/20220928-120841-root.json
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 11:58 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35038 and previous config saved to /var/cache/conftool/dbconfig/20220928-115411-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35037 and previous config saved to /var/cache/conftool/dbconfig/20220928-115404-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35036 and previous config saved to /var/cache/conftool/dbconfig/20220928-115401-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35035 and previous config saved to /var/cache/conftool/dbconfig/20220928-115354-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35034 and previous config saved to /var/cache/conftool/dbconfig/20220928-115347-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35033 and previous config saved to /var/cache/conftool/dbconfig/20220928-115340-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35032 and previous config saved to /var/cache/conftool/dbconfig/20220928-115336-root.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35031 and previous config saved to /var/cache/conftool/dbconfig/20220928-113906-root.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35030 and previous config saved to /var/cache/conftool/dbconfig/20220928-113900-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35029 and previous config saved to /var/cache/conftool/dbconfig/20220928-113856-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35028 and previous config saved to /var/cache/conftool/dbconfig/20220928-113849-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35027 and previous config saved to /var/cache/conftool/dbconfig/20220928-113842-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35026 and previous config saved to /var/cache/conftool/dbconfig/20220928-113835-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35025 and previous config saved to /var/cache/conftool/dbconfig/20220928-113831-root.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35024 and previous config saved to /var/cache/conftool/dbconfig/20220928-112401-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35023 and previous config saved to /var/cache/conftool/dbconfig/20220928-112355-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35022 and previous config saved to /var/cache/conftool/dbconfig/20220928-112351-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35021 and previous config saved to /var/cache/conftool/dbconfig/20220928-112344-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35020 and previous config saved to /var/cache/conftool/dbconfig/20220928-112337-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35019 and previous config saved to /var/cache/conftool/dbconfig/20220928-112330-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35018 and previous config saved to /var/cache/conftool/dbconfig/20220928-112326-root.json
  • 11:18 moritzm: installing expat security updates
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35017 and previous config saved to /var/cache/conftool/dbconfig/20220928-110856-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35016 and previous config saved to /var/cache/conftool/dbconfig/20220928-110850-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35015 and previous config saved to /var/cache/conftool/dbconfig/20220928-110846-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35014 and previous config saved to /var/cache/conftool/dbconfig/20220928-110839-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35013 and previous config saved to /var/cache/conftool/dbconfig/20220928-110832-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35012 and previous config saved to /var/cache/conftool/dbconfig/20220928-110825-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35011 and previous config saved to /var/cache/conftool/dbconfig/20220928-110821-root.json
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T314041)', diff saved to https://phabricator.wikimedia.org/P35010 and previous config saved to /var/cache/conftool/dbconfig/20220928-105531-ladsgroup.json
  • 10:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 10:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T314041)', diff saved to https://phabricator.wikimedia.org/P35009 and previous config saved to /var/cache/conftool/dbconfig/20220928-105520-ladsgroup.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35008 and previous config saved to /var/cache/conftool/dbconfig/20220928-105351-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35007 and previous config saved to /var/cache/conftool/dbconfig/20220928-105345-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35006 and previous config saved to /var/cache/conftool/dbconfig/20220928-105340-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35005 and previous config saved to /var/cache/conftool/dbconfig/20220928-105332-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35004 and previous config saved to /var/cache/conftool/dbconfig/20220928-105327-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35003 and previous config saved to /var/cache/conftool/dbconfig/20220928-105320-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35002 and previous config saved to /var/cache/conftool/dbconfig/20220928-105315-root.json
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P35001 and previous config saved to /var/cache/conftool/dbconfig/20220928-104014-ladsgroup.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35000 and previous config saved to /var/cache/conftool/dbconfig/20220928-103847-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34999 and previous config saved to /var/cache/conftool/dbconfig/20220928-103840-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34998 and previous config saved to /var/cache/conftool/dbconfig/20220928-103835-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34997 and previous config saved to /var/cache/conftool/dbconfig/20220928-103827-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34996 and previous config saved to /var/cache/conftool/dbconfig/20220928-103822-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34995 and previous config saved to /var/cache/conftool/dbconfig/20220928-103815-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34994 and previous config saved to /var/cache/conftool/dbconfig/20220928-103810-root.json
  • 10:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 db1137 db1168 db1143 db1132 db1127 es1022 for mariadb upgrade T318128', diff saved to https://phabricator.wikimedia.org/P34993 and previous config saved to /var/cache/conftool/dbconfig/20220928-102759-root.json
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P34992 and previous config saved to /var/cache/conftool/dbconfig/20220928-102508-ladsgroup.json
  • 10:19 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T314041)', diff saved to https://phabricator.wikimedia.org/P34990 and previous config saved to /var/cache/conftool/dbconfig/20220928-101001-ladsgroup.json
  • 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59689
  • 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59689
  • 08:49 jbond: disable puppet on cache serveres to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/832268
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T314041)', diff saved to https://phabricator.wikimedia.org/P34989 and previous config saved to /var/cache/conftool/dbconfig/20220928-084557-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T314041)', diff saved to https://phabricator.wikimedia.org/P34988 and previous config saved to /var/cache/conftool/dbconfig/20220928-084535-ladsgroup.json
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34987 and previous config saved to /var/cache/conftool/dbconfig/20220928-083029-ladsgroup.json
  • 08:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34985 and previous config saved to /var/cache/conftool/dbconfig/20220928-081522-ladsgroup.json
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T314041)', diff saved to https://phabricator.wikimedia.org/P34984 and previous config saved to /var/cache/conftool/dbconfig/20220928-080015-ladsgroup.json
  • 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:30 XioNoX: disable BGP to init7 in knams
  • 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:08 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557) (duration: 05m 17s)
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:03 kartik@deploy1002: kartik and kartik: Backport for testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:03 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)
  • 06:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T314041)', diff saved to https://phabricator.wikimedia.org/P34981 and previous config saved to /var/cache/conftool/dbconfig/20220928-043052-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T314041)', diff saved to https://phabricator.wikimedia.org/P34980 and previous config saved to /var/cache/conftool/dbconfig/20220928-043030-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34979 and previous config saved to /var/cache/conftool/dbconfig/20220928-041524-ladsgroup.json
  • 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34978 and previous config saved to /var/cache/conftool/dbconfig/20220928-040017-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T314041)', diff saved to https://phabricator.wikimedia.org/P34977 and previous config saved to /var/cache/conftool/dbconfig/20220928-034511-ladsgroup.json
  • 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T314041)', diff saved to https://phabricator.wikimedia.org/P34976 and previous config saved to /var/cache/conftool/dbconfig/20220928-020746-ladsgroup.json
  • 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 02:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T314041)', diff saved to https://phabricator.wikimedia.org/P34975 and previous config saved to /var/cache/conftool/dbconfig/20220928-020724-ladsgroup.json
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34974 and previous config saved to /var/cache/conftool/dbconfig/20220928-015218-ladsgroup.json
  • 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34973 and previous config saved to /var/cache/conftool/dbconfig/20220928-013711-ladsgroup.json
  • 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T314041)', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json
  • 01:18 ejegg: updated fundraising python tools from b65109af to dd494413
  • 00:34 eileen: civicrm upgraded from 118c1d0b to 916a8b08
  • 00:11 eileen: civicrm upgraded from e198fb4c to 118c1d0b

2022-09-27

  • 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
  • 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
  • 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
  • 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
  • 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
  • 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
  • 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T314041)', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:12 TheresNoTime: closing UTC late backport window
  • 21:10 samtar@deploy1002: Finished scap: Backport for Remove figures from text extracts (T318727) (duration: 04m 53s)
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:06 samtar@deploy1002: samtar and ssastry: Backport for Remove figures from text extracts (T318727) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:06 samtar@deploy1002: Started scap: Backport for Remove figures from text extracts (T318727)
  • 21:06 samtar@deploy1002: Finished scap: Backport for Remove figures from text extracts (T318727) (duration: 06m 58s)
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
  • 20:59 TheresNoTime: extending UTC late backport window
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:58 samtar@deploy1002: samtar and ssastry: Backport for Remove figures from text extracts (T318727) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:58 samtar@deploy1002: Started scap: Backport for Remove figures from text extracts (T318727)
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:53 samtar@deploy1002: Finished scap: Backport for romdwikimedia: Enable subpages in NS0 (T318491) (duration: 05m 29s)
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:48 samtar@deploy1002: samtar and stang: Backport for romdwikimedia: Enable subpages in NS0 (T318491) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:48 samtar@deploy1002: Started scap: Backport for romdwikimedia: Enable subpages in NS0 (T318491)
  • 20:46 samtar@deploy1002: Finished scap: Backport for elastic: rebalance enwiki_content shard counts (T318270) (duration: 05m 14s)
  • 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:41 samtar@deploy1002: samtar and ryankemper: Backport for elastic: rebalance enwiki_content shard counts (T318270) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:41 samtar@deploy1002: Started scap: Backport for elastic: rebalance enwiki_content shard counts (T318270)
  • 20:38 samtar@deploy1002: Finished scap: Backport for Add wmgMFDefaultEditor back in for future use (duration: 06m 02s)
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:33 samtar@deploy1002: samtar and kemayo: Backport for Add wmgMFDefaultEditor back in for future use synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:32 samtar@deploy1002: Started scap: Backport for Add wmgMFDefaultEditor back in for future use
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 samtar@deploy1002: Started scap: Backport for Disable MobileFrontend default editor a/b test (T302356)
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 samtar@deploy1002: Started scap: Backport for Disable MobileFrontend default editor a/b test (T302356)
  • 20:20 samtar@deploy1002: Finished scap: Backport for Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626) (duration: 04m 58s)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:15 samtar@deploy1002: samtar and kemayo: Backport for Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
  • 20:15 samtar@deploy1002: Started scap: Backport for Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 samtar@deploy1002: Finished scap: Backport for MobileWebUIActions sample rate to 1 on testwiki (T302108) (duration: 05m 46s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:04 samtar@deploy1002: samtar and kemayo: Backport for MobileWebUIActions sample rate to 1 on testwiki (T302108) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:04 samtar@deploy1002: Started scap: Backport for MobileWebUIActions sample rate to 1 on testwiki (T302108)
  • 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
  • 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
  • 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T314041)', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
  • 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
  • 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3 refs T314192
  • 18:02 brennen: 1.40.0-wmf.3 (T314192) no current blockers, promoting to group0
  • 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
  • 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
  • 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
  • 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
  • 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
  • 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
  • 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
  • 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
  • 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
  • 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
  • 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T314041)', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
  • 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T314041)', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
  • 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
  • 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # T315552, 710183 rows done
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
  • 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
  • 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
  • 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # T315552
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:06 taavi@deploy1002: Finished scap: Backport for Track use of Searchbox footer on Wikidata (T306933), Track use of Searchbox footer on Wikidata (T306933) (duration: 06m 59s)
  • 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:59 taavi@deploy1002: taavi and migr: Backport for Track use of Searchbox footer on Wikidata (T306933), Track use of Searchbox footer on Wikidata (T306933) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:59 taavi@deploy1002: Started scap: Backport for Track use of Searchbox footer on Wikidata (T306933), Track use of Searchbox footer on Wikidata (T306933)
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T314041)', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T314041)', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
  • 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:57 jbond: upload new wmf-laptop_0.5.4 package
  • 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
  • 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
  • 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
  • 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
  • 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
  • 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
  • 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
  • 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update T311686
  • 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T314041)', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
  • 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T314041)', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
  • 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
  • 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
  • 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
  • 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
  • 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - T310745
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
  • 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - T310745
  • 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
  • 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
  • 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T314041)', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
  • 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T314041)', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
  • 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
  • 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
  • 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 T318128
  • 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3 refs T314192 (duration: 36m 01s)
  • 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3 refs T314192
  • 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T314041)', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T314041)', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
  • 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
  • 01:17 eileen: civicrm upgraded from dcef393d to e198fb4c
  • 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T314041)', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
  • 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
  • 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
  • 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
  • 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
  • 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
  • 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
  • 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
  • 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
  • 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
  • 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
  • 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
  • 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json

2022-09-26

  • 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
  • 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
  • 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
  • 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
  • 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
  • 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
  • 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
  • 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
  • 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
  • 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
  • 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
  • 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
  • 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:31 TheresNoTime: closing UTC late backport window
  • 20:18 samtar@deploy1002: Finished scap: Backport for Fix VisualEditor on wikis where RESTBase was never set up (T318325) (duration: 06m 52s)
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 20:11 samtar@deploy1002: samtar and matmarex: Backport for Fix VisualEditor on wikis where RESTBase was never set up (T318325) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:11 samtar@deploy1002: Started scap: Backport for Fix VisualEditor on wikis where RESTBase was never set up (T318325)
  • 20:10 samtar@deploy1002: Finished scap: Backport for wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070) (duration: 06m 13s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
  • 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
  • 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
  • 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
  • 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
  • 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
  • 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
  • 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
  • 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
  • 20:04 samtar@deploy1002: samtar and matmarex: Backport for wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
  • 20:03 samtar@deploy1002: Started scap: Backport for wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)
  • 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
  • 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T314041)', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
  • 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
  • 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
  • 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
  • 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
  • 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
  • 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
  • 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
  • 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
  • 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
  • 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
  • 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
  • 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
  • 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
  • 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
  • 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
  • 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
  • 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
  • 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
  • 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
  • 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
  • 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
  • 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
  • 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
  • 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
  • 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
  • 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
  • 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
  • 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
  • 13:56 moritzm: installing mako security updates
  • 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
  • 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
  • 13:45 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: Set default sortkey for prefixed pages (T315551) (2/2) (duration: 03m 39s)
  • 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: Set default sortkey for prefixed pages (T315551) (1/2) (duration: 03m 51s)
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable wgCiteResponsiveReferences on etwiki (T318530) (duration: 03m 53s)
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
  • 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
  • 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:51 moritzm: installing bind9 security updates on Bullseye
  • 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:51 ladsgroup@deploy1002: Finished scap: Backport for Bump portals to HEAD (T273179) (duration: 06m 05s)
  • 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Bump portals to HEAD (T273179) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 12:44 ladsgroup@deploy1002: Started scap: Backport for Bump portals to HEAD (T273179)
  • 12:25 moritzm: installing unzip security updates
  • 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
  • 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 033ab75: arwiki: Properly grant enrollasmentor to editor (T310905) (duration: 03m 46s)
  • 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:56 btullis: adding 80GB of virtual disk to matomo1002
  • 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0a54867: arwiki: Grant enrollasmentor to editor (T310905) (duration: 03m 40s)
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:07 godog: upgrade grafana to 8.5.13
  • 08:04 godog: add 20G to prometheus/analytics in codfw
  • 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:31 oblivian@deploy1002: Finished scap: Backport for Move 100% of cookie-accepting clients to php 7.4 (T271736) (duration: 05m 31s)
  • 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for Move 100% of cookie-accepting clients to php 7.4 (T271736) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:26 oblivian@deploy1002: Started scap: Backport for Move 100% of cookie-accepting clients to php 7.4 (T271736)
  • 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: 620bb80: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and shi to InterwikiSortOrders (duration: 03m 40s)
  • 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 81f6662: Add wordmark and tagline for mnwiki (T318478) (duration: 03m 46s)
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: 81f6662: Add wordmark and tagline for mnwiki (T318478; 1/2) (duration: 03m 40s)
  • 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
  • 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d2d2c08: eswiki: Enable structured mentor list (T310905) (duration: 04m 30s)
  • 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-09-25

  • 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
  • 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
  • 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
  • 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
  • 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
  • 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
  • 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
  • 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
  • 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
  • 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
  • 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
  • 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye

2022-09-23

  • 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
  • 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
  • 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
  • 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
  • 13:39 hashar@deploy1002: Finished scap: Backport for Stop using Elastica::Type and set the target indices (T318356) (duration: 07m 10s)
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:32 hashar@deploy1002: hashar and hashar: Backport for Stop using Elastica::Type and set the target indices (T318356) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:31 hashar@deploy1002: Started scap: Backport for Stop using Elastica::Type and set the target indices (T318356)
  • 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
  • 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
  • 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
  • 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
  • 09:26 jynus: stopping db1117:s3 for maintenance T315713
  • 08:51 Emperor: rebalance ms-eqiad swift rings T294550
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
  • 06:10 marostegui: Shutdown db1189 T317662
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance

2022-09-22

  • 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
  • 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
  • 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 brennen: end of utc late backport & config window
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:54 brennen@deploy1002: Finished scap: Backport for Restrict figure to the size of the media (T305357 T318300), Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300) (duration: 06m 33s)
  • 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
  • 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
  • 20:48 brennen@deploy1002: brennen and arlolra: Backport for Restrict figure to the size of the media (T305357 T318300), Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:47 brennen@deploy1002: Started scap: Backport for Restrict figure to the size of the media (T305357 T318300), Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)
  • 20:36 brennen@deploy1002: backport aborted: (duration: 02m 16s)
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:25 brennen@deploy1002: Finished scap: Backport for Drops JS-side creation of "Source" link (T318266) (duration: 06m 09s)
  • 20:19 brennen@deploy1002: brennen and tpt: Backport for Drops JS-side creation of "Source" link (T318266) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:19 brennen@deploy1002: Started scap: Backport for Drops JS-side creation of "Source" link (T318266)
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:38 jhuneidi@deploy1002: Started scap: testing
  • 18:38 dancy@deploy1002: Started scap: testing
  • 18:37 jhuneidi@deploy1002: Started scap: testing
  • 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
  • 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
  • 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2 refs T314191
  • 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
  • 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
  • 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
  • 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:39 dancy@deploy1002: Sync cancelled.
  • 16:39 dancy@deploy1002: dancy and dancy: Backport for InitialiseSettings-labs.php: Added test text (to be reverted) (T317242) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 16:38 dancy@deploy1002: Started scap: Backport for InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dcf3710: Remove Research Incentive survey on idwiki (T316466) (duration: 03m 50s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ff867a4: Add wgMetaNamespace for knwiktionary and knwikiquote (T318318) (duration: 03m 57s)
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
  • 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content and Section Translation in Bhojpuri Wikipedia (T313296) (duration: 04m 03s)
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-09-21

  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:46 tgr_: UTC late deploys done
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: Block metrics: Bump schema to un-require some fields (T317343) (duration: 03m 42s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: Block metrics: Bump schema to un-require some fields (T317343) (duration: 03m 55s)
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:25 samtar@deploy1002: Finished scap: Backport for cirrus: Limit shard count to 1 in deployment-prep (T316711) (duration: 04m 19s)
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for cirrus: Limit shard count to 1 in deployment-prep (T316711) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:20 samtar@deploy1002: Started scap: Backport for cirrus: Limit shard count to 1 in deployment-prep (T316711)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:17 samtar@deploy1002: Finished scap: Backport for Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625) (duration: 05m 31s)
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:12 samtar@deploy1002: samtar and kemayo: Backport for Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:11 samtar@deploy1002: Started scap: Backport for Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 samtar@deploy1002: Finished scap: Backport for Remove deployment-db08 (T318126) (duration: 05m 16s)
  • 20:04 samtar@deploy1002: samtar and zabe: Backport for Remove deployment-db08 (T318126) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:04 samtar@deploy1002: Started scap: Backport for Remove deployment-db08 (T318126)
  • 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
  • 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b8b2ebd: Growth: Switch pilot wikis to structured mentor list (T310905) (duration: 03m 59s)
  • 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
  • 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
  • 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
  • 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
  • 14:56 Emperor: set thanos ring replicas to 3.75 T311690
  • 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: Pool deployment-db09, depool deployment-db08 (T318126) (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
  • 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: Pool deployment-db09, depool deployment-db08 (T318126) (Beta-only, exchange one replica for another) (duration: 03m 48s)
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:59 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: Add back deployment-db08 (T318126) (Beta-only, restore old replica) (duration: 03m 48s)
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: Replace deployment-db08 with deployment-db09 (T318126) (Beta-only, replace one replica with another) (duration: 03m 56s)
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add editcontentmodel right for metawiki translation administrators (T311587) (duration: 03m 50s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318) (turning on new-style media output) (duration: 04m 03s)
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2 refs T314191 (duration: 04m 02s)
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2 refs T314191
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul

2022-09-20

  • 20:19 cjming: end of UTC late backport window
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 cjming@deploy1002: Finished scap: Backport for Enable Nearby everywhere (T246493) (duration: 09m 02s)
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
  • 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
  • 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for Enable Nearby everywhere (T246493) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
  • 20:04 cjming@deploy1002: Started scap: Backport for Enable Nearby everywhere (T246493)
  • 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 20:01 eileen: civicrm upgraded from e82d9cd0 to dcef393d
  • 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
  • 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 18:50 jynus: restart db2100:s7 to apply new config
  • 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:45 cstone: payments-wiki upgraded from de4b2bb9 to 0456850e
  • 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2 refs T314191
  • 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 16:42 dancy@deploy1002: Sync cancelled.
  • 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:41 dancy@deploy1002: Started scap: testing, disregard
  • 16:09 awight@deploy1002: backport aborted: (duration: 00m 33s)
  • 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable Tech Wishes survey on dewiki (T316676) (take 2) (duration: 03m 42s)
  • 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable Tech Wishes survey on dewiki (T316676) (duration: 03m 53s)
  • 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
  • 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
  • 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: 1ac09d4: Update HomepageModule schema version (T310320) (duration: 03m 39s)
  • 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: 1a27e05: Update HomepageModule schema version (T310320) (duration: 03m 52s)
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
  • 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0b55db6: Enable Tech Wishes survey on dewiki (T316676) (duration: 04m 12s)
  • 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
  • 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
  • 08:35 hashar: Restarted CI Jenkins for plugin update
  • 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias (T317289) (duration: 03m 46s)
  • 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:10 kart_: Updated cxserver to 2022-09-15-113346-production (T317289, T315209)
  • 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2 refs T314191 (duration: 36m 08s)
  • 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2 refs T314191
  • 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-09-19

  • 22:59 ebernhardson: T317200 start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
  • 21:21 maryum: Deployed security patch for T302479
  • 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
  • 21:15 sbassett: Deployed security patch for T312820
  • 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:59 cjming: end of UTC late backport window
  • 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: Add token_count subfield to outgoing_link (T317546) (duration: 03m 51s)
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Wikifunctions: Drop two config items moved to docker (duration: 03m 38s)
  • 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: ExtensionDistributor: Add REL1_39 (T313925) (duration: 03m 38s)
  • 20:12 cjming@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on cswiki (T314318) (duration: 06m 31s)
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 cjming@deploy1002: cjming and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on cswiki (T314318) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:06 cjming@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)
  • 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
  • 17:42 dancy@deploy1002: Installing scap version "4.21.0" for 561 hosts
  • 17:36 dancy@deploy1002: Sync cancelled.
  • 17:36 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 17:36 dancy@deploy1002: Started scap: testing, disregard
  • 14:03 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage{.png,-1.5x.png,-2x.png} (T317718)
  • 14:02 urbanecm@deploy1002: Synchronized static/images/project-logos/: 6c7151d: Regenerate ukwikivoyage logo (T317718) (duration: 03m 46s)
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cbf161d: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro (T314518) (duration: 04m 01s)
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4a6c1dd: Remove unnecessary wgNamespaceAliases from bnwiki (T318003) (duration: 04m 16s)
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-09-17

  • 12:17 Emperor: set thanos ring replicas to 3.80 T311690
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T314041)', diff saved to https://phabricator.wikimedia.org/P34879 and previous config saved to /var/cache/conftool/dbconfig/20220917-103903-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34878 and previous config saved to /var/cache/conftool/dbconfig/20220917-102356-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34877 and previous config saved to /var/cache/conftool/dbconfig/20220917-100850-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T314041)', diff saved to https://phabricator.wikimedia.org/P34876 and previous config saved to /var/cache/conftool/dbconfig/20220917-095344-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34875 and previous config saved to /var/cache/conftool/dbconfig/20220917-094856-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34874 and previous config saved to /var/cache/conftool/dbconfig/20220917-093349-ladsgroup.json
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34873 and previous config saved to /var/cache/conftool/dbconfig/20220917-091843-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34872 and previous config saved to /var/cache/conftool/dbconfig/20220917-090336-ladsgroup.json
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T314041)', diff saved to https://phabricator.wikimedia.org/P34871 and previous config saved to /var/cache/conftool/dbconfig/20220917-074806-ladsgroup.json
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34870 and previous config saved to /var/cache/conftool/dbconfig/20220917-073300-ladsgroup.json
  • 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34869 and previous config saved to /var/cache/conftool/dbconfig/20220917-071753-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T314041)', diff saved to https://phabricator.wikimedia.org/P34868 and previous config saved to /var/cache/conftool/dbconfig/20220917-070247-ladsgroup.json
  • 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T314041)', diff saved to https://phabricator.wikimedia.org/P34867 and previous config saved to /var/cache/conftool/dbconfig/20220917-051719-ladsgroup.json
  • 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T314041)', diff saved to https://phabricator.wikimedia.org/P34866 and previous config saved to /var/cache/conftool/dbconfig/20220917-051527-ladsgroup.json
  • 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34865 and previous config saved to /var/cache/conftool/dbconfig/20220917-051203-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance

2022-09-16

  • 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T314041)', diff saved to https://phabricator.wikimedia.org/P34864 and previous config saved to /var/cache/conftool/dbconfig/20220916-212905-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34863 and previous config saved to /var/cache/conftool/dbconfig/20220916-211358-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34862 and previous config saved to /var/cache/conftool/dbconfig/20220916-205852-ladsgroup.json
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T314041)', diff saved to https://phabricator.wikimedia.org/P34861 and previous config saved to /var/cache/conftool/dbconfig/20220916-204345-ladsgroup.json
  • 19:16 mutante: cp1081 /usr/local/sbin/update-ocsp-all
  • 17:01 mutante: gitlab-runner*: deployed gerrit:832584 and systemctl restart buildkitd on 6 hosts for T317904
  • 16:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184
  • 16:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2184
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2183
  • 16:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2183
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T314041)', diff saved to https://phabricator.wikimedia.org/P34860 and previous config saved to /var/cache/conftool/dbconfig/20220916-161409-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T314041)', diff saved to https://phabricator.wikimedia.org/P34859 and previous config saved to /var/cache/conftool/dbconfig/20220916-161346-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34858 and previous config saved to /var/cache/conftool/dbconfig/20220916-155840-ladsgroup.json
  • 15:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:52 dancy@deploy1002: Installation of scap version "4.20.0" completed for 561 hosts
  • 15:51 dancy@deploy1002: Installing scap version "4.20.0" for 561 hosts
  • 15:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:44 dancy@deploy1002: Finished scap: testing (duration: 04m 53s)
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34857 and previous config saved to /var/cache/conftool/dbconfig/20220916-154333-ladsgroup.json
  • 15:39 dancy@deploy1002: Started scap: testing
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T314041)', diff saved to https://phabricator.wikimedia.org/P34856 and previous config saved to /var/cache/conftool/dbconfig/20220916-152827-ladsgroup.json
  • 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:01 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 15:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 15:01 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:42 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:17 godog: add 100G to prometheus/eqiad instance k8s-mlserve
  • 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34855 and previous config saved to /var/cache/conftool/dbconfig/20220916-131902-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34854 and previous config saved to /var/cache/conftool/dbconfig/20220916-130357-root.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34853 and previous config saved to /var/cache/conftool/dbconfig/20220916-125841-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34852 and previous config saved to /var/cache/conftool/dbconfig/20220916-124850-root.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34851 and previous config saved to /var/cache/conftool/dbconfig/20220916-124336-root.json
  • 12:43 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34850 and previous config saved to /var/cache/conftool/dbconfig/20220916-123346-root.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34849 and previous config saved to /var/cache/conftool/dbconfig/20220916-122831-root.json
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34848 and previous config saved to /var/cache/conftool/dbconfig/20220916-121841-root.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34847 and previous config saved to /var/cache/conftool/dbconfig/20220916-121326-root.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34846 and previous config saved to /var/cache/conftool/dbconfig/20220916-120336-root.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34845 and previous config saved to /var/cache/conftool/dbconfig/20220916-115821-root.json
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34844 and previous config saved to /var/cache/conftool/dbconfig/20220916-114935-root.json
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34843 and previous config saved to /var/cache/conftool/dbconfig/20220916-114831-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34842 and previous config saved to /var/cache/conftool/dbconfig/20220916-114316-root.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P34841 and previous config saved to /var/cache/conftool/dbconfig/20220916-113543-root.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34840 and previous config saved to /var/cache/conftool/dbconfig/20220916-113431-root.json
  • 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34839 and previous config saved to /var/cache/conftool/dbconfig/20220916-113325-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P34838 and previous config saved to /var/cache/conftool/dbconfig/20220916-112750-root.json
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34837 and previous config saved to /var/cache/conftool/dbconfig/20220916-111925-root.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34836 and previous config saved to /var/cache/conftool/dbconfig/20220916-110420-root.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T314041)', diff saved to https://phabricator.wikimedia.org/P34835 and previous config saved to /var/cache/conftool/dbconfig/20220916-105819-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34834 and previous config saved to /var/cache/conftool/dbconfig/20220916-105809-ladsgroup.json
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34832 and previous config saved to /var/cache/conftool/dbconfig/20220916-104916-root.json
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34831 and previous config saved to /var/cache/conftool/dbconfig/20220916-104303-ladsgroup.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34830 and previous config saved to /var/cache/conftool/dbconfig/20220916-103411-root.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34829 and previous config saved to /var/cache/conftool/dbconfig/20220916-102756-ladsgroup.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34828 and previous config saved to /var/cache/conftool/dbconfig/20220916-101905-root.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34827 and previous config saved to /var/cache/conftool/dbconfig/20220916-101250-ladsgroup.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34826 and previous config saved to /var/cache/conftool/dbconfig/20220916-100400-root.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34825 and previous config saved to /var/cache/conftool/dbconfig/20220916-093635-root.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34824 and previous config saved to /var/cache/conftool/dbconfig/20220916-093121-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34823 and previous config saved to /var/cache/conftool/dbconfig/20220916-092130-root.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34822 and previous config saved to /var/cache/conftool/dbconfig/20220916-091616-root.json
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34821 and previous config saved to /var/cache/conftool/dbconfig/20220916-091234-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34820 and previous config saved to /var/cache/conftool/dbconfig/20220916-090625-root.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34819 and previous config saved to /var/cache/conftool/dbconfig/20220916-090111-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34818 and previous config saved to /var/cache/conftool/dbconfig/20220916-085120-root.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34817 and previous config saved to /var/cache/conftool/dbconfig/20220916-084607-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34816 and previous config saved to /var/cache/conftool/dbconfig/20220916-083615-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34815 and previous config saved to /var/cache/conftool/dbconfig/20220916-083102-root.json
  • 08:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34814 and previous config saved to /var/cache/conftool/dbconfig/20220916-082110-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34813 and previous config saved to /var/cache/conftool/dbconfig/20220916-081557-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34812 and previous config saved to /var/cache/conftool/dbconfig/20220916-080605-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34811 and previous config saved to /var/cache/conftool/dbconfig/20220916-080052-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34810 and previous config saved to /var/cache/conftool/dbconfig/20220916-075100-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34809 and previous config saved to /var/cache/conftool/dbconfig/20220916-074548-root.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34808 and previous config saved to /var/cache/conftool/dbconfig/20220916-074251-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P34807 and previous config saved to /var/cache/conftool/dbconfig/20220916-072958-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34806 and previous config saved to /var/cache/conftool/dbconfig/20220916-072746-root.json
  • 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34805 and previous config saved to /var/cache/conftool/dbconfig/20220916-071241-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34804 and previous config saved to /var/cache/conftool/dbconfig/20220916-065737-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34803 and previous config saved to /var/cache/conftool/dbconfig/20220916-064232-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34802 and previous config saved to /var/cache/conftool/dbconfig/20220916-062727-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34801 and previous config saved to /var/cache/conftool/dbconfig/20220916-061222-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34800 and previous config saved to /var/cache/conftool/dbconfig/20220916-055717-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34799 and previous config saved to /var/cache/conftool/dbconfig/20220916-055542-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34798 and previous config saved to /var/cache/conftool/dbconfig/20220916-055424-root.json
  • 05:51 marostegui: Install 10.6 on db1168 T301879
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34797 and previous config saved to /var/cache/conftool/dbconfig/20220916-055031-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P34795 and previous config saved to /var/cache/conftool/dbconfig/20220916-054438-root.json
  • 01:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 01:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 01:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
  • 01:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 00:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 17s)
  • 00:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)

2022-09-15

  • 23:51 mutante: gerrit1001 - disabled puppet - gerrit:832411
  • 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: T316236
  • 22:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: T316236
  • 21:30 ebernhardson: depool wcqs2001 for T316236
  • 20:25 thcipriani@deploy1002: Finished scap: Backport for Increase coverage of Research Incentive Survey on idwiki (T316466) (duration: 07m 06s)
  • 20:18 thcipriani@deploy1002: thcipriani and dani: Backport for Increase coverage of Research Incentive Survey on idwiki (T316466) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:18 thcipriani@deploy1002: Started scap: Backport for Increase coverage of Research Incentive Survey on idwiki (T316466)
  • 20:15 thcipriani@deploy1002: Finished scap: Backport for Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676) (duration: 07m 39s)
  • 20:08 thcipriani@deploy1002: thcipriani and dcausse: Backport for Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:07 thcipriani@deploy1002: Started scap: Backport for Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)
  • 19:26 ebernhardson: pool'd wdqs2001, some blockers before reload can start T316236
  • 18:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.1 refs T314190
  • 18:39 dancy@deploy1002: Finished scap: Backport for Use more permissive match for TOC_PLACEHOLDER in parser output (T317857) (duration: 09m 53s)
  • 18:38 cwhite: restart thanos-compact (thanos-fe2001) and swift_ring_manager (thanos-fe1001)
  • 18:29 dancy@deploy1002: dancy and cscott: Backport for Use more permissive match for TOC_PLACEHOLDER in parser output (T317857) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 18:29 dancy@deploy1002: Started scap: Backport for Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)
  • 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2003.codfw.wmnet on all recursors
  • 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2003.codfw.wmnet on all recursors
  • 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
  • 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
  • 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2001.codfw.wmnet on all recursors
  • 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2001.codfw.wmnet on all recursors
  • 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1003.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1003.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1002.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1002.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1001.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1001.eqiad.wmnet on all recursors
  • 18:15 ebernhardson: depool wcqs2001 for T316236
  • 18:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:13 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 18:07 godog: restart envoyproxy on thanos-fe*
  • 18:06 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
  • 18:06 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
  • 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging BryanDavis out of all services on: 2047 hosts
  • 16:16 andrew@cumin1001: START - Cookbook sre.idm.logout Logging BryanDavis out of all services on: 2047 hosts
  • 15:39 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 15:22 hnowlan: starting cassandra on sessionstore1001-a
  • 15:18 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T314041)', diff saved to https://phabricator.wikimedia.org/P34792 and previous config saved to /var/cache/conftool/dbconfig/20220915-151131-ladsgroup.json
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34791 and previous config saved to /var/cache/conftool/dbconfig/20220915-145625-ladsgroup.json
  • 14:41 moritzm: installing libtirpc security updates
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34790 and previous config saved to /var/cache/conftool/dbconfig/20220915-144118-ladsgroup.json
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T314041)', diff saved to https://phabricator.wikimedia.org/P34789 and previous config saved to /var/cache/conftool/dbconfig/20220915-142612-ladsgroup.json
  • 14:01 sukhe: retarting bird.service on A:dns-auth for zlib update
  • 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6b9784a: Enable the Vue version of the mentee overview in pilot wikis (T300532) (duration: 03m 45s)
  • 13:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d] (duration: 00m 09s)
  • 13:58 aqu@deploy1002: Started deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d]
  • 13:57 sukhe: retarting haproxy.service on A:dns-auth for zlib update
  • 13:57 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d] (duration: 00m 10s)
  • 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d]
  • 13:51 jayme: updated rsyslog to 8.2208.0-1~bpo11+1 on all kubernetes masters and nodes - T289766
  • 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383] (duration: 06m 01s)
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:41 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383]
  • 13:38 sukhe: restarting bird.service on A:dns-rec for zlib update
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:33 sukhe: restarting pdns-recursor on A:dns-rec for zlib update
  • 13:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/GrowthExperiments/: f592e85: Mentee overview: avoid requiring the non-vue mentee overview script when loading the Vue one (T300532) (duration: 04m 05s)
  • 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
  • 12:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
  • 12:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS buster
  • 12:17 jayme: fleet wide update of prometheus-rsyslog-exporter to 0.0.0+git20201008-4 - T289766
  • 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
  • 12:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34787 and previous config saved to /var/cache/conftool/dbconfig/20220915-120013-root.json
  • 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:50 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34786 and previous config saved to /var/cache/conftool/dbconfig/20220915-114508-root.json
  • 11:44 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:43 moritzm: restart exim on lists1001 to pick up zlib security updates
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34785 and previous config saved to /var/cache/conftool/dbconfig/20220915-113003-root.json
  • 11:22 jayme: importing prometheus-rsyslog-exporter 0.0.0+git20201008-4 to stretch-wikimedia, buster-wikimedia, bullseye-wikimedia - T289766
  • 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 11:15 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34784 and previous config saved to /var/cache/conftool/dbconfig/20220915-111458-root.json
  • 11:12 hnowlan: sessionstore1001: c-foreach-nt drain
  • 11:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
  • 11:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'pool db2129 into s6 API', diff saved to https://phabricator.wikimedia.org/P34783 and previous config saved to /var/cache/conftool/dbconfig/20220915-110453-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34782 and previous config saved to /var/cache/conftool/dbconfig/20220915-105953-root.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34781 and previous config saved to /var/cache/conftool/dbconfig/20220915-104448-root.json
  • 10:36 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34780 and previous config saved to /var/cache/conftool/dbconfig/20220915-102943-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34779 and previous config saved to /var/cache/conftool/dbconfig/20220915-101438-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34778 and previous config saved to /var/cache/conftool/dbconfig/20220915-101425-root.json
  • 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2131.codfw.wmnet with reason: reboot
  • 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db2131.codfw.wmnet with reason: reboot
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P34777 and previous config saved to /var/cache/conftool/dbconfig/20220915-100212-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34775 and previous config saved to /var/cache/conftool/dbconfig/20220915-095920-root.json
  • 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34774 and previous config saved to /var/cache/conftool/dbconfig/20220915-094415-root.json
  • 09:38 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383] (duration: 14m 21s)
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34773 and previous config saved to /var/cache/conftool/dbconfig/20220915-092910-root.json
  • 09:23 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383]
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34772 and previous config saved to /var/cache/conftool/dbconfig/20220915-091405-root.json
  • 09:13 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383] (duration: 00m 08s)
  • 09:13 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383]
  • 09:12 aqu@deploy1002: Finished deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383] (duration: 27m 31s)
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34770 and previous config saved to /var/cache/conftool/dbconfig/20220915-085900-root.json
  • 08:49 apergos: UTC backport training window closed at lsat
  • 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 08:45 aqu@deploy1002: Started deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383]
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34768 and previous config saved to /var/cache/conftool/dbconfig/20220915-084355-root.json
  • 08:43 aqu: about to deploy analytics/refinery
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34767 and previous config saved to /var/cache/conftool/dbconfig/20220915-084046-root.json
  • 08:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34766 and previous config saved to /var/cache/conftool/dbconfig/20220915-082851-root.json
  • 08:26 tsepothoabala@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable action blocks on ptwiki (T317157) (duration: 04m 07s)
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34765 and previous config saved to /var/cache/conftool/dbconfig/20220915-082541-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34764 and previous config saved to /var/cache/conftool/dbconfig/20220915-082112-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 T317850', diff saved to https://phabricator.wikimedia.org/P34763 and previous config saved to /var/cache/conftool/dbconfig/20220915-081627-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 codfw master T317850', diff saved to https://phabricator.wikimedia.org/P34762 and previous config saved to /var/cache/conftool/dbconfig/20220915-081517-marostegui.json
  • 08:14 marostegui: Starting s6 codfw failover from db2129 to db2114 - T317850
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34761 and previous config saved to /var/cache/conftool/dbconfig/20220915-081036-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34760 and previous config saved to /var/cache/conftool/dbconfig/20220915-080607-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 from API T317850', diff saved to https://phabricator.wikimedia.org/P34759 and previous config saved to /var/cache/conftool/dbconfig/20220915-080157-root.json
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary codfw s6 T317850
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 T317850', diff saved to https://phabricator.wikimedia.org/P34758 and previous config saved to /var/cache/conftool/dbconfig/20220915-080122-root.json
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary codfw s6 T317850
  • 07:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34757 and previous config saved to /var/cache/conftool/dbconfig/20220915-075531-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34756 and previous config saved to /var/cache/conftool/dbconfig/20220915-075102-root.json
  • 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34755 and previous config saved to /var/cache/conftool/dbconfig/20220915-074026-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34754 and previous config saved to /var/cache/conftool/dbconfig/20220915-073557-root.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34753 and previous config saved to /var/cache/conftool/dbconfig/20220915-072520-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34752 and previous config saved to /var/cache/conftool/dbconfig/20220915-072053-root.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: reboot
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2151.codfw.wmnet with reason: reboot
  • 07:14 moritzm: installing zlib security updates
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:11 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34751 and previous config saved to /var/cache/conftool/dbconfig/20220915-071015-root.json
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 07:06 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34750 and previous config saved to /var/cache/conftool/dbconfig/20220915-070548-root.json
  • 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34749 and previous config saved to /var/cache/conftool/dbconfig/20220915-065510-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34748 and previous config saved to /var/cache/conftool/dbconfig/20220915-065043-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db2096 T317842', diff saved to https://phabricator.wikimedia.org/P34747 and previous config saved to /var/cache/conftool/dbconfig/20220915-064750-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 T317842', diff saved to https://phabricator.wikimedia.org/P34746 and previous config saved to /var/cache/conftool/dbconfig/20220915-064635-marostegui.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2096 to x1 primary and set section read-write T317842', diff saved to https://phabricator.wikimedia.org/P34745 and previous config saved to /var/cache/conftool/dbconfig/20220915-064525-root.json
  • 06:44 marostegui: Starting x1 codfw failover from db2115 to db2096 - T317842
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T317842
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2096 with weight 0 T317842', diff saved to https://phabricator.wikimedia.org/P34744 and previous config saved to /var/cache/conftool/dbconfig/20220915-064014-root.json
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T317842
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34743 and previous config saved to /var/cache/conftool/dbconfig/20220915-063538-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 T317839', diff saved to https://phabricator.wikimedia.org/P34742 and previous config saved to /var/cache/conftool/dbconfig/20220915-061421-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 codfw T317839', diff saved to https://phabricator.wikimedia.org/P34741 and previous config saved to /var/cache/conftool/dbconfig/20220915-061317-marostegui.json
  • 06:12 marostegui: Starting s3 codfw failover from db2105 to db2127 - T317839
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 T317839', diff saved to https://phabricator.wikimedia.org/P34740 and previous config saved to /var/cache/conftool/dbconfig/20220915-060307-root.json
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Codfw switchover s3 T317839
  • 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Codfw switchover s3 T317839
  • 05:32 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662
  • 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662
  • 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662

2022-09-14

  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T314041)', diff saved to https://phabricator.wikimedia.org/P34739 and previous config saved to /var/cache/conftool/dbconfig/20220914-220822-ladsgroup.json
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34738 and previous config saved to /var/cache/conftool/dbconfig/20220914-220744-ladsgroup.json
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34737 and previous config saved to /var/cache/conftool/dbconfig/20220914-215238-ladsgroup.json
  • 21:38 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 01m 48s)
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34736 and previous config saved to /var/cache/conftool/dbconfig/20220914-213732-ladsgroup.json
  • 21:37 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
  • 21:36 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
  • 21:35 dduvall@deploy1002: Installation of scap version "4.19.1" completed for 561 hosts
  • 21:35 dduvall@deploy1002: Installing scap version "4.19.1" for 561 hosts
  • 21:34 dduvall: Deploying scap 4.19.1 (https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/832297/1/changelog)
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34735 and previous config saved to /var/cache/conftool/dbconfig/20220914-212225-ladsgroup.json
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:44 dancy@deploy1002: Sync cancelled.
  • 20:44 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:44 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:44 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:40 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:40 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:39 dancy@deploy1002: Started scap: testing
  • 20:38 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.1 refs T314190 (duration: 05m 49s)
  • 20:34 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:34 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1 refs T314190
  • 20:28 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:24 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:21 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 dancy@deploy1002: deploy-promote aborted: (duration: 08m 52s)
  • 20:19 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.1 refs T314190 (duration: 01m 24s)
  • 20:18 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:18 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1 refs T314190
  • 20:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 dancy@deploy1002: Sync cancelled.
  • 20:09 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:09 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:02 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:02 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:02 dancy@deploy1002: Started scap: testing
  • 20:01 TheresNoTime: Nothing to deploy in this UTC late backport window
  • 19:57 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
  • 19:57 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
  • 19:55 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 12s)
  • 19:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:51 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:49 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:49 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:49 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:48 dancy@deploy1002: Started scap: testing
  • 19:46 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 23s)
  • 19:46 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:39 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:39 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:39 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:38 dancy@deploy1002: Started scap: testing
  • 19:38 dancy@deploy1002: sync-world aborted: testing (duration: 13m 25s)
  • 19:35 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:26 dancy: dancy@deploy1002 touch /var/lib/deploy-mwdebug/pause
  • 19:24 dancy@deploy1002: Started scap: testing
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling (duration: 02m 03s)
  • 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:15 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.1 refs T314190
  • 18:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki (duration: 02m 05s)
  • 18:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki
  • 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:36 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1 refs T314190 (duration: 04m 41s)
  • 18:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:31 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1 refs T314190
  • 16:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:10 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 16:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
  • 16:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
  • 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2024.codfw.wmnet on all recursors
  • 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2024.codfw.wmnet on all recursors
  • 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2002.codfw.wmnet on all recursors
  • 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2002.codfw.wmnet on all recursors
  • 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1026.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1026.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1029.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1029.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1028.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1028.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1027.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1027.eqiad.wmnet on all recursors
  • 15:55 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2001.codfw.wmnet on all recursors
  • 15:55 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2001.codfw.wmnet on all recursors
  • 15:50 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 00m 39s)
  • 15:49 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
  • 15:48 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
  • 15:24 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:22 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 15:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:17 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T314041)', diff saved to https://phabricator.wikimedia.org/P34732 and previous config saved to /var/cache/conftool/dbconfig/20220914-145956-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34731 and previous config saved to /var/cache/conftool/dbconfig/20220914-144449-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34730 and previous config saved to /var/cache/conftool/dbconfig/20220914-142941-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T314041)', diff saved to https://phabricator.wikimedia.org/P34729 and previous config saved to /var/cache/conftool/dbconfig/20220914-141434-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 14:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
  • 13:59 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
  • 13:48 moritzm: imported zlib 1:1.2.8.dfsg-5+deb9u1+wmf1 to apt.wikimedia.org
  • 13:40 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bnwiktionary --fix # T317745 – dry run result: 6043 links to fix, 6043 were resolvable, 0 were deleted
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move namespace in the Bengali Wiktionary: উইকিসরাস → পরিশিষ্ট and set wgNamespaceAliases for newly created namespaces (T317745) (duration: 03m 41s)
  • 13:28 topranks: upgrading routinator on rpki2002
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content/Section translation on WPs with new MT support from Google (T313296) (duration: 03m 39s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Section Translation in Odia Wikipedia (T313300) (duration: 03m 55s)
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:54 jayme: imported rsyslog 8.2208.0-1~bpo11+1 into bullseye-wikimedia component/rsyslog-k8s - T289766
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34725 and previous config saved to /var/cache/conftool/dbconfig/20220914-115920-ladsgroup.json
  • 11:49 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-eqdfw,cr2-eqdfw IPv6
  • 11:49 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-eqdfw,cr2-eqdfw IPv6
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34723 and previous config saved to /var/cache/conftool/dbconfig/20220914-114413-ladsgroup.json
  • 11:29 topranks: rebooting cr2-eqdfw to complete upgrade
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34721 and previous config saved to /var/cache/conftool/dbconfig/20220914-112907-ladsgroup.json
  • 11:14 topranks: Shutting down internet transit and peering on cr2-eqdfw in advance of upgrade reboot
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34719 and previous config saved to /var/cache/conftool/dbconfig/20220914-111400-ladsgroup.json
  • 11:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
  • 11:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
  • 11:01 topranks: Prepping to upgrade JunOS on cr2-eqdfw. Adjusting OSPF costs to force traffic via alternate POPs.
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34717 and previous config saved to /var/cache/conftool/dbconfig/20220914-103810-root.json
  • 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:24 kharlan@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: BlockMetrics: Update to new event schema version (T306018) (duration: 03m 48s)
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34715 and previous config saved to /var/cache/conftool/dbconfig/20220914-102305-root.json
  • 10:18 moritzm: import routinator 0.11.3-1bullseye to thirdparty/routinator
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34714 and previous config saved to /var/cache/conftool/dbconfig/20220914-100800-root.json
  • 10:00 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 09:59 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 09:58 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 09:57 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 09:57 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 09:53 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 09:53 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34713 and previous config saved to /var/cache/conftool/dbconfig/20220914-095255-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34712 and previous config saved to /var/cache/conftool/dbconfig/20220914-093750-root.json
  • 09:27 moritzm: installing zlib/libxslt security updates on buster
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T314041)', diff saved to https://phabricator.wikimedia.org/P34711 and previous config saved to /var/cache/conftool/dbconfig/20220914-092620-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T314041)', diff saved to https://phabricator.wikimedia.org/P34710 and previous config saved to /var/cache/conftool/dbconfig/20220914-092558-ladsgroup.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34709 and previous config saved to /var/cache/conftool/dbconfig/20220914-092245-root.json
  • 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34708 and previous config saved to /var/cache/conftool/dbconfig/20220914-091052-ladsgroup.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34707 and previous config saved to /var/cache/conftool/dbconfig/20220914-090740-root.json
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 09:05 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34706 and previous config saved to /var/cache/conftool/dbconfig/20220914-085545-ladsgroup.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34705 and previous config saved to /var/cache/conftool/dbconfig/20220914-085235-root.json
  • 08:50 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-test
  • 08:49 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-test
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T314041)', diff saved to https://phabricator.wikimedia.org/P34704 and previous config saved to /var/cache/conftool/dbconfig/20220914-084039-ladsgroup.json
  • 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart on A:wdqs-test
  • 08:38 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart on A:wdqs-test
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
  • 08:32 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old templatelinks columns of enwiki (T312865) (duration: 06m 51s)
  • 08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Stop writing to the old templatelinks columns of enwiki (T312865) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:25 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old templatelinks columns of enwiki (T312865)
  • 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1024.eqiad.wmnet with reason: down
  • 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1024.eqiad.wmnet with reason: down
  • 08:02 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es5 T317739 (duration: 03m 38s)
  • 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 T317739', diff saved to https://phabricator.wikimedia.org/P34703 and previous config saved to /var/cache/conftool/dbconfig/20220914-075722-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary T317739', diff saved to https://phabricator.wikimedia.org/P34702 and previous config saved to /var/cache/conftool/dbconfig/20220914-075550-marostegui.json
  • 07:55 marostegui: Starting es5 eqiad failover from es1024 to es1023 T317739
  • 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:50 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es5 T317739 (duration: 04m 13s)
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1023 with weight 0 T317739', diff saved to https://phabricator.wikimedia.org/P34701 and previous config saved to /var/cache/conftool/dbconfig/20220914-074617-marostegui.json
  • 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317739
  • 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317739
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34700 and previous config saved to /var/cache/conftool/dbconfig/20220914-074248-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34699 and previous config saved to /var/cache/conftool/dbconfig/20220914-072743-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34698 and previous config saved to /var/cache/conftool/dbconfig/20220914-071238-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34697 and previous config saved to /var/cache/conftool/dbconfig/20220914-065733-root.json
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34696 and previous config saved to /var/cache/conftool/dbconfig/20220914-064330-ladsgroup.json
  • 06:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T314041)', diff saved to https://phabricator.wikimedia.org/P34695 and previous config saved to /var/cache/conftool/dbconfig/20220914-064309-ladsgroup.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34694 and previous config saved to /var/cache/conftool/dbconfig/20220914-064228-root.json
  • 06:38 elukey: restart kafka on kafka-logging2003 to pick up the new PKI TLS settings
  • 06:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
  • 06:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34693 and previous config saved to /var/cache/conftool/dbconfig/20220914-062802-ladsgroup.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34692 and previous config saved to /var/cache/conftool/dbconfig/20220914-062723-root.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34691 and previous config saved to /var/cache/conftool/dbconfig/20220914-061256-ladsgroup.json
  • 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: down
  • 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: down
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 T317735', diff saved to https://phabricator.wikimedia.org/P34690 and previous config saved to /var/cache/conftool/dbconfig/20220914-060913-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 codfw primary T317735', diff saved to https://phabricator.wikimedia.org/P34689 and previous config saved to /var/cache/conftool/dbconfig/20220914-060807-marostegui.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T314041)', diff saved to https://phabricator.wikimedia.org/P34688 and previous config saved to /var/cache/conftool/dbconfig/20220914-055749-ladsgroup.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 T317735', diff saved to https://phabricator.wikimedia.org/P34687 and previous config saved to /var/cache/conftool/dbconfig/20220914-055156-marostegui.json
  • 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T317735
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T317735
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34686 and previous config saved to /var/cache/conftool/dbconfig/20220914-052510-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 05:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T314041)', diff saved to https://phabricator.wikimedia.org/P34685 and previous config saved to /var/cache/conftool/dbconfig/20220914-052448-ladsgroup.json
  • 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34684 and previous config saved to /var/cache/conftool/dbconfig/20220914-050942-ladsgroup.json
  • 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34683 and previous config saved to /var/cache/conftool/dbconfig/20220914-045435-ladsgroup.json
  • 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T314041)', diff saved to https://phabricator.wikimedia.org/P34682 and previous config saved to /var/cache/conftool/dbconfig/20220914-043929-ladsgroup.json
  • 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T314041)', diff saved to https://phabricator.wikimedia.org/P34681 and previous config saved to /var/cache/conftool/dbconfig/20220914-035624-ladsgroup.json
  • 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 03:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34680 and previous config saved to /var/cache/conftool/dbconfig/20220914-035546-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34679 and previous config saved to /var/cache/conftool/dbconfig/20220914-034040-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T314041)', diff saved to https://phabricator.wikimedia.org/P34678 and previous config saved to /var/cache/conftool/dbconfig/20220914-033921-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34677 and previous config saved to /var/cache/conftool/dbconfig/20220914-032533-ladsgroup.json
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34676 and previous config saved to /var/cache/conftool/dbconfig/20220914-032415-ladsgroup.json
  • 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34675 and previous config saved to /var/cache/conftool/dbconfig/20220914-031027-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34674 and previous config saved to /var/cache/conftool/dbconfig/20220914-030908-ladsgroup.json
  • 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T314041)', diff saved to https://phabricator.wikimedia.org/P34673 and previous config saved to /var/cache/conftool/dbconfig/20220914-025402-ladsgroup.json
  • 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T314041)', diff saved to https://phabricator.wikimedia.org/P34672 and previous config saved to /var/cache/conftool/dbconfig/20220914-013204-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34671 and previous config saved to /var/cache/conftool/dbconfig/20220914-013143-ladsgroup.json
  • 01:24 eileen: civicrm upgraded from d91b4a2c to e82d9cd0
  • 01:18 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34670 and previous config saved to /var/cache/conftool/dbconfig/20220914-011637-ladsgroup.json
  • 01:14 ejegg: disabled delete_deleted_contacts job (will take effect when current job ends)
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34669 and previous config saved to /var/cache/conftool/dbconfig/20220914-010130-ladsgroup.json
  • 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34668 and previous config saved to /var/cache/conftool/dbconfig/20220914-004624-ladsgroup.json

2022-09-13

  • 23:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34667 and previous config saved to /var/cache/conftool/dbconfig/20220913-234607-ladsgroup.json
  • 23:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 23:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T314041)', diff saved to https://phabricator.wikimedia.org/P34666 and previous config saved to /var/cache/conftool/dbconfig/20220913-234546-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34665 and previous config saved to /var/cache/conftool/dbconfig/20220913-233039-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34664 and previous config saved to /var/cache/conftool/dbconfig/20220913-231533-ladsgroup.json
  • 23:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T314041)', diff saved to https://phabricator.wikimedia.org/P34663 and previous config saved to /var/cache/conftool/dbconfig/20220913-231257-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T314041)', diff saved to https://phabricator.wikimedia.org/P34662 and previous config saved to /var/cache/conftool/dbconfig/20220913-230317-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34661 and previous config saved to /var/cache/conftool/dbconfig/20220913-230255-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T314041)', diff saved to https://phabricator.wikimedia.org/P34660 and previous config saved to /var/cache/conftool/dbconfig/20220913-230026-ladsgroup.json
  • 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P34659 and previous config saved to /var/cache/conftool/dbconfig/20220913-225750-ladsgroup.json
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P34658 and previous config saved to /var/cache/conftool/dbconfig/20220913-224749-ladsgroup.json
  • 22:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P34657 and previous config saved to /var/cache/conftool/dbconfig/20220913-224244-ladsgroup.json
  • 22:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P34656 and previous config saved to /var/cache/conftool/dbconfig/20220913-223241-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34655 and previous config saved to /var/cache/conftool/dbconfig/20220913-223025-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T314041)', diff saved to https://phabricator.wikimedia.org/P34654 and previous config saved to /var/cache/conftool/dbconfig/20220913-222738-ladsgroup.json
  • 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34653 and previous config saved to /var/cache/conftool/dbconfig/20220913-221734-ladsgroup.json
  • 22:16 dancy: dancy@deploy1002$ rm /var/lib/deploy-mwdebug/pause
  • 22:15 dancy@deploy1002: Sync cancelled.
  • 22:15 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:14 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:14 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:13 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:12 dancy@deploy1002: Started scap: testing
  • 22:12 dancy@deploy1002: Sync cancelled.
  • 22:11 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 22:11 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:11 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:10 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:08 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:07 dancy@deploy1002: Started scap: testing
  • 22:07 dancy@deploy1002: Sync cancelled.
  • 22:07 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:06 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:05 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:03 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:02 dancy@deploy1002: Started scap: testing
  • 22:01 dancy@deploy1002: Sync cancelled.
  • 22:01 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:01 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:58 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:58 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:55 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:55 dancy@deploy1002: Started scap: testing
  • 21:55 dancy@deploy1002: Sync cancelled.
  • 21:54 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:54 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:50 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:50 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:48 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:47 dancy@deploy1002: Started scap: testing
  • 21:37 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ MW_CONFIG_BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restric
  • 21:36 dancy@deploy1002: Started scap: testing
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:16 dancy: dancy@deploy1002 touch /var/lib/deploy-mwdebug/pause
  • 21:16 dancy@deploy1002: Sync cancelled.
  • 21:15 dancy@deploy1002: dancy: testing T299648 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:14 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:10 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:04 dancy@deploy1002: Started scap: testing T299648
  • 20:25 cjming: end of UTC late backport window
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 cjming@deploy1002: Finished scap: Backport for add tagline and update wordmark in ptwikinews (T313174) (duration: 05m 50s)
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 cjming@deploy1002: cjming and aishik: Backport for add tagline and update wordmark in ptwikinews (T313174) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:09 cjming@deploy1002: Started scap: Backport for add tagline and update wordmark in ptwikinews (T313174)
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T314041)', diff saved to https://phabricator.wikimedia.org/P34652 and previous config saved to /var/cache/conftool/dbconfig/20220913-200344-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 20:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T314041)', diff saved to https://phabricator.wikimedia.org/P34651 and previous config saved to /var/cache/conftool/dbconfig/20220913-200214-ladsgroup.json
  • 20:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T314041)', diff saved to https://phabricator.wikimedia.org/P34650 and previous config saved to /var/cache/conftool/dbconfig/20220913-200152-ladsgroup.json
  • 19:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P34649 and previous config saved to /var/cache/conftool/dbconfig/20220913-194645-ladsgroup.json
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P34648 and previous config saved to /var/cache/conftool/dbconfig/20220913-193139-ladsgroup.json
  • 19:19 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T314041)', diff saved to https://phabricator.wikimedia.org/P34647 and previous config saved to /var/cache/conftool/dbconfig/20220913-191632-ladsgroup.json
  • 19:01 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 18:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34646 and previous config saved to /var/cache/conftool/dbconfig/20220913-183259-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34645 and previous config saved to /var/cache/conftool/dbconfig/20220913-183238-ladsgroup.json
  • 18:31 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: InitialiseSettings-labs.php: Set $wgPhonosPath (T317417) (duration: 03m 45s)
  • 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:28 TheresNoTime: deploying a beta cluster only config change, T317417
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P34644 and previous config saved to /var/cache/conftool/dbconfig/20220913-181731-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P34643 and previous config saved to /var/cache/conftool/dbconfig/20220913-180225-ladsgroup.json
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34642 and previous config saved to /var/cache/conftool/dbconfig/20220913-174718-ladsgroup.json
  • 17:43 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Upgrade wmf-netbox plugin - volans@cumin1001
  • 17:41 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Upgrade wmf-netbox plugin - volans@cumin1001
  • 17:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34640 and previous config saved to /var/cache/conftool/dbconfig/20220913-173721-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T312863)', diff saved to https://phabricator.wikimedia.org/P34639 and previous config saved to /var/cache/conftool/dbconfig/20220913-173254-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P34638 and previous config saved to /var/cache/conftool/dbconfig/20220913-172215-ladsgroup.json
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34637 and previous config saved to /var/cache/conftool/dbconfig/20220913-171747-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P34636 and previous config saved to /var/cache/conftool/dbconfig/20220913-170708-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34635 and previous config saved to /var/cache/conftool/dbconfig/20220913-170241-ladsgroup.json
  • 16:56 ejegg: updated fundraising CiviCRM from efbbcb57 to d91b4a2c
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34634 and previous config saved to /var/cache/conftool/dbconfig/20220913-165202-ladsgroup.json
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T314041)', diff saved to https://phabricator.wikimedia.org/P34633 and previous config saved to /var/cache/conftool/dbconfig/20220913-165117-ladsgroup.json
  • 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T314041)', diff saved to https://phabricator.wikimedia.org/P34632 and previous config saved to /var/cache/conftool/dbconfig/20220913-165056-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T312863)', diff saved to https://phabricator.wikimedia.org/P34631 and previous config saved to /var/cache/conftool/dbconfig/20220913-164734-ladsgroup.json
  • 16:37 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:36 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:36 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:36 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P34630 and previous config saved to /var/cache/conftool/dbconfig/20220913-163549-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P34629 and previous config saved to /var/cache/conftool/dbconfig/20220913-162043-ladsgroup.json
  • 16:13 godog: add 200G to prometheus/eqiad instance ops
  • 16:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 16:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T314041)', diff saved to https://phabricator.wikimedia.org/P34628 and previous config saved to /var/cache/conftool/dbconfig/20220913-160536-ladsgroup.json
  • 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: down
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: down
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34626 and previous config saved to /var/cache/conftool/dbconfig/20220913-154810-root.json
  • 15:42 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@031604d]: Automatically drop hitsorical partitions of subgraph analysis (duration: 02m 07s)
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P34625 and previous config saved to /var/cache/conftool/dbconfig/20220913-154151-ladsgroup.json
  • 15:40 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@031604d]: Automatically drop hitsorical partitions of subgraph analysis
  • 15:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P34624 and previous config saved to /var/cache/conftool/dbconfig/20220913-152644-ladsgroup.json
  • 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:14 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28 refs T314190 (duration: 04m 31s)
  • 15:13 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.0 - volans@cumin1001
  • 15:12 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.0 - volans@cumin1001
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P34623 and previous config saved to /var/cache/conftool/dbconfig/20220913-151138-ladsgroup.json
  • 15:10 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28 refs T314190
  • 15:08 dancy@deploy1002: deploy-promote aborted: (duration: 00m 02s)
  • 14:59 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1 refs T314190 (duration: 04m 43s)
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P34622 and previous config saved to /var/cache/conftool/dbconfig/20220913-145631-ladsgroup.json
  • 14:54 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1 refs T314190
  • 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:47 dancy@deploy1002: deploy-promote aborted: (duration: 01m 03s)
  • 14:47 dancy@deploy1002: prep aborted: (duration: 00m 12s)
  • 14:46 moritzm: restarting FPM/Apache on mediawiki canaries
  • 14:44 moritzm: installing libxslt security updates on buster
  • 14:18 topranks: Core router upgrade in codfw complete - maintenance closed.
  • 14:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt
  • 14:12 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt
  • 14:07 topranks: re-activating Transit on IX BGP on cr2-codfw
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34621 and previous config saved to /var/cache/conftool/dbconfig/20220913-135729-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T314041)', diff saved to https://phabricator.wikimedia.org/P34620 and previous config saved to /var/cache/conftool/dbconfig/20220913-135707-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P34619 and previous config saved to /var/cache/conftool/dbconfig/20220913-134201-ladsgroup.json
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T314041)', diff saved to https://phabricator.wikimedia.org/P34618 and previous config saved to /var/cache/conftool/dbconfig/20220913-133339-ladsgroup.json
  • 13:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T314041)', diff saved to https://phabricator.wikimedia.org/P34617 and previous config saved to /var/cache/conftool/dbconfig/20220913-133317-ladsgroup.json
  • 13:33 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P34616 and previous config saved to /var/cache/conftool/dbconfig/20220913-132654-ladsgroup.json
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: testwiki: Add mediawiki.edit_attempt stream (T309013) (2/2) (duration: 03m 33s)
  • 13:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: Add mediawiki.edit_attempt stream (T309013) (1/2) (duration: 03m 39s)
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 Emperor: set thanos ring replicas to 3.85 T311690
  • 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P34615 and previous config saved to /var/cache/conftool/dbconfig/20220913-131811-ladsgroup.json
  • 13:14 topranks: Flipping back to RE0 on cr2-codfw (last disruptive switch)
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wgWMESearchRelevancePages (unused) (duration: 03m 53s)
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T314041)', diff saved to https://phabricator.wikimedia.org/P34614 and previous config saved to /var/cache/conftool/dbconfig/20220913-131148-ladsgroup.json
  • 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
  • 13:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
  • 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P34613 and previous config saved to /var/cache/conftool/dbconfig/20220913-130304-ladsgroup.json
  • 12:59 topranks: Switching active RE back to RE1 on cr1-codfw as firmware hadn't been loaded while it was master
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T314041)', diff saved to https://phabricator.wikimedia.org/P34612 and previous config saved to /var/cache/conftool/dbconfig/20220913-125745-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T314041)', diff saved to https://phabricator.wikimedia.org/P34611 and previous config saved to /var/cache/conftool/dbconfig/20220913-125723-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T314041)', diff saved to https://phabricator.wikimedia.org/P34610 and previous config saved to /var/cache/conftool/dbconfig/20220913-124758-ladsgroup.json
  • 12:46 topranks: forcing non-graceful RE switchover on cr2-codfw as part of upgrade
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P34609 and previous config saved to /var/cache/conftool/dbconfig/20220913-124217-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P34608 and previous config saved to /var/cache/conftool/dbconfig/20220913-122710-ladsgroup.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34607 and previous config saved to /var/cache/conftool/dbconfig/20220913-122415-root.json
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T314041)', diff saved to https://phabricator.wikimedia.org/P34606 and previous config saved to /var/cache/conftool/dbconfig/20220913-121204-ladsgroup.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34605 and previous config saved to /var/cache/conftool/dbconfig/20220913-120910-root.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34604 and previous config saved to /var/cache/conftool/dbconfig/20220913-120653-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T314041)', diff saved to https://phabricator.wikimedia.org/P34603 and previous config saved to /var/cache/conftool/dbconfig/20220913-120632-ladsgroup.json
  • 11:58 topranks: Disabling transit and ixp BGP on cr2-codfw in advance of software upgrade
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34602 and previous config saved to /var/cache/conftool/dbconfig/20220913-115405-root.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P34601 and previous config saved to /var/cache/conftool/dbconfig/20220913-115125-ladsgroup.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34600 and previous config saved to /var/cache/conftool/dbconfig/20220913-113900-root.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P34599 and previous config saved to /var/cache/conftool/dbconfig/20220913-113619-ladsgroup.json
  • 11:34 hashar: Upgrading CI Jenkins T317418
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P34598 and previous config saved to /var/cache/conftool/dbconfig/20220913-112818-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34597 and previous config saved to /var/cache/conftool/dbconfig/20220913-112355-root.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T314041)', diff saved to https://phabricator.wikimedia.org/P34596 and previous config saved to /var/cache/conftool/dbconfig/20220913-112112-ladsgroup.json
  • 11:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: router upgrade
  • 11:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: router upgrade
  • 11:15 topranks: completed cr1-codfw upgrade, will proceed to cr2-codfw shortly
  • 11:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt
  • 11:14 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt
  • 11:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:09 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.1/includes/libs/rdbms/ChronologyProtector.php: Backport: rdbms: Bump ChronologyProtector cache key version (T317606) (duration: 03m 49s)
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34595 and previous config saved to /var/cache/conftool/dbconfig/20220913-110850-root.json
  • 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T312863)', diff saved to https://phabricator.wikimedia.org/P34594 and previous config saved to /var/cache/conftool/dbconfig/20220913-110755-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34593 and previous config saved to /var/cache/conftool/dbconfig/20220913-110715-root.json
  • 11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T317627', diff saved to https://phabricator.wikimedia.org/P34592 and previous config saved to /var/cache/conftool/dbconfig/20220913-105733-root.json
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 codfw primary T317627', diff saved to https://phabricator.wikimedia.org/P34591 and previous config saved to /var/cache/conftool/dbconfig/20220913-105642-marostegui.json
  • 10:56 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 10:56 marostegui: Starting s2 codfw failover from db2104 to db2107 - T317627
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34590 and previous config saved to /var/cache/conftool/dbconfig/20220913-105210-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34589 and previous config saved to /var/cache/conftool/dbconfig/20220913-103705-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2107 from api T317627', diff saved to https://phabricator.wikimedia.org/P34588 and previous config saved to /var/cache/conftool/dbconfig/20220913-103658-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 T317627', diff saved to https://phabricator.wikimedia.org/P34587 and previous config saved to /var/cache/conftool/dbconfig/20220913-103621-marostegui.json
  • 10:35 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T317627
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T317627
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T314041)', diff saved to https://phabricator.wikimedia.org/P34586 and previous config saved to /var/cache/conftool/dbconfig/20220913-102232-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34585 and previous config saved to /var/cache/conftool/dbconfig/20220913-102147-root.json
  • 10:16 topranks: Flipping master RE on cr1-codfw to backup as part of upgrade
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34584 and previous config saved to /var/cache/conftool/dbconfig/20220913-100642-root.json
  • 10:04 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 09:52 elukey: restart kafka on kafka-logging2002 to move it to PKI-based TLS certs
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34583 and previous config saved to /var/cache/conftool/dbconfig/20220913-095137-root.json
  • 09:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2002.codfw.wmnet with reason: Kafka PKI upgrade
  • 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2002.codfw.wmnet with reason: Kafka PKI upgrade
  • 09:45 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 09:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 09:41 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 09:37 hashar: Restarting CI Jenkins on contint2001 (with new systemd service)
  • 09:33 hashar: Enabling Puppet on contint2001 for Jenkins systemd change
  • 09:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T314041)', diff saved to https://phabricator.wikimedia.org/P34582 and previous config saved to /var/cache/conftool/dbconfig/20220913-092904-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T314041)', diff saved to https://phabricator.wikimedia.org/P34581 and previous config saved to /var/cache/conftool/dbconfig/20220913-092826-ladsgroup.json
  • 09:25 hashar: Stopped Puppet on contint2001 for a Jenkins systemd change
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 T317614', diff saved to https://phabricator.wikimedia.org/P34580 and previous config saved to /var/cache/conftool/dbconfig/20220913-092200-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary T317614', diff saved to https://phabricator.wikimedia.org/P34579 and previous config saved to /var/cache/conftool/dbconfig/20220913-092032-root.json
  • 09:19 marostegui: Starting s1 codfw failover from db2103 to db2112 - T317614
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P34578 and previous config saved to /var/cache/conftool/dbconfig/20220913-091320-ladsgroup.json
  • 09:11 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 09:11 volans@cumin1001: START - Cookbook sre.network.cf
  • 09:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt with reason: router upgrade
  • 09:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt with reason: router upgrade
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P34577 and previous config saved to /var/cache/conftool/dbconfig/20220913-085814-ladsgroup.json
  • 08:56 topranks: Flipping primary routing engine to RE1 on cr1-codfw (disruptive) as part of upgrade.
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 T317614', diff saved to https://phabricator.wikimedia.org/P34576 and previous config saved to /var/cache/conftool/dbconfig/20220913-085456-marostegui.json
  • 08:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 T317614
  • 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 T317614
  • 08:46 topranks: Disabled LVS/PyBal peerings on cr1-codfw ain advance of upgrade to router.
  • 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T314041)', diff saved to https://phabricator.wikimedia.org/P34575 and previous config saved to /var/cache/conftool/dbconfig/20220913-084307-ladsgroup.json
  • 08:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 08:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 08:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 08:27 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:27 cmooney@cumin1001: START - Cookbook sre.network.cf
  • 08:17 moritzm: roll-restarting apache/FPM on mw canaries to pick up zlib security updates
  • 08:15 topranks: de-pooling codfw ahead of core router upgrades at the site
  • 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28 refs T314190 (duration: 04m 29s)
  • 07:14 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28 refs T314190
  • 07:11 jhuneidi@deploy1002: deploy-promote aborted: (duration: 00m 09s)
  • 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34574 and previous config saved to /var/cache/conftool/dbconfig/20220913-065457-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P34573 and previous config saved to /var/cache/conftool/dbconfig/20220913-063951-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T314041)', diff saved to https://phabricator.wikimedia.org/P34572 and previous config saved to /var/cache/conftool/dbconfig/20220913-063908-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P34571 and previous config saved to /var/cache/conftool/dbconfig/20220913-062444-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34570 and previous config saved to /var/cache/conftool/dbconfig/20220913-060938-ladsgroup.json
  • 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T314041)', diff saved to https://phabricator.wikimedia.org/P34569 and previous config saved to /var/cache/conftool/dbconfig/20220913-045832-ladsgroup.json
  • 04:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 04:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T314041)', diff saved to https://phabricator.wikimedia.org/P34568 and previous config saved to /var/cache/conftool/dbconfig/20220913-045811-ladsgroup.json
  • 04:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P34567 and previous config saved to /var/cache/conftool/dbconfig/20220913-044304-ladsgroup.json
  • 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P34566 and previous config saved to /var/cache/conftool/dbconfig/20220913-042758-ladsgroup.json
  • 04:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T314041)', diff saved to https://phabricator.wikimedia.org/P34565 and previous config saved to /var/cache/conftool/dbconfig/20220913-041251-ladsgroup.json
  • 04:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.27 (duration: 01m 59s)
  • 03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1 refs T314190 (duration: 35m 37s)
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1 refs T314190
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34564 and previous config saved to /var/cache/conftool/dbconfig/20220913-022136-ladsgroup.json
  • 02:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T314041)', diff saved to https://phabricator.wikimedia.org/P34563 and previous config saved to /var/cache/conftool/dbconfig/20220913-022114-ladsgroup.json
  • 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P34562 and previous config saved to /var/cache/conftool/dbconfig/20220913-020608-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P34561 and previous config saved to /var/cache/conftool/dbconfig/20220913-015102-ladsgroup.json
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T314041)', diff saved to https://phabricator.wikimedia.org/P34560 and previous config saved to /var/cache/conftool/dbconfig/20220913-013555-ladsgroup.json
  • 00:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: syntax error in sudo
  • 00:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: syntax error in sudo
  • 00:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet with reason: syntax error in sudo
  • 00:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2002.codfw.wmnet with reason: syntax error in sudo
  • 00:48 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: syntax error in sudo
  • 00:48 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: syntax error in sudo
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T314041)', diff saved to https://phabricator.wikimedia.org/P34559 and previous config saved to /var/cache/conftool/dbconfig/20220913-001908-ladsgroup.json
  • 00:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T314041)', diff saved to https://phabricator.wikimedia.org/P34558 and previous config saved to /var/cache/conftool/dbconfig/20220913-001846-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34557 and previous config saved to /var/cache/conftool/dbconfig/20220913-000340-ladsgroup.json

2022-09-12

  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34556 and previous config saved to /var/cache/conftool/dbconfig/20220912-234833-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T314041)', diff saved to https://phabricator.wikimedia.org/P34555 and previous config saved to /var/cache/conftool/dbconfig/20220912-233327-ladsgroup.json
  • 22:53 mutante: phabricator - disabling MediaWiki extension repositories in Diffusion that have 0 commits - T296022 - T315706
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T314041)', diff saved to https://phabricator.wikimedia.org/P34554 and previous config saved to /var/cache/conftool/dbconfig/20220912-224006-ladsgroup.json
  • 22:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T314041)', diff saved to https://phabricator.wikimedia.org/P34553 and previous config saved to /var/cache/conftool/dbconfig/20220912-223927-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34552 and previous config saved to /var/cache/conftool/dbconfig/20220912-222420-ladsgroup.json
  • 22:23 mutante: phabricator - disabling repositories: tool-xh-bot, tool-editor-contribution-dashboard, tool-ranker, tool-editor-contribution, tool-mikasa-bot-1, tool-maintun, tool-add-text, tool-wikibookassamese-book.php (none of them had commits) T296022 - T315706
  • 22:20 mutante: phabricator - disabling repository "tool-ranker"
  • 22:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34551 and previous config saved to /var/cache/conftool/dbconfig/20220912-220914-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T314041)', diff saved to https://phabricator.wikimedia.org/P34550 and previous config saved to /var/cache/conftool/dbconfig/20220912-215407-ladsgroup.json
  • 21:07 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 19s)
  • 21:07 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T312863)', diff saved to https://phabricator.wikimedia.org/P34549 and previous config saved to /var/cache/conftool/dbconfig/20220912-210123-ladsgroup.json
  • 20:57 TheresNoTime: closing UTC late backport window
  • 20:56 samtar@deploy1002: Finished scap: Backport for Set track_total_hits to true (duration: 05m 00s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 samtar@deploy1002: samtar and ebernhardson: Backport for Set track_total_hits to true synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:51 samtar@deploy1002: Started scap: Backport for Set track_total_hits to true
  • 20:49 samtar@deploy1002: Finished scap: Backport for Enable Nearby on Hebrew and French Wikipedia (T246493) (duration: 07m 27s)
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34548 and previous config saved to /var/cache/conftool/dbconfig/20220912-204617-ladsgroup.json
  • 20:42 samtar@deploy1002: samtar and jdlrobson: Backport for Enable Nearby on Hebrew and French Wikipedia (T246493) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:42 samtar@deploy1002: Started scap: Backport for Enable Nearby on Hebrew and French Wikipedia (T246493)
  • 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:40 samtar@deploy1002: Finished scap: Backport for Deploy Research Incentive Survey to idwiki (T316466) (duration: 06m 25s)
  • 20:39 jhathaway: testing exim config change on mx1001.wikimedia.org
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:38 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dispatch-be1001.eqiad.wmnet
  • 20:34 samtar@deploy1002: samtar and dani: Backport for Deploy Research Incentive Survey to idwiki (T316466) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:34 samtar@deploy1002: Started scap: Backport for Deploy Research Incentive Survey to idwiki (T316466)
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 samtar@deploy1002: Finished scap: Backport for Re-enable track_total_hits for elastic7 (T317374) (duration: 06m 12s)
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34547 and previous config saved to /var/cache/conftool/dbconfig/20220912-203110-ladsgroup.json
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:26 samtar@deploy1002: samtar and ebernhardson: Backport for Re-enable track_total_hits for elastic7 (T317374) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:26 samtar@deploy1002: Started scap: Backport for Re-enable track_total_hits for elastic7 (T317374)
  • 20:24 samtar@deploy1002: Finished scap: Backport for Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424) (duration: 08m 14s)
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T314041)', diff saved to https://phabricator.wikimedia.org/P34546 and previous config saved to /var/cache/conftool/dbconfig/20220912-202359-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:16 samtar@deploy1002: samtar and aishik: Backport for Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:16 samtar@deploy1002: Started scap: Backport for Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)
  • 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T312863)', diff saved to https://phabricator.wikimedia.org/P34545 and previous config saved to /var/cache/conftool/dbconfig/20220912-201604-ladsgroup.json
  • 20:15 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dispatch-be1001.eqiad.wmnet on all recursors
  • 20:14 herron@cumin1001: START - Cookbook sre.dns.wipe-cache dispatch-be1001.eqiad.wmnet on all recursors
  • 20:14 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:14 samtar@deploy1002: Finished scap: Backport for Mark spcomwiki and searchcomwiki as closed (T285685) (duration: 05m 40s)
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:12 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 20:12 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host dispatch-be1001.eqiad.wmnet
  • 20:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:09 samtar@deploy1002: samtar and zabe: Backport for Mark spcomwiki and searchcomwiki as closed (T285685) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:08 samtar@deploy1002: Started scap: Backport for Mark spcomwiki and searchcomwiki as closed (T285685)
  • 20:07 samtar@deploy1002: backport aborted: (duration: 03m 46s)
  • 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts theemin.codfw.wmnet
  • 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:59 maryum: deployed security patch for T314245
  • 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.decommission for hosts theemin.codfw.wmnet
  • 19:58 mstyles@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/PageTriage/includes/Api/ApiPageTriageAction.php: (no justification provided) (duration: 03m 42s)
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T314041)', diff saved to https://phabricator.wikimedia.org/P34544 and previous config saved to /var/cache/conftool/dbconfig/20220912-195540-ladsgroup.json
  • 19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34543 and previous config saved to /var/cache/conftool/dbconfig/20220912-195519-ladsgroup.json
  • 19:53 sbassett: Deployed security patch for T311337
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P34542 and previous config saved to /var/cache/conftool/dbconfig/20220912-194013-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T314041)', diff saved to https://phabricator.wikimedia.org/P34541 and previous config saved to /var/cache/conftool/dbconfig/20220912-192858-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34540 and previous config saved to /var/cache/conftool/dbconfig/20220912-192837-ladsgroup.json
  • 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 02m 04s)
  • 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P34539 and previous config saved to /var/cache/conftool/dbconfig/20220912-192506-ladsgroup.json
  • 19:20 dancy@deploy1002: Installation of scap version "4.19.0" completed for 561 hosts
  • 19:20 dancy@deploy1002: Installing scap version "4.19.0" for 561 hosts
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34538 and previous config saved to /var/cache/conftool/dbconfig/20220912-191330-ladsgroup.json
  • 19:12 dancy@deploy1002: Installation of scap version "4.18.0" completed for 561 hosts
  • 19:12 dancy@deploy1002: Installing scap version "4.18.0" for 561 hosts
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34537 and previous config saved to /var/cache/conftool/dbconfig/20220912-191000-ladsgroup.json
  • 19:08 inflatador: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 19:04 ryankemper: [WCQS] Depooled `wcqs100[1,2]` while they catch up on ~1.5 days worth of lag (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wcqs&viewPanel=8&from=1662910789183&to=1663068616559)
  • 19:00 inflatador: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34536 and previous config saved to /var/cache/conftool/dbconfig/20220912-185823-ladsgroup.json
  • 18:56 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14] (wcqs): Deploy 0.3.116 to WCQS (duration: 08m 01s)
  • 18:48 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14] (wcqs): Deploy 0.3.116 to WCQS
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34535 and previous config saved to /var/cache/conftool/dbconfig/20220912-184317-ladsgroup.json
  • 18:37 inflatador: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 18:37 inflatador: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 18:37 inflatador: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 18:22 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 07m 31s)
  • 18:14 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
  • 18:14 dancy@deploy1002: Installation of scap version "4.16.0" completed for 561 hosts
  • 18:13 dancy@deploy1002: Installing scap version "4.16.0" for 561 hosts
  • 18:08 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 05m 37s)
  • 18:02 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
  • 18:01 inflatador: [WDQS Deploy] Tests passing following deploy of `wdqs1003` on canary `wdqs1003`; proceeding to rest of fleet
  • 17:57 inflatador: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.116`. Pre-deploy tests passing on canary `wdqs1003`
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34532 and previous config saved to /var/cache/conftool/dbconfig/20220912-174301-ladsgroup.json
  • 17:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 17:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34531 and previous config saved to /var/cache/conftool/dbconfig/20220912-174239-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P34529 and previous config saved to /var/cache/conftool/dbconfig/20220912-172733-ladsgroup.json
  • 17:21 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 17:21 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P34528 and previous config saved to /var/cache/conftool/dbconfig/20220912-171227-ladsgroup.json
  • 17:08 cwhite: rebuilt raid on logstash2027 T316996
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34527 and previous config saved to /var/cache/conftool/dbconfig/20220912-165720-ladsgroup.json
  • 15:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 15:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34524 and previous config saved to /var/cache/conftool/dbconfig/20220912-152920-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T314041)', diff saved to https://phabricator.wikimedia.org/P34523 and previous config saved to /var/cache/conftool/dbconfig/20220912-152858-ladsgroup.json
  • 15:18 dancy@deploy1002: Installation of scap version "4.18.0" completed for 561 hosts
  • 15:17 dancy@deploy1002: Installing scap version "4.18.0" for 561 hosts
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P34522 and previous config saved to /var/cache/conftool/dbconfig/20220912-151352-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P34521 and previous config saved to /var/cache/conftool/dbconfig/20220912-145845-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T314041)', diff saved to https://phabricator.wikimedia.org/P34520 and previous config saved to /var/cache/conftool/dbconfig/20220912-144339-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T312863)', diff saved to https://phabricator.wikimedia.org/P34519 and previous config saved to /var/cache/conftool/dbconfig/20220912-141427-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T312863)', diff saved to https://phabricator.wikimedia.org/P34518 and previous config saved to /var/cache/conftool/dbconfig/20220912-141405-ladsgroup.json
  • 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 14:02 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wtp[1028-1030]
  • 14:02 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:01 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P34517 and previous config saved to /var/cache/conftool/dbconfig/20220912-135859-ladsgroup.json
  • 13:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 13:53 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp[1028-1030]
  • 13:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 13:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P34516 and previous config saved to /var/cache/conftool/dbconfig/20220912-134353-ladsgroup.json
  • 13:43 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:40 Lucas_WMDE: scap pull on mwdebug1001 to restore good code (confirmed that T317520 affects production)
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34515 and previous config saved to /var/cache/conftool/dbconfig/20220912-133848-root.json
  • 13:35 Lucas_WMDE: manually applying gerrit:830691 on mwdebug1001 to test if T317520 affects production (expected to cause getExpensiveParserFunctionLimit-related logstash errors)
  • 13:33 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/%s\n' {mobile/copyright/wikipedia-ko-600k.svg,project-logos/kowiki-600k{,-1.5x,-2x}.png} | mwscript purgeList.php # T315127
  • 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/mobile/copyright/: Config: Revert "kowiki: Add logo (legacy vector and vector-2022) for 600k articles" (T315127) (2/2; deleted file requires syncing whole directory) (duration: 03m 44s)
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T312863)', diff saved to https://phabricator.wikimedia.org/P34514 and previous config saved to /var/cache/conftool/dbconfig/20220912-132846-ladsgroup.json
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/project-logos/: Config: Revert "kowiki: Add logo (legacy vector and vector-2022) for 600k articles" (T315127) (1/2; deleted files require syncing whole directory) (duration: 03m 50s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34513 and previous config saved to /var/cache/conftool/dbconfig/20220912-132343-root.json
  • 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "kowiki: Change logo for 600k articles" (T315127) (3/3) (duration: 03m 53s)
  • 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/logos.php: Config: Revert "kowiki: Change logo for 600k articles" (T315127) (2/3) (duration: 03m 39s)
  • 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 13:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized logos/config.yaml: Config: Revert "kowiki: Change logo for 600k articles" (T315127) (1/3) (duration: 03m 53s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maint
  • 13:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maint
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34512 and previous config saved to /var/cache/conftool/dbconfig/20220912-130838-root.json
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34511 and previous config saved to /var/cache/conftool/dbconfig/20220912-125333-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34510 and previous config saved to /var/cache/conftool/dbconfig/20220912-123828-root.json
  • 12:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34509 and previous config saved to /var/cache/conftool/dbconfig/20220912-122654-root.json
  • 12:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34508 and previous config saved to /var/cache/conftool/dbconfig/20220912-122323-root.json
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34507 and previous config saved to /var/cache/conftool/dbconfig/20220912-122242-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34506 and previous config saved to /var/cache/conftool/dbconfig/20220912-122221-ladsgroup.json
  • 12:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34505 and previous config saved to /var/cache/conftool/dbconfig/20220912-121150-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34504 and previous config saved to /var/cache/conftool/dbconfig/20220912-120818-root.json
  • 12:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
  • 12:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1146.eqiad.wmnet
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P34503 and previous config saved to /var/cache/conftool/dbconfig/20220912-120715-ladsgroup.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34502 and previous config saved to /var/cache/conftool/dbconfig/20220912-115645-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34501 and previous config saved to /var/cache/conftool/dbconfig/20220912-115313-root.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P34500 and previous config saved to /var/cache/conftool/dbconfig/20220912-115208-ladsgroup.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34499 and previous config saved to /var/cache/conftool/dbconfig/20220912-114140-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34498 and previous config saved to /var/cache/conftool/dbconfig/20220912-113808-root.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34497 and previous config saved to /var/cache/conftool/dbconfig/20220912-113702-ladsgroup.json
  • 11:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:28 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es4 T317522 (duration: 03m 36s)
  • 11:27 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 11:27 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34496 and previous config saved to /var/cache/conftool/dbconfig/20220912-112635-root.json
  • 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 T317522', diff saved to https://phabricator.wikimedia.org/P34495 and previous config saved to /var/cache/conftool/dbconfig/20220912-112343-root.json
  • 11:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1146.eqiad.wmnet
  • 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary T317522', diff saved to https://phabricator.wikimedia.org/P34494 and previous config saved to /var/cache/conftool/dbconfig/20220912-112039-root.json
  • 11:20 marostegui: Starting es4 eqiad failover from es1021 to es1020 - T317522
  • 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:18 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es4 T317522 (duration: 04m 10s)
  • 11:16 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1143-1148].eqiad.wmnet
  • 11:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-etcd1001.eqiad.wmnet
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34493 and previous config saved to /var/cache/conftool/dbconfig/20220912-111442-root.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 T317522', diff saved to https://phabricator.wikimedia.org/P34492 and previous config saved to /var/cache/conftool/dbconfig/20220912-111424-root.json
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317522
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317522
  • 11:12 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1143-1148].eqiad.wmnet
  • 11:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-etcd1001.eqiad.wmnet
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34491 and previous config saved to /var/cache/conftool/dbconfig/20220912-111130-root.json
  • 11:10 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1142.eqiad.wmnet
  • 11:09 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1142.eqiad.wmnet
  • 11:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 11:04 moritzm: updated bullseye install image for 11.5 release T317416
  • 10:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34490 and previous config saved to /var/cache/conftool/dbconfig/20220912-105937-root.json
  • 10:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T314041)', diff saved to https://phabricator.wikimedia.org/P34489 and previous config saved to /var/cache/conftool/dbconfig/20220912-105841-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34488 and previous config saved to /var/cache/conftool/dbconfig/20220912-105625-root.json
  • 10:55 topranks: re-pooliong esams after successful upgrade of core router cr3-esams T295690
  • 10:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34487 and previous config saved to /var/cache/conftool/dbconfig/20220912-104432-root.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34486 and previous config saved to /var/cache/conftool/dbconfig/20220912-104120-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034', diff saved to https://phabricator.wikimedia.org/P34485 and previous config saved to /var/cache/conftool/dbconfig/20220912-103428-root.json
  • 10:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34484 and previous config saved to /var/cache/conftool/dbconfig/20220912-102928-root.json
  • 10:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 10:23 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 37s)
  • 10:22 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34483 and previous config saved to /var/cache/conftool/dbconfig/20220912-101842-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34481 and previous config saved to /var/cache/conftool/dbconfig/20220912-101423-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34480 and previous config saved to /var/cache/conftool/dbconfig/20220912-100337-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34479 and previous config saved to /var/cache/conftool/dbconfig/20220912-095918-root.json
  • 09:55 Emperor: rebalance thanos rings T311690
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34478 and previous config saved to /var/cache/conftool/dbconfig/20220912-094832-root.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1033', diff saved to https://phabricator.wikimedia.org/P34477 and previous config saved to /var/cache/conftool/dbconfig/20220912-094818-root.json
  • 09:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 09:45 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34476 and previous config saved to /var/cache/conftool/dbconfig/20220912-093327-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34475 and previous config saved to /var/cache/conftool/dbconfig/20220912-093318-root.json
  • 09:31 moritzm: updated buster install image for 10.13 release T317413
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34474 and previous config saved to /var/cache/conftool/dbconfig/20220912-092244-root.json
  • 09:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:18 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34473 and previous config saved to /var/cache/conftool/dbconfig/20220912-091822-root.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34472 and previous config saved to /var/cache/conftool/dbconfig/20220912-091813-root.json
  • 09:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 09:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34471 and previous config saved to /var/cache/conftool/dbconfig/20220912-090739-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34470 and previous config saved to /var/cache/conftool/dbconfig/20220912-090317-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34469 and previous config saved to /var/cache/conftool/dbconfig/20220912-090308-root.json
  • 08:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34468 and previous config saved to /var/cache/conftool/dbconfig/20220912-085234-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34467 and previous config saved to /var/cache/conftool/dbconfig/20220912-084812-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34466 and previous config saved to /var/cache/conftool/dbconfig/20220912-084803-root.json
  • 08:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 08:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 08:39 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 08:39 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 08:38 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34465 and previous config saved to /var/cache/conftool/dbconfig/20220912-083729-root.json
  • 08:36 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr3-esams,cr3-esams IPv6,cr3-esams.mgmt with reason: router upgrade
  • 08:36 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams,cr3-esams IPv6,cr3-esams.mgmt with reason: router upgrade
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34464 and previous config saved to /var/cache/conftool/dbconfig/20220912-083308-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34463 and previous config saved to /var/cache/conftool/dbconfig/20220912-083258-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34462 and previous config saved to /var/cache/conftool/dbconfig/20220912-082224-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032', diff saved to https://phabricator.wikimedia.org/P34461 and previous config saved to /var/cache/conftool/dbconfig/20220912-081936-root.json
  • 08:17 moritzm: imported jenkins 2.361.1 to thirdparty/ci T317418
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34460 and previous config saved to /var/cache/conftool/dbconfig/20220912-081754-root.json
  • 08:09 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:08 cmooney@cumin1001: START - Cookbook sre.network.cf
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34459 and previous config saved to /var/cache/conftool/dbconfig/20220912-080719-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 T317508', diff saved to https://phabricator.wikimedia.org/P34458 and previous config saved to /var/cache/conftool/dbconfig/20220912-080602-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2024 to es5 codfw primary T317508', diff saved to https://phabricator.wikimedia.org/P34457 and previous config saved to /var/cache/conftool/dbconfig/20220912-080400-root.json
  • 08:03 marostegui: Starting es5 codfw failover from es2023 to es2024 - T317508
  • 08:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-esams with reason: router upgrade
  • 08:01 elukey: restart kafka on kafka2001 to pick up new PKI settings
  • 08:01 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams with reason: router upgrade
  • 08:00 hashar: Restarting CI Jenkins for upgrade T317418
  • 08:00 topranks: de-pooliong esams in advance of upgrade to core router cr3-esams T295690
  • 07:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2001.codfw.wmnet with reason: Kafka PKI upgrade
  • 07:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2001.codfw.wmnet with reason: Kafka PKI upgrade
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 T317508', diff saved to https://phabricator.wikimedia.org/P34456 and previous config saved to /var/cache/conftool/dbconfig/20220912-075739-root.json
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317508
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317508
  • 07:47 hashar: Upgraded Jenkins instances from 2.346.1 to 2.346.3 # T317418
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 T317507', diff saved to https://phabricator.wikimedia.org/P34455 and previous config saved to /var/cache/conftool/dbconfig/20220912-074258-root.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 primary and set section read-write T317507', diff saved to https://phabricator.wikimedia.org/P34454 and previous config saved to /var/cache/conftool/dbconfig/20220912-074100-root.json
  • 07:39 marostegui: Starting es4 codfw failover from es2021 to es2020 - T317507
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 T317507', diff saved to https://phabricator.wikimedia.org/P34453 and previous config saved to /var/cache/conftool/dbconfig/20220912-073408-root.json
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317507
  • 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317507
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T312863)', diff saved to https://phabricator.wikimedia.org/P34452 and previous config saved to /var/cache/conftool/dbconfig/20220912-072931-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T312863)', diff saved to https://phabricator.wikimedia.org/P34450 and previous config saved to /var/cache/conftool/dbconfig/20220912-072909-ladsgroup.json
  • 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34449 and previous config saved to /var/cache/conftool/dbconfig/20220912-071829-root.json
  • 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34448 and previous config saved to /var/cache/conftool/dbconfig/20220912-071700-ladsgroup.json
  • 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P34447 and previous config saved to /var/cache/conftool/dbconfig/20220912-071403-ladsgroup.json
  • 07:11 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old templatelinks fields everywhere (T312865) (duration: 06m 57s)
  • 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:04 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Stop writing to the old templatelinks fields everywhere (T312865) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:04 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old templatelinks fields everywhere (T312865)
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34446 and previous config saved to /var/cache/conftool/dbconfig/20220912-070324-root.json
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P34445 and previous config saved to /var/cache/conftool/dbconfig/20220912-065856-ladsgroup.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34444 and previous config saved to /var/cache/