Server Admin Log: Difference between revisions

imported>Labslogbot
(DNS update - adding language "azb" to langlist (mutante))
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T314041)', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json)
== 2015-07-20 ==
== 2022-09-28 ==
* 00:02 mutante: DNS update - adding language "azb" to langlist
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json
* 01:18 ejegg: updated fundraising python tools from {{Gerrit|b65109af}} to {{Gerrit|dd494413}}
* 00:34 eileen: civicrm upgraded from {{Gerrit|118c1d0b}} to {{Gerrit|916a8b08}}
* 00:11 eileen: civicrm upgraded from {{Gerrit|e198fb4c}} to {{Gerrit|118c1d0b}}


== 2015-07-19 ==
== 2022-09-27 ==
* 20:52 logmsgbot: krenair Synchronized w/static/images/project-logos/arbcom_enwiki.png: https://gerrit.wikimedia.org/r/#/c/225822/ (duration: 00m 12s)
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 19:10 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ic0573f26: Follow-up for I189d748: whitelist 'archive.org' too (duration: 00m 12s)
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 19:06 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: I189d748a: Whitelist *.archive.org for wgCopyUploadsDomains (T106293) (duration: 00m 13s)
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 18:29 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Enable IP user page creation on fawiki's Draft ns (duration: 00m 11s)
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 18:18 logmsgbot: ori Synchronized php-1.26wmf14/includes/site/SiteSQLStore.php: I0e5f2d3b2: Use CACHE_ACCEL for SiteLists if on HHVM (duration: 00m 12s)
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 17:37 logmsgbot: ori Synchronized wmf-config: Ib508a440: Undeploy VectorBeta (Task: T87489) (duration: 00m 13s)
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 17:27 logmsgbot: krenair Synchronized w/static/images/project-logos/arbcom_enwiki.png: https://gerrit.wikimedia.org/r/#/c/225718/ (duration: 00m 12s)
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 17:21 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/225705/ (duration: 00m 12s)
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 17:14 logmsgbot: krenair Synchronized w/static/images/project-logos/arbcom_enwiki.png: https://gerrit.wikimedia.org/r/#/c/225705/ (duration: 00m 12s)
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 05:10 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 19 05:10:10 UTC 2015 (duration 10m 9s)
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-19 02:27:35+00:00
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 04s)
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 19 02:07:15 UTC 2015 (duration 7m 14s)
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-19 02:03:05+00:00
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:12 TheresNoTime: closing UTC late backport window
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 20:59 TheresNoTime: extending UTC late backport window
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 20:15 samtar@deploy1002: Started scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] (duration: 05m 46s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:04 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]]
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
* 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 18:02 brennen: 1.40.0-wmf.3 ([[phab:T314192|T314192]]) no current blockers, promoting to group0
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
* 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
* 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
* 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
* 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
* 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
* 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]], 710183 rows done
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
* 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
* 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]]
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 taavi@deploy1002: Finished scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] (duration: 06m 59s)
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 taavi@deploy1002: taavi and migr: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:59 taavi@deploy1002: Started scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]]
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
* 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 jbond: upload new wmf-laptop_0.5.4 package
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
* 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update [[phab:T311686|T311686]]
* 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
* 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
* 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
* 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
* 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
* 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - [[phab:T310745|T310745]]
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
* 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - [[phab:T310745|T310745]]
* 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
* 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
* 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
* 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
* 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
* 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
* 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 [[phab:T318128|T318128]]
* 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 36m 01s)
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2015-07-18 ==
== 2022-09-26 ==
* 20:58 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: labs only (duration: 00m 12s)
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 20:44 YuviPanda: restarted etherpad
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 18:56 akosiaris: reinstall labsdb1004
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 16:36 paravoid: Ganglia is up :)
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 16:09 Krenair: Ganglia seems down
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 15:42 Krenair: Doing T44180
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 05:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 18 05:28:25 UTC 2015 (duration 28m 24s)
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 02:34 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-18 02:34:29+00:00
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 19s)
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 18 02:07:38 UTC 2015 (duration 7m 37s)
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-18 02:03:29+00:00
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 00:49 ejegg: restored recurring globalcollect batch size of 250
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 00:09 ejegg: updated civicrm from 78de1b9b74934984af3099afe9192fa53011bdaa to 292ad137f6b3ffc818a3bd617ca4f335931091f3
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:31 TheresNoTime: closing UTC late backport window
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
* 13:56 moritzm: installing mako security updates
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 12:25 moritzm: installing unzip security updates
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2015-07-17 ==
== 2022-09-25 ==
* 21:51 ejegg: updated civicrm from 0acac037ce0c9a64e94a475463deb2d47e84193a to 78de1b9b74934984af3099afe9192fa53011bdaa
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 20:53 matt_flaschen: Manually fixed issue in mediawikiwiki LQT thread table with rename of Ecliptica to Entropy. https://phabricator.wikimedia.org/T106122#1461380
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 20:03 hashar: stopping Zuul to get rid of a faulty registered function "build:Global-Dev Dashboard Data". Job is gone already.
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 17:50 ejegg: updated civicrm from fa724dd2e2e69545d81015c943cb7f52cf6de8e1 to 0acac037ce0c9a64e94a475463deb2d47e84193a
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 16:49 gwicke: restarted restbase on restbase1001
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 15:04 gwicke: restarted RB thinner scripts, see https://phabricator.wikimedia.org/T105706
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 14:10 urandom: restart restbase service on restbase1006
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 14:07 urandom: restart restbase service on restbase1003
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 14:05 urandom: restart restbase service on restbase1002
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 13:56 godog: apache2ctl graceful on fluorine antimony argon caesium helium
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 13:43 godog: apache2ctl graceful on netmon1001
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 11:24 hashar: rebooted labnodepool1001.eqiad.wmnet. Accidentally deleted the whole /dev, which froze everything :(
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 10:21 _joe_: repooling mw1158
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 09:08 _joe_: depooling mw1158, repooling mw1156,7
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 07:51 _joe_: depooled mw1156,7 for reimaging
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 04:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 17 04:53:56 UTC 2015 (duration 53m 55s)
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 03:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1030 (duration: 00m 12s)
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 02:30 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-17 02:30:03+00:00
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 02:26 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 05m 55s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 17 02:07:22 UTC 2015 (duration 7m 20s)
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-17 02:03:12+00:00
* 01:30 mutante: git pull origin on strontium


== 2015-07-16 ==
== 2022-09-23 ==
* 21:27 ori: bounced nutcracker on mw1139 as well. hashar noticed flood of errors from these hosts on https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki-errors . lack of monitoring / alerts is troubling.
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 21:26 ori: bounced nutcracker on mw1128 and mw1134
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 20:50 mutante: iegreview tool - short maintenance downtime
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 19:39 YuviPanda: imported aspell-id from ubuntu to jessie-wikimedia - needed by ores, simple package that I am not sure why it is not in jessie
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 19:20 logmsgbot: twentyafterfour Synchronized php-1.26wmf14/includes/db/LoadMonitor.php: Deploying Hotfix for T105373 (duration: 00m 13s)
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 18:40 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf14
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:26 ejegg: changed batch size from 250 to 1 in RGC jenkins job
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:22 ejegg: updated civicrm from 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7 to fa724dd2e2e69545d81015c943cb7f52cf6de8e1
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:56 Jeff_Green: authdns update to rename lutetium.wm.o
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:08 hashar_: kept nodepool stopped on labnodepool1001.eqiad.wmnet because it spams the cron log
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:57 logmsgbot: demon Synchronized multiversion/MWMultiVersion.php: prod no-op, beta change (duration: 00m 13s)
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 15:54 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/224975/ (duration: 00m 12s)
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 15:27 logmsgbot: thcipriani Synchronized php-1.26wmf14/extensions/Math/MathMathML.php: SWAT: Fix: Undefined variable passed hook [[gerrit:225058]] (duration: 00m 12s)
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 15:03 ejegg: updated payments from 4ca95d55a9745c05ccfbb16ee6f23a6f75328824 to ebb1a9e52172a4793cf5feb33220b4d7edfcad70
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 12:21 dcausse: es1.6 upgrade: all done
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 11:32 dcausse: restarted gmond on elastic1024
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 11:06 mobrovac: citoid deploying ff90869
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 10:56 dcausse: es1.6 upgrade: upgrade elastic1031
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 10:25 mobrovac: citoid rolled back to ffbaf6d
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 10:10 mobrovac: citoid deploying 5aeb0fc
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 10:05 dcausse: es1.6 upgrade: upgrade elastic1030
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 09:38 dcausse: es1.6 upgrade: upgrade elastic1029
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 08:42 dcausse: es1.6 upgrade: upgrade elastic1028
* 07:31 dcausse: es1.6 upgrade: upgrade elastic1027
* 07:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 16 07:22:49 UTC 2015 (duration 22m 48s)
* 05:53 dcausse: es1.6 upgrade: upgrade elastic1026
* 05:31 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 05:24 logmsgbot: krenair Synchronized php-1.26wmf14/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/225008/ (duration: 00m 13s)
* 04:38 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/225006/ (duration: 00m 13s)
* 03:54 manybubbles: es1.6 upgrade: upgrade elastic1025
* 03:19 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-16 03:19:37+00:00
* 03:13 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 10m 23s)
* 02:46 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-16 02:46:03+00:00
* 02:43 manybubbles: es1.6 upgrade: upgrade elastic1024
* 02:39 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 10m 50s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 16 02:07:55 UTC 2015 (duration 7m 54s)
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-16 02:03:31+00:00
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-16 02:03:30+00:00
* 01:41 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/214981/ (duration: 00m 12s)
* 01:22 manybubbles: es1.6 upgrade: upgrade elastic1023


== 2015-07-15 ==
== 2022-09-22 ==
* 23:36 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221885/ (duration: 00m 13s)
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 23:22 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209840/ (duration: 00m 12s)
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 23:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/194075/ (duration: 00m 12s)
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:10 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224799/ (duration: 00m 13s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:09 logmsgbot: krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 13s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 12s)
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:23 csteipp: deploy patch for T105305 to wmf13/14
* 21:23 dancy@deploy1002: backport aborted:  (duration: 00m 05s)
* 22:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223843/ (duration: 00m 12s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:59 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222584/ (duration: 00m 13s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:54 manybubbles: es1.6 upgrade: upgrade elastic1022
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:37 manybubbles: es1.6 upgrade: upgrade elastic1021
* 20:55 brennen: end of utc late backport & config window
* 21:09 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Really Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef this time (duration: 01m 32s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 bblack: restarted salt-master service on palladium
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 20:33 bblack: globally cleaning up dangling symlinks left in /etc/ssl/certs from before Id7d2447 via salted 'find /etc/ssl/certs -type l -xtype l|xargs rm'
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 20:30 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef (revert Count API module instantiations and Hook runs) (duration: 01m 48s)
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 20:20 manybubbles: es1.6 upgrade: upgrade elastic1020
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:18 RoanKattouw: Running FlowCreateMentionTemplate.php on all Flow wikis
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 20:06 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf14
* 20:36 brennen@deploy1002: backport aborted:  (duration: 02m 16s)
* 19:50 ejegg: updated civicrm from e29cc5f20b5069afcaff794e628596c1f70d69a3 to 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224408/ (duration: 00m 12s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 13s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:00 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 12s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:57 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:40 ejegg: updated civicrm from f4219bc8eca5e4db633da07b6ac9e2505cfbae16 to e29cc5f20b5069afcaff794e628596c1f70d69a3
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:39 logmsgbot: krenair Synchronized wmf-config/throttle.php: throttle labswiki account creations from hackathon at 500 (duration: 00m 12s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:39 logmsgbot: twentyafterfour Finished scap: group0 to 1.26wmf14 (duration: 32m 34s)
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 18:21 manybubbles: es1.6 upgrade: upgrading elastic1019
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:20 Jeff_Green: authdns-update shifting to service-oriented hostnames for fundraising cluster
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 18:06 logmsgbot: twentyafterfour Started scap: group0 to 1.26wmf14
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:55 ejegg: updated civicrm from 6560cefa8d7e68e35e30b310d6691ab57798a4c9 to f4219bc8eca5e4db633da07b6ac9e2505cfbae16
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:34 Jeff_Green: authdns-update to remove boron.wm.o
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: partially revert https://gerrit.wikimedia.org/r/#/c/224420/1/wmf-config/CommonSettings.php - doesn't quite work (duration: 00m 13s)
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:17 Jeff_Green: authdns-update to remove aluminium, also lanthanum by preexisting commit
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:45 andrewbogott: rebooting labvirt1005
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 16:43 mutante: accepting unaccepted salt keys for ganeti VMs, planet, bromine, krypton
* 18:38 dancy@deploy1002: Started scap: testing
* 16:39 mutante: krypton - signing puppet cert, initial run
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 16:26 andrewbogott: woo, first try!
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 16:23 andrewbogott: trying to kill labvirt1005 via repeated instance suspend/resume
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 16:04 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 16:03 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 16:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224808/ (duration: 00m 12s)
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 15:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222581/ (duration: 00m 11s)
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 15:35 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 11s)
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 15:29 logmsgbot: krenair Synchronized docroot/noc/createTxtFileSymlinks.sh: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:27 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:20 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 11s)
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:33 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:22 legoktm: sync failed on mw1090.eqiad.wmnet, read only filesystem
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:20 logmsgbot: legoktm Synchronized php-1.26wmf13/extensions/CentralAuth/includes/CentralAuthPlugin.php: Add log entry for $wgCentralAuthStrict failures if SULMigration is enabled (duration: 00m 13s)
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:55 dcausse: es1.6 upgrade: upgrade elastic1018
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:24 springle: entry below not mw1216 fault, but r/o filesystem error on mw1090
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 springle: sync-common on mw1216 after sync-file from tin failed non-zero exit status 12
* 16:39 dancy@deploy1002: Sync cancelled.
* 13:12 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1022 T105879 (duration: 00m 12s)
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 11:43 dcausse: es1.6 upgrade: upgrade elastic1017
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 08:27 dcausse: es1.6 upgrade: upgrade elastic1016
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:31 dcausse: es1.6 upgrade: upgrade elastic1015
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:40 dcausse: es1.6 upgrade: upgrade elastic1014
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:10 springle: db1030 busy removing table partitioning
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:28 manybubbles: es1.6 upgrade: lowered the shard transfer settings back to our normal rate. going to bed.
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:12 manybubbles: es1.6 upgrade: upgrade elastic1013
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:49 springle: upgrade db1030 trusty
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:29 manybubbles: es1.6 upgrade: upgrade elastic1012
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:14 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-15 03:14:21+00:00
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 03:10 logmsgbot: reedy Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 13m 32s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:03 manybubbles: es1.6 upgrade: raised limits on shard migration rate - should speed up the restart. we should lower it before we do restarts during europe's morning
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:10 Reedy: Running LU manually to see what's wrong with it
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 15 02:07:48 UTC 2015 (duration 7m 47s)
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-15 02:02:55+00:00
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2015-07-14 ==
== 2022-09-21 ==
* 23:46 manybubbles: es1.6 upgrade: upgraded elastic1011
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:22 bblack: updating nginx to 1.9.3-1+wmf1 on cp*
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:17 bblack: reprepro: nginx for jessie-wikimedia/main bumped to 1.9.3-1+wmf1
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:22 ejegg: updated civicrm from 04efc7d5c7bbb068f907125f2184692aee676123 to 6560cefa8d7e68e35e30b310d6691ab57798a4c9
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:29 Reedy: mw1090 fs is ro
* 20:46 tgr_: UTC late deploys done
* 21:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Fix testwiki
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:05 _joe|AFK: depooling mw1090, ext4 errors in syslog, filesystem mounted read-only
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 21:01 logmsgbot: twentyafterfour Synchronized wmf-config/CommonSettings.php: revert LCStoreStaticArray (duration: 00m 12s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:59 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf14 and rebuild localization cache (duration: 72m 45s)
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:42 bblack: undoing LCStoreStaticArray because appservers look unhealthy, using ori's command: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"'
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:46 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf14 and rebuild localization cache
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 19:23 manybubbles: es1.6 step iforget: upgrade elasticsearch on elastic1010
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 mutante: terbium: /usr/local/bin/foreachwiki extensions/Echo/maintenance/processEchoEmailBatch.php
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:10 dcausse: es1.6 step 10: upgrade elastic1009
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:23 mutante: bromine - apt-get upgrade
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:08 logmsgbot: manybubbles Synchronized php-1.26wmf13/extensions/UniversalLanguageSelector/: SWAT add some hooks to extension.json (duration: 00m 13s)
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 14:34 gwicke: started RESTBase revision thin-out script for html and data-parsoid on wikimedia domains
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 dcausse: es1.6 step 9: upgrade elastic1008
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:48 _joe_: reimaging mw1155
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:17 ori: Logging a message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log.
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 11:28 dcausse: es1.6 step 8: upgrade elastic1007
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 11:25 _joe_: repooling mw1154 with HHVM
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:12 _joe_: stopped poolcounter on mw1154
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 10:06 _joe_: reimaging mw1154
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:49 dcausse: es1.6 step 7: upgrade elastic1006
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 07:09:10 UTC 2015 (duration 9m 9s)
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 06:48 dcausse: es1.6 step 6: upgrade elastic1005
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:41 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I9c9bf0f4: Use LCStoreStaticArray unconditionally (duration: 03m 02s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:26 ori: Cleaned up now-unused hhbc files from /run/hhvm/cache on job runners
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:58 ori: Enabling LCStoreStaticArray in production. May be reverted by running: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"' on palladium.
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 04:48 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Follow-up for Ieb62ee050e: allow LCStoreStaticArray in server mode (duration: 00m 13s)
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-14 02:35:21+00:00
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 07m 27s)
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 02:07:32 UTC 2015 (duration 7m 30s)
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-14 02:02:33+00:00
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul


== 2015-07-13 ==
== 2022-09-20 ==
* 23:22 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/VisualEditor: SWAT (duration: 00m 11s)
* 20:19 cjming: end of UTC late backport window
* 23:11 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Add title to Parsoid exception logging (duration: 00m 12s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:45 logmsgbot: legoktm Synchronized wmf-config: Revert "Set $wgCentralAuthStrict = true;" (duration: 00m 13s)
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 22:41 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 13s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:41 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:16 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/User.php: Add 'AuthPluginStrict' log to identify users who are unable to authenticate (duration: 00m 13s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 12s)
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/Hooks.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 13s)
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 22:13 ejegg: updated payments from ec34ebf61e5962f66b807abdcb519ff323d41e8e to 4ca95d55a9745c05ccfbb16ee6f23a6f75328824
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 22:00 manybubbles: es1.6 step 4: upgrade elastic1003
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 21:54 ori: Debugging metric issue on graphite1001, brief stats drop possible
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 21:32 legoktm: renaming ~3k users who were originally missed for SULF
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/Hooks.php: (no message) (duration: 00m 12s)
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: (no message) (duration: 00m 13s)
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 20:42 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: f9c89d2814: Revert "Revert Count API module instantiations and Hook runs" (duration: 00m 13s)
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 20:30 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Ieb62ee05: Temporary hack to facilitate migration of l10n cache implementations (duration: 00m 11s)
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:42 hoo: Updated Wikidata's property suggester with data from today's json dump
* 18:50 jynus: restart db2100:s7 to apply new config
* 19:24 manybubbles_: es1.6 step 3: upgrade elastic1002
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 19:08 legoktm: running populateContentModel.php --table=page on all small wikis
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:01 andrewbogott: two of two
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:01 mutante: morebots - are you 1.7.11 ?
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 19:01 andrewbogott: one of two
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 18:52 legoktm: running populateContentModel.php --table=page on testwiki
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 18:29 manybubbles_: es1.6 step 2: shut down extra instance of elasticsearch on elastic1021
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 17:39 andrewbogott: this is the second test log of three
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 17:39 andrewbogott: this is the first test log of three
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 17:36 mutante: included adminbot_1.7.11 in APT repo
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:31 andrewbogott: wikidata-dev updated local puppet and rebooting property-suggester
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:08 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:07 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:11 manybubbles_: all done SWATing.
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 15:09 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable footer contact link on ukwiki (duration: 00m 11s)
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 14:55 manybubbles_: after upgrading elasticsearch, its init script no longer shuts down the old version of elasticsearch, so you have to kill it manually. That means the upgrade instructions will be "special" this time around. Hopefully this is a one-time thing.
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 14:45 manybubbles_: es1.6 step 1: upgrade elasticsearch on elastic1001 -starting
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 14:45 manybubbles_: es1.6 step 0: successfully synced new versions of plugins
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 14:30 manybubbles_: es1.6 step 0: sync new versions of plugins
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 14:30 manybubbles_: starting the elasticsearch 1.6.0 upgrade
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 13:13 bblack: updating nginx/bind on cp*
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 13:07 bblack: updating openssl on cp*
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 13:02 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/Cite/extension.json: https://gerrit.wikimedia.org/r/#/c/224407/ - unbreak VE mobile, https://phabricator.wikimedia.org/T105686 (duration: 00m 12s)
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 10:58 mobrovac: restbase deploying 6dec79d
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 10:22 logmsgbot: ori Synchronized php-1.26wmf13/maintenance/rebuildLocalisationCache.php: 117f60a171: rebuildLocalisationCache: don't limit memory usage (duration: 00m 12s)
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 08:52 godog: bounce graphite-web on graphite1001
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 08:51 godog: bounce carbon daemons on graphite1001
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 08:50 godog: upgrade graphite to 0.9.13 on graphite1001 and bounce one instance of carbon/cache
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 07:29 logmsgbot: ori Synchronized php-1.26wmf13/includes/cache/LCStoreStaticArray.php: I3f63594a4: Fix variable name (follows Ib2c5856d) (duration: 00m 11s)
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 06:25 logmsgbot: LocalisationUpdate failed: git pull of core failed
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 06:24 ori: Experimenting with altering the localisation cache implementation for testwiki, operations/mediawiki-config on tin will have a local hack for a little bit
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 05:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 05:07:32 UTC 2015 (duration 7m 31s)
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 02:25:58 UTC 2015 (duration 25m 57s)
* 16:42 dancy@deploy1002: Sync cancelled.
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:23:43+00:00
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 16s)
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:10:25+00:00
* 16:09 awight@deploy1002: backport aborted:  (duration: 00m 33s)
* 02:10 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 01:47 springle: restarted labsdb1002 mysqld while troubleshooting replication
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: {{Gerrit|1ac09d4709c645558f644a885fadc49c05cc04b9}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 39s)
* 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: {{Gerrit|1a27e05a7ca53a063d5f9e284d6a09546ac8691c}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 52s)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
* 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0b55db6f80df5f4c89f969332a6b31077a7172c4}}: Enable Tech Wishes survey on dewiki ([[phab:T316676|T316676]]) (duration: 04m 12s)
* 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
* 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
* 08:35 hashar: Restarted CI Jenkins for plugin update
* 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832993{{!}}testwiki: Enable Section Translation on haw, la, ps, and xh Wikipedias (T317289)]] (duration: 03m 46s)
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:10 kart_: Updated cxserver to 2022-09-15-113346-production ([[phab:T317289|T317289]], [[phab:T315209|T315209]])
* 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]] (duration: 36m 08s)
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2015-07-12 ==
== 2022-09-19 ==
* 14:59 bblack: upgraded most packages on sodium
* 22:59 ebernhardson: [[phab:T317200|T317200]] start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
* 14:48 bblack: upgraded apache2 to 2.2.22-1ubuntu1.9 on: antimony argon caesium fluorine helium iodine logstash1001 logstash1003 magnesium neon netmon1001 rhodium stat1001 ytterbium
* 21:21 maryum: Deployed security patch for [[phab:T302479|T302479]]
* 04:49 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 04:49:08 UTC 2015 (duration 49m 7s)
* 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:26:52+00:00
* 21:15 sbassett: Deployed security patch for [[phab:T312820|T312820]]
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 02:25:33 UTC 2015 (duration 25m 32s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 12s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:10:00+00:00
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 cjming: end of UTC late backport window
* 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: [[gerrit:833031{{!}}Add token_count subfield to outgoing_link (T317546)]] (duration: 03m 51s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820459{{!}}Wikifunctions: Drop two config items moved to docker]] (duration: 03m 38s)
* 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:829877{{!}}ExtensionDistributor: Add REL1_39 (T313925)]] (duration: 03m 38s)
* 20:12 cjming@deploy1002: Finished scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] (duration: 06m 31s)
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 cjming@deploy1002: cjming and arlolra: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:06 cjming@deploy1002: Started scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]]
* 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
* 17:42 dancy@deploy1002: Installing scap version "4.21.0" for 561 hosts
* 17:36 dancy@deploy1002: Sync cancelled.
* 17:36 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:36 dancy@deploy1002: Started scap: testing, disregard
* 14:03 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage<nowiki>{</nowiki>.png,-1.5x.png,-2x.png<nowiki>}</nowiki> ([[phab:T317718|T317718]])
* 14:02 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|6c7151d969b6997bd9cce042b7bc78c282dd9b26}}: Regenerate ukwikivoyage logo ([[phab:T317718|T317718]]) (duration: 03m 46s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbf161d148228e0e706813f923ab1a5d4b42757a}}: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro ([[phab:T314518|T314518]]) (duration: 04m 01s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a6c1ddf5cd1a46ab05f5d6fda4b938a3ee37238}}: Remove unnecessary wgNamespaceAliases from bnwiki ([[phab:T318003|T318003]]) (duration: 04m 16s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2015-07-11 ==
== 2022-09-17 ==
* 19:48 jynus: stopping labsdb1002 after table corruption has been detected
* 12:17 Emperor: set thanos ring replicas to 3.80 [[phab:T311690|T311690]]
* 19:37 urandom: from restbase1002, starting revision culling process (node thin_out_key_rev_value_data.js `hostname -i` local_group_wikimedia_T_parsoid_html 2>&1 | tee >(gzip -c > local_group_wikimedia_T_parsoid_html.log.`date +%s`.gz))
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34879 and previous config saved to /var/cache/conftool/dbconfig/20220917-103903-ladsgroup.json
* 19:33 urandom: restbase: setting gc_grace_seconds to 604800 (1 week) on local_group_wikipedia_T_parsoid_html.data
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34878 and previous config saved to /var/cache/conftool/dbconfig/20220917-102356-ladsgroup.json
* 04:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 04:55:56 UTC 2015 (duration 55m 55s)
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34877 and previous config saved to /var/cache/conftool/dbconfig/20220917-100850-ladsgroup.json
* 04:21 bd808: Logstash cluster upgrade complete! Kibana working again
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34876 and previous config saved to /var/cache/conftool/dbconfig/20220917-095344-ladsgroup.json
* 04:21 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1006
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34875 and previous config saved to /var/cache/conftool/dbconfig/20220917-094856-ladsgroup.json
* 04:12 bd808: rebooting logstash1006
* 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34874 and previous config saved to /var/cache/conftool/dbconfig/20220917-093349-ladsgroup.json
* 04:06 bd808: logstash1005 fully recovered all shards
* 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34873 and previous config saved to /var/cache/conftool/dbconfig/20220917-091843-ladsgroup.json
* 03:21 logmsgbot: mattflaschen Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Bump Flow to encode page name when sending to Parsoid (duration: 00m 13s)
* 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34872 and previous config saved to /var/cache/conftool/dbconfig/20220917-090336-ladsgroup.json
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:28:18+00:00
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34871 and previous config saved to /var/cache/conftool/dbconfig/20220917-074806-ladsgroup.json
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 07s)
* 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34870 and previous config saved to /var/cache/conftool/dbconfig/20220917-073300-ladsgroup.json
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 02:25:19 UTC 2015 (duration 25m 18s)
* 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34869 and previous config saved to /var/cache/conftool/dbconfig/20220917-071753-ladsgroup.json
* 02:09 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:09:45+00:00
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34868 and previous config saved to /var/cache/conftool/dbconfig/20220917-070247-ladsgroup.json
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 35s)
* 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34867 and previous config saved to /var/cache/conftool/dbconfig/20220917-051719-ladsgroup.json
* 00:46 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1005; replicas recovering now
* 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 00:34 bd808: rebooting logstash1005
* 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 00:30 bd808: logstash1004 fully recovered all shards
* 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34866 and previous config saved to /var/cache/conftool/dbconfig/20220917-051527-ladsgroup.json
* 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34865 and previous config saved to /var/cache/conftool/dbconfig/20220917-051203-ladsgroup.json
* 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance


== 2015-07-10 ==
== 2022-09-16 ==
* 22:51 mutante: tendril: very short maintenance downtime
* 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:10 bd808: `service elasticsearch start` not starting on logstash1004; investigating
* 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:07 bd808: ran apt-get upgrade on logstash1004
* 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34864 and previous config saved to /var/cache/conftool/dbconfig/20220916-212905-ladsgroup.json
* 19:52 mutante: adminbot - built and imported 1.7.10 into APT repo
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34863 and previous config saved to /var/cache/conftool/dbconfig/20220916-211358-ladsgroup.json
* 19:43 bd808: rebooting logstash1004
* 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34862 and previous config saved to /var/cache/conftool/dbconfig/20220916-205852-ladsgroup.json
* 19:40 bd808: Kibana seems to be broken by mixed 1.6.0/1.3.9 cluster
* 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34861 and previous config saved to /var/cache/conftool/dbconfig/20220916-204345-ladsgroup.json
* 19:32 bd808: kibana not seeing indices after upgrading elasticsearch to 1.6.0; investigating
* 19:16 mutante: cp1081 /usr/local/sbin/update-ocsp-all
* 19:26 bd808: Upgraded logstash1003 to elasticsearch 1.6.0
* 17:01 mutante: gitlab-runner*: deployed gerrit:832584 and systemctl restart buildkitd on 6 hosts for [[phab:T317904|T317904]]
* 19:22 bd808: Upgraded logstash1002 to elasticsearch 1.6.0
* 16:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 19:19 bd808: Upgraded logstash1001 to elasticsearch 1.6.0
* 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 19:10 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.TableNode.js: https://gerrit.wikimedia.org/r/#/c/224122/ (duration: 00m 12s)
* 16:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 18:11 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 120'
* 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 18:00 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 90'
* 16:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 17:49 gwicke: rolling restart of the cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/224114/
* 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 17:32 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: prevent race condition on writing settings (duration: 00m 13s)
* 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:26 moritzm: installed python security updates on mc*
* 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:25 Coren: rebooting labstore2001 (experiments with the new raid setup caused the mapper table to fill)
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184
* 16:35 mobrovac: restbase deploying hotfix for T105509
* 16:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2184
* 15:29 mobrovac: restbase restarted restbase on restbase1004
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2183
* 15:25 godog: bounce cassandra on restbase1004
* 16:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2183
* 13:43 godog: bounce cassandra on restbase1004
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34860 and previous config saved to /var/cache/conftool/dbconfig/20220916-161409-ladsgroup.json
* 13:37 _joe_: temporarily repooled mw1031
* 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 12:40 godog: bounce cassandra on restbase1004
* 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 07:43 godog: reimage ms-be2013 T105213
* 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34859 and previous config saved to /var/cache/conftool/dbconfig/20220916-161346-ladsgroup.json
* 04:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 10 04:36:49 UTC 2015 (duration 36m 48s)
* 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34858 and previous config saved to /var/cache/conftool/dbconfig/20220916-155840-ladsgroup.json
* 04:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037; repool db1030 (revert below) (duration: 00m 12s)
* 15:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 04:28 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 15:52 dancy@deploy1002: Installation of scap version "4.20.0" completed for 561 hosts
* 03:14 mutante: re-enabling puppet on tools-exec-1213, working around adminbot package install fail
* 15:51 dancy@deploy1002: Installing scap version "4.20.0" for 561 hosts
* 02:59 elee: please log this with the year
* 15:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 02:53 andrewbogott: testing the log by logging a test
* 15:44 dancy@deploy1002: Finished scap: testing (duration: 04m 53s)
* 01:50 gwicke: bounced cassandra on restbase1004
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34857 and previous config saved to /var/cache/conftool/dbconfig/20220916-154333-ladsgroup.json
* 01:38 jgage: cassandra restarted on restbase1004
* 15:39 dancy@deploy1002: Started scap: testing
* 00:39 urandom: starting restbase1004
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34856 and previous config saved to /var/cache/conftool/dbconfig/20220916-152827-ladsgroup.json
* 00:35 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/modules/ve-mw/ui/inspectors/ve.ui.MWLinkAnnotationInspector.js: https://gerrit.wikimedia.org/r/#/c/223983/ (duration: 00m 12s)
* 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 00:15 hoo: Updated WikibaseQualityConstraints data on wikidata (wikidatawiki.wbqc_constraints)
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:01 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:01 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:42 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:17 godog: add 100G to prometheus/eqiad instance k8s-mlserve
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34855 and previous config saved to /var/cache/conftool/dbconfig/20220916-131902-root.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34854 and previous config saved to /var/cache/conftool/dbconfig/20220916-130357-root.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34853 and previous config saved to /var/cache/conftool/dbconfig/20220916-125841-root.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34852 and previous config saved to /var/cache/conftool/dbconfig/20220916-124850-root.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34851 and previous config saved to /var/cache/conftool/dbconfig/20220916-124336-root.json
* 12:43 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34850 and previous config saved to /var/cache/conftool/dbconfig/20220916-123346-root.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34849 and previous config saved to /var/cache/conftool/dbconfig/20220916-122831-root.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34848 and previous config saved to /var/cache/conftool/dbconfig/20220916-121841-root.json
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34847 and previous config saved to /var/cache/conftool/dbconfig/20220916-121326-root.json
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34846 and previous config saved to /var/cache/conftool/dbconfig/20220916-120336-root.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34845 and previous config saved to /var/cache/conftool/dbconfig/20220916-115821-root.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34844 and previous config saved to /var/cache/conftool/dbconfig/20220916-114935-root.json
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34843 and previous config saved to /var/cache/conftool/dbconfig/20220916-114831-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34842 and previous config saved to /var/cache/conftool/dbconfig/20220916-114316-root.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P34841 and previous config saved to /var/cache/conftool/dbconfig/20220916-113543-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34840 and previous config saved to /var/cache/conftool/dbconfig/20220916-113431-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34839 and previous config saved to /var/cache/conftool/dbconfig/20220916-113325-root.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P34838 and previous config saved to /var/cache/conftool/dbconfig/20220916-112750-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34837 and previous config saved to /var/cache/conftool/dbconfig/20220916-111925-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34836 and previous config saved to /var/cache/conftool/dbconfig/20220916-110420-root.json
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34835 and previous config saved to /var/cache/conftool/dbconfig/20220916-105819-ladsgroup.json
* 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34834 and previous config saved to /var/cache/conftool/dbconfig/20220916-105809-ladsgroup.json
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34832 and previous config saved to /var/cache/conftool/dbconfig/20220916-104916-root.json
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34831 and previous config saved to /var/cache/conftool/dbconfig/20220916-104303-ladsgroup.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34830 and previous config saved to /var/cache/conftool/dbconfig/20220916-103411-root.json
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34829 and previous config saved to /var/cache/conftool/dbconfig/20220916-102756-ladsgroup.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34828 and previous config saved to /var/cache/conftool/dbconfig/20220916-101905-root.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34827 and previous config saved to /var/cache/conftool/dbconfig/20220916-101250-ladsgroup.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34826 and previous config saved to /var/cache/conftool/dbconfig/20220916-100400-root.json
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34825 and previous config saved to /var/cache/conftool/dbconfig/20220916-093635-root.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34824 and previous config saved to /var/cache/conftool/dbconfig/20220916-093121-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34823 and previous config saved to /var/cache/conftool/dbconfig/20220916-092130-root.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34822 and previous config saved to /var/cache/conftool/dbconfig/20220916-091616-root.json
* 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34821 and previous config saved to /var/cache/conftool/dbconfig/20220916-091234-ladsgroup.json
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34820 and previous config saved to /var/cache/conftool/dbconfig/20220916-090625-root.json
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34819 and previous config saved to /var/cache/conftool/dbconfig/20220916-090111-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34818 and previous config saved to /var/cache/conftool/dbconfig/20220916-085120-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34817 and previous config saved to /var/cache/conftool/dbconfig/20220916-084607-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34816 and previous config saved to /var/cache/conftool/dbconfig/20220916-083615-root.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34815 and previous config saved to /var/cache/conftool/dbconfig/20220916-083102-root.json
* 08:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34814 and previous config saved to /var/cache/conftool/dbconfig/20220916-082110-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34813 and previous config saved to /var/cache/conftool/dbconfig/20220916-081557-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34812 and previous config saved to /var/cache/conftool/dbconfig/20220916-080605-root.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34811 and previous config saved to /var/cache/conftool/dbconfig/20220916-080052-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34810 and previous config saved to /var/cache/conftool/dbconfig/20220916-075100-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34809 and previous config saved to /var/cache/conftool/dbconfig/20220916-074548-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34808 and previous config saved to /var/cache/conftool/dbconfig/20220916-074251-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P34807 and previous config saved to /var/cache/conftool/dbconfig/20220916-072958-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34806 and previous config saved to /var/cache/conftool/dbconfig/20220916-072746-root.json
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34805 and previous config saved to /var/cache/conftool/dbconfig/20220916-071241-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34804 and previous config saved to /var/cache/conftool/dbconfig/20220916-065737-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34803 and previous config saved to /var/cache/conftool/dbconfig/20220916-064232-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34802 and previous config saved to /var/cache/conftool/dbconfig/20220916-062727-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34801 and previous config saved to /var/cache/conftool/dbconfig/20220916-061222-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34800 and previous config saved to /var/cache/conftool/dbconfig/20220916-055717-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34799 and previous config saved to /var/cache/conftool/dbconfig/20220916-055542-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34798 and previous config saved to /var/cache/conftool/dbconfig/20220916-055424-root.json
* 05:51 marostegui: Install 10.6 on db1168 [[phab:T301879|T301879]]
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34797 and previous config saved to /var/cache/conftool/dbconfig/20220916-055031-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P34795 and previous config saved to /var/cache/conftool/dbconfig/20220916-054438-root.json
* 01:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
* 01:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 01:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
* 01:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 00:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 17s)
* 00:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)


== July 9 ==
== 2022-09-15 ==
* 23:41 legoktm: deployed patch for T105413
* 23:51 mutante: gerrit1001 - disabled puppet - gerrit:832411
* 23:07 gwicke: bounced cassandra on restbase1004
* 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 23:02 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: TitleBlacklist: Don't block account auto-creation (duration: 00m 13s)
* 22:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 22:09 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-eqiad.php: I don't think we want to keep poolcounter running on an imagescaler (duration: 00m 12s)
* 21:30 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 21:30 logmsgbot: tgr Synchronized php-1.26wmf13/extensions/OAuth/api/MWOAuthAPI.setup.php: no canonical redirects for requests with OAuth headers (duration: 00m 12s)
* 20:25 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] (duration: 07m 06s)
* 21:05 tgr: backporting https://gerrit.wikimedia.org/r/#/c/223952/ - fixes OAuth which is broken for 1.26wmf13
* 20:18 thcipriani@deploy1002: thcipriani and dani: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:47 gwicke: temporarily disabled puppet on cassandra nodes while tweaking settings
* 20:18 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]]
* 19:53 legoktm: manually fixing global merge of Yuvipanda->YuviPanda (T104686)
* 20:15 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] (duration: 07m 39s)
* 19:04 gwicke: bounced cassandra on restbase1004
* 20:08 thcipriani@deploy1002: thcipriani and dcausse: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:29 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf13
* 20:07 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]]
* 17:54 gwicke: bounced restbase on restbase1005
* 19:26 ebernhardson: pool'd wdqs2001, some blockers before reload can start [[phab:T316236|T316236]]
* 17:32 ori: installed poolcounter on mw1154
* 18:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.1 refs [[phab:T314190|T314190]]
* 17:31 logmsgbot: ori Synchronized wmf-config/PoolCounterSettings-eqiad.php: (no message) (duration: 00m 12s)
* 18:39 dancy@deploy1002: Finished scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] (duration: 09m 53s)
* 17:22 cmjohnson1: shutting down helium for a few minutes to move within the same row
* 18:38 cwhite: restart thanos-compact (thanos-fe2001) and swift_ring_manager (thanos-fe1001)
* 16:53 gwicke: bounced cassandra on restbase1004
* 18:29 dancy@deploy1002: dancy and cscott: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 16:48 godog: reboot ms-be2013 T105213
* 18:29 dancy@deploy1002: Started scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]]
* 16:38 gwicke: bounced cassandra on restbase1006
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2003.codfw.wmnet on all recursors
* 16:07 _joe_: repooling mw1152
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2003.codfw.wmnet on all recursors
* 15:57 godog: restart cassandra on restbase1002
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 15:34 gwicke: bounced cassandra on restbase1004
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 15:24 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223739/ (duration: 00m 12s)
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2001.codfw.wmnet on all recursors
* 15:23 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223737/ (duration: 00m 12s)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2001.codfw.wmnet on all recursors
* 15:23 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223742/ (duration: 00m 12s)
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1003.eqiad.wmnet on all recursors
* 15:09 gwicke: bounced cassandra on restbase1004
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1003.eqiad.wmnet on all recursors
* 14:44 gwicke: re-enabled compaction throttling (60mb/s) on cassandra nodes
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1002.eqiad.wmnet on all recursors
* 14:44 bblack: reprepro: jessie-wikimedia/backports openssl pkg, 1.0.2c-1 => 1.0.2d-1~wmf1
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1002.eqiad.wmnet on all recursors
* 14:29 _joe_: reimaging mw1152 for wiping any leftover local hacks. Depooling, scheduling downtime
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1001.eqiad.wmnet on all recursors
* 14:28 moritzm: installed python-django security updates on labmon, netmon and californium
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1001.eqiad.wmnet on all recursors
* 14:24 godog: really upgrade python-django on graphite2001
* 18:15 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 13:48 mobrovac: restbase cassandra rolling restart to apply https://gerrit.wikimedia.org/r/223774
* 18:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:02 godog: upgrade python-django on graphite1001 and graphite2001 following http://www.ubuntu.com/usn/usn-2671-1/
* 18:13 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 11:34 godog: restart cassandra on restbase1001
* 18:07 godog: restart envoyproxy on thanos-fe*
* 11:22 logmsgbot: krinkle Synchronized php-1.26wmf13/resources/src/mediawiki/mediawiki.util.js: T105265 (duration: 00m 11s)
* 18:06 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 11:21 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/GlobalFunctions.php: T105265 (duration: 00m 12s)
* 18:06 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 11:09 mobrovac: restbase deploying https://gerrit.wikimedia.org/r/#/c/223297/ which bumps the back-end module version ( https://github.com/wikimedia/restbase-mod-table-cassandra/pull/117 )
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:53 mobrovac: restbase started thinner 15 days for wikimedia group
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:37 mark: Shutdown AMS-IX route server BGP sessions on cr1-esams
* 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging BryanDavis out of all services on: 2047 hosts
* 07:48 logmsgbot: oblivian Synchronized php-1.26wmf13/thumb.php: Re-add fix for thumb.php 404s on HHVM (duration: 00m 13s)
* 16:16 andrew@cumin1001: START - Cookbook sre.idm.logout Logging BryanDavis out of all services on: 2047 hosts
* 06:27 twentyafterfour: restarted apache2 on iridium to fix phab exception
* 15:39 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:15 springle: db1037 is repartitioning tables; it will lag intermittently for a day
* 15:37 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 06:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 06:05:30 UTC 2015 (duration 5m 29s)
* 15:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 05:23 gwicke: dynamically limited cassandra compaction throughput to 80mb/s; please review https://gerrit.wikimedia.org/r/#/c/223722/ to make this permanent
* 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 03:01:13+00:00
* 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 02:58 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 05m 29s)
* 15:22 hnowlan: starting cassandra on sessionstore1001-a
* 02:42 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:42:56+00:00
* 15:18 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 02:40 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 02:40:16 UTC 2015 (duration 40m 15s)
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34792 and previous config saved to /var/cache/conftool/dbconfig/20220915-151131-ladsgroup.json
* 02:36 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 10m 32s)
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34791 and previous config saved to /var/cache/conftool/dbconfig/20220915-145625-ladsgroup.json
* 02:28 twentyafterfour: restarted phd
* 14:41 moritzm: installing libtirpc security updates
* 02:28 twentyafterfour: moved phd log to free disk space on iridium
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34790 and previous config saved to /var/cache/conftool/dbconfig/20220915-144118-ladsgroup.json
* 02:24 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 02:24:00+00:00
* 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34789 and previous config saved to /var/cache/conftool/dbconfig/20220915-142612-ladsgroup.json
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 14:01 sukhe: restarting bird.service on A:dns-auth for zlib update
* 02:17 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:17:02+00:00
* 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6b9784a0708cf1e7762034ccfba7e5604b2f6dc2}}: Enable the Vue version of the mentee overview in pilot wikis ([[phab:T300532|T300532]]) (duration: 03m 45s)
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 47s)
* 13:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d] (duration: 00m 09s)
* 02:00 springle: pkg upgrade and restart db1037
* 13:58 aqu@deploy1002: Started deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d]
* 01:49 gwicke: switched remaining cassandra nodes to JDK8
* 13:57 sukhe: restarting haproxy.service on A:dns-auth for zlib update
* 01:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 11s)
* 13:57 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d] (duration: 00m 10s)
* 01:07 mutante: uranium - deleted apache logs older than 90 days
* 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d]
* 00:45 RoanKattouw: Running populateContentModel.php --wiki=cawiki --table=revision --ns=5
* 13:51 jayme: updated rsyslog to 8.2208.0-1~bpo11+1 on all kubernetes masters and nodes - [[phab:T289766|T289766]]
* 00:20 RoanKattouw: Ran populateContentModel.php --table=revision for odd-numbered namespaces on officewiki for T105245
* 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383] (duration: 06m 01s)
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:41 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383]
* 13:38 sukhe: restarting bird.service on A:dns-rec for zlib update
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:33 sukhe: restarting pdns-recursor on A:dns-rec for zlib update
* 13:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/GrowthExperiments/: {{Gerrit|f592e85858d17a2de99cde93627054ee4972c2bd}}: Mentee overview: avoid requiring the non-vue mentee overview script when loading the Vue one ([[phab:T300532|T300532]]) (duration: 04m 05s)
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 12:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 12:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 12:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS buster
* 12:17 jayme: fleet wide update of prometheus-rsyslog-exporter to 0.0.0+git20201008-4 - [[phab:T289766|T289766]]
* 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 12:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34787 and previous config saved to /var/cache/conftool/dbconfig/20220915-120013-root.json
* 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 11:50 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34786 and previous config saved to /var/cache/conftool/dbconfig/20220915-114508-root.json
* 11:44 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 11:43 moritzm: restart exim on lists1001 to pick up zlib security updates
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34785 and previous config saved to /var/cache/conftool/dbconfig/20220915-113003-root.json
* 11:22 jayme: importing prometheus-rsyslog-exporter 0.0.0+git20201008-4 to stretch-wikimedia, buster-wikimedia, bullseye-wikimedia - [[phab:T289766|T289766]]
* 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 11:15 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34784 and previous config saved to /var/cache/conftool/dbconfig/20220915-111458-root.json
* 11:12 hnowlan: sessionstore1001: c-foreach-nt drain
* 11:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 11:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'pool db2129 into s6 API', diff saved to https://phabricator.wikimedia.org/P34783 and previous config saved to /var/cache/conftool/dbconfig/20220915-110453-root.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34782 and previous config saved to /var/cache/conftool/dbconfig/20220915-105953-root.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34781 and previous config saved to /var/cache/conftool/dbconfig/20220915-104448-root.json
* 10:36 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34780 and previous config saved to /var/cache/conftool/dbconfig/20220915-102943-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34779 and previous config saved to /var/cache/conftool/dbconfig/20220915-101438-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34778 and previous config saved to /var/cache/conftool/dbconfig/20220915-101425-root.json
* 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P34777 and previous config saved to /var/cache/conftool/dbconfig/20220915-100212-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34775 and previous config saved to /var/cache/conftool/dbconfig/20220915-095920-root.json
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34774 and previous config saved to /var/cache/conftool/dbconfig/20220915-094415-root.json
* 09:38 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383] (duration: 14m 21s)
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34773 and previous config saved to /var/cache/conftool/dbconfig/20220915-092910-root.json
* 09:23 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383]
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34772 and previous config saved to /var/cache/conftool/dbconfig/20220915-091405-root.json
* 09:13 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383] (duration: 00m 08s)
* 09:13 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383]
* 09:12 aqu@deploy1002: Finished deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383] (duration: 27m 31s)
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34770 and previous config saved to /var/cache/conftool/dbconfig/20220915-085900-root.json
* 08:49 apergos: UTC backport training window closed at last
* 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 08:45 aqu@deploy1002: Started deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383]
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34768 and previous config saved to /var/cache/conftool/dbconfig/20220915-084355-root.json
* 08:43 aqu: about to deploy analytics/refinery
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34767 and previous config saved to /var/cache/conftool/dbconfig/20220915-084046-root.json
* 08:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34766 and previous config saved to /var/cache/conftool/dbconfig/20220915-082851-root.json
* 08:26 tsepothoabala@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832339{{!}}Enable action blocks on ptwiki (T317157)]] (duration: 04m 07s)
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34765 and previous config saved to /var/cache/conftool/dbconfig/20220915-082541-root.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34764 and previous config saved to /var/cache/conftool/dbconfig/20220915-082112-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34763 and previous config saved to /var/cache/conftool/dbconfig/20220915-081627-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 codfw master [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34762 and previous config saved to /var/cache/conftool/dbconfig/20220915-081517-marostegui.json
* 08:14 marostegui: Starting s6 codfw failover from db2129 to db2114 - [[phab:T317850|T317850]]
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34761 and previous config saved to /var/cache/conftool/dbconfig/20220915-081036-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34760 and previous config saved to /var/cache/conftool/dbconfig/20220915-080607-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 from API [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34759 and previous config saved to /var/cache/conftool/dbconfig/20220915-080157-root.json
* 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary codfw s6 [[phab:T317850|T317850]]
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34758 and previous config saved to /var/cache/conftool/dbconfig/20220915-080122-root.json
* 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary codfw s6 [[phab:T317850|T317850]]
* 07:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34757 and previous config saved to /var/cache/conftool/dbconfig/20220915-075531-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34756 and previous config saved to /var/cache/conftool/dbconfig/20220915-075102-root.json
* 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
* 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34755 and previous config saved to /var/cache/conftool/dbconfig/20220915-074026-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34754 and previous config saved to /var/cache/conftool/dbconfig/20220915-073557-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34753 and previous config saved to /var/cache/conftool/dbconfig/20220915-072520-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34752 and previous config saved to /var/cache/conftool/dbconfig/20220915-072053-root.json
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: reboot
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2151.codfw.wmnet with reason: reboot
* 07:14 moritzm: installing zlib security updates
* 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34751 and previous config saved to /var/cache/conftool/dbconfig/20220915-071015-root.json
* 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 07:06 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34750 and previous config saved to /var/cache/conftool/dbconfig/20220915-070548-root.json
* 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34749 and previous config saved to /var/cache/conftool/dbconfig/20220915-065510-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34748 and previous config saved to /var/cache/conftool/dbconfig/20220915-065043-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db2096 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34747 and previous config saved to /var/cache/conftool/dbconfig/20220915-064750-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34746 and previous config saved to /var/cache/conftool/dbconfig/20220915-064635-marostegui.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2096 to x1 primary and set section read-write [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34745 and previous config saved to /var/cache/conftool/dbconfig/20220915-064525-root.json
* 06:44 marostegui: Starting x1 codfw failover from db2115 to db2096 - [[phab:T317842|T317842]]
* 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T317842|T317842]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2096 with weight 0 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34744 and previous config saved to /var/cache/conftool/dbconfig/20220915-064014-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T317842|T317842]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34743 and previous config saved to /var/cache/conftool/dbconfig/20220915-063538-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34742 and previous config saved to /var/cache/conftool/dbconfig/20220915-061421-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 codfw [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34741 and previous config saved to /var/cache/conftool/dbconfig/20220915-061317-marostegui.json
* 06:12 marostegui: Starting s3 codfw failover from db2105 to db2127 - [[phab:T317839|T317839]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34740 and previous config saved to /var/cache/conftool/dbconfig/20220915-060307-root.json
* 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Codfw switchover s3 [[phab:T317839|T317839]]
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Codfw switchover s3 [[phab:T317839|T317839]]
* 05:32 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]


== 2015-07-08 ==
== 2022-09-14 ==
* 23:07 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow: SWAT (duration: 00m 14s)
* 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34739 and previous config saved to /var/cache/conftool/dbconfig/20220914-220822-ladsgroup.json
* 23:06 bd808: Restarted logstash on logstash1001; no hhvm input seen for last hour
* 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
* 22:56 gwicke: finished rolling restart of cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/223495/
* 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
* 22:45 mutante: zirconium - stop puppet for role switch
* 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 22:33 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/changes/EnhancedChangesList.php: Unbreak missing flags in enhanced RC (duration: 00m 12s)
* 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 22:08 logmsgbot: hoo Synchronized php-1.26wmf13/extensions/Wikidata/: Update Wikibase: Fix JavaScript ULS usage (duration: 00m 20s)
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34738 and previous config saved to /var/cache/conftool/dbconfig/20220914-220744-ladsgroup.json
* 21:51 logmsgbot: manybubbles Synchronized php-1.26wmf12/extensions/CirrusSearch/: Stop some fatals in cirrus (duration: 00m 13s)
* 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34737 and previous config saved to /var/cache/conftool/dbconfig/20220914-215238-ladsgroup.json
* 21:41 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert Count API module instantiations and Hook runs (2/2) (duration: 00m 12s)
* 21:38 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 01m 48s)
* 21:40 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/Hooks.php: Revert Count API module instantiations and Hook runs (1/2) (duration: 00m 12s)
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34736 and previous config saved to /var/cache/conftool/dbconfig/20220914-213732-ladsgroup.json
* 21:39 logmsgbot: bd808 Synchronized php-1.26wmf13/extensions/CirrusSearch/includes/CirrusSearch.php: Suppress interwiki results when they would break (duration: 00m 12s)
* 21:37 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
* 21:08 bblack: graphite: wiped /var/log/upstart/statsite* logs, restarted statsite processes
* 21:36 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
* 20:56 csteipp: deployed patches for T103022 & T103023
* 21:35 dduvall@deploy1002: Installation of scap version "4.19.1" completed for 561 hosts
* 20:53 csteipp: deployed patch for T94116 for wmf12/wmf13
* 21:35 dduvall@deploy1002: Installing scap version "4.19.1" for 561 hosts
* 20:30 gwicke: added explicit exit 1 in /etc/init.d/cassandra on restbase1008 to prevent cassandra from starting up there; is puppet restarting it?
* 21:34 dduvall: Deploying scap 4.19.1 (https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/832297/1/changelog)
* 20:29 subbu: deployed parsoid sha c4cfc527
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34735 and previous config saved to /var/cache/conftool/dbconfig/20220914-212225-ladsgroup.json
* 20:15 gwicke: bounced cassandra on restbase1001
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 20:05:09 UTC 2015 (duration 5m 8s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:32 gwicke: stopped cassandra on restbase1008
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:27 logmsgbot: twentyafterfour Synchronized php-1.26wmf13: deploying UniversalLanguageSelector commit 2e0990ac9879 (duration: 01m 58s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:26 urandom: restbase rolling restart
* 20:44 dancy@deploy1002: Sync cancelled.
* 18:21 jgage: ran 'kafka preferred-replica-election' to promote analytics1021 back to Leader
* 20:44 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:05 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf13
* 20:44 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:16 moritzm: installed libwmf security updates on various systems
* 20:44 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:09 gwicke: bounced cassandra on restbase1004
* 20:40 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:25 mutante: handing over adminship of the "test" mailman list to John F. Lewis (was: Thehelpfulone) due to inactivity
* 20:40 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: raise db1041 load (duration: 00m 13s)
* 20:39 dancy@deploy1002: Started scap: testing
* 12:58 paravoid: manually dpkg -P ferm on potassium
* 20:38 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 05m 49s)
* 12:52 paravoid: rmmod all iptables/netfilter-related modules from potassium
* 20:34 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:23 godog: bounce cassandra on restbase1004, heap space
* 20:34 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:12 _joe_: mw1153 passed the smoke tests, repooling
* 20:34 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:08 godog: bounce cassandra on restbase1004 and restbase1005 'cannot achieve consistency level quorum'
* 20:33 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:50 godog: bounce cassandra on restbase1004, death by compaction
* 20:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 09:43 ori: _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 20:28 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:42 ori: Nuked /var/lib/carbon/whisper/ResourceLoader on graphite[12]001. Data prior to rollout of I55f0c44cd considered bogus.
* 20:24 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:42 ori: morebots, are you OK?
* 20:24 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:41 godog: bounce nutcracker on silver
* 20:21 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:33 _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 20:19 dancy@deploy1002: deploy-promote aborted: (duration: 08m 52s)
* 09:26 hashar: upgraded plugins on jenkins and restarting it
* 20:19 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 01m 24s)
* 09:06 hashar: Jenkins registering jobs with Zuul
* 20:18 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:41 hashar: Jenkins is migrating old build histories. Lot of disk IO happening
* 20:18 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 08:11 hashar: shutdowning Jenkins for upgrade.
* 20:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 05:57:10 UTC 2015 (duration 57m 9s)
* 20:13 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1041, warm up (duration: 00m 13s)
* 20:13 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-08 02:31:24+00:00
* 20:12 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:16 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-08 02:16:50+00:00
* 20:09 dancy@deploy1002: Sync cancelled.
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 48s)
* 20:09 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:09 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:02 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:02 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:02 dancy@deploy1002: Started scap: testing
* 20:01 TheresNoTime: Nothing to deploy in this UTC late backport window
* 19:57 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
* 19:57 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
* 19:55 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 12s)
* 19:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:51 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:49 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:49 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:49 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:48 dancy@deploy1002: Started scap: testing
* 19:46 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 23s)
* 19:46 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:39 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:39 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:39 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:38 dancy@deploy1002: Started scap: testing
* 19:38 dancy@deploy1002: sync-world aborted: testing (duration: 13m 25s)
* 19:35 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:26 dancy: dancy@deploy1002 touch /var/lib/deploy-mwdebug/pause
* 19:24 dancy@deploy1002: Started scap: testing
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling (duration: 02m 03s)
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:15 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 18:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki (duration: 02m 05s)
* 18:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:36 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 04m 41s)
* 18:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:31 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 16:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 16:10 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 16:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
* 16:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2024.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2024.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2002.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2002.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1026.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1026.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1029.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1029.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1028.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1028.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1027.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1027.eqiad.wmnet on all recursors
* 15:55 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2001.codfw.wmnet on all recursors
* 15:55 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2001.codfw.wmnet on all recursors
* 15:50 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 00m 39s)
* 15:49 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
* 15:48 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
* 15:24 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:22 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 15:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:17 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34732 and previous config saved to /var/cache/conftool/dbconfig/20220914-145956-ladsgroup.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34731 and previous config saved to /var/cache/conftool/dbconfig/20220914-144449-ladsgroup.json
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34730 and previous config saved to /var/cache/conftool/dbconfig/20220914-142941-ladsgroup.json
* 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34729 and previous config saved to /var/cache/conftool/dbconfig/20220914-141434-ladsgroup.json
* 14:06 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 13:59 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 13:48 moritzm: imported zlib 1:1.2.8.dfsg-5+deb9u1+wmf1 to apt.wikimedia.org
* 13:40 Lucas_WMDE: UTC afternoon backport+config window done
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bnwiktionary --fix # [[phab:T317745|T317745]] – dry run result: 6043 links to fix, 6043 were resolvable, 0 were deleted
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:831970{{!}}Move namespace in the Bengali Wiktionary: উইকিসরাস → পরিশিষ্ট and set wgNamespaceAliases for newly created namespaces (T317745)]] (duration: 03m 41s)
* 13:28 topranks: upgrading routinator on rpki2002
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832145{{!}}Enable Content/Section translation on WPs with new MT support from Google (T313296)]] (duration: 03m 39s)
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:831872{{!}}Enable Section Translation in Odia Wikipedia (T313300)]] (duration: 03m 55s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:54 jayme: imported rsyslog 8.2208.0-1~bpo11+1 into bullseye-wikimedia component/rsyslog-k8s - [[phab:T289766|T289766]]
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34725 and previous config saved to /var/cache/conftool/dbconfig/20220914-115920-ladsgroup.json
* 11:49 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-eqdfw,cr2-eqdfw IPv6
* 11:49 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-eqdfw,cr2-eqdfw IPv6
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34723 and previous config saved to /var/cache/conftool/dbconfig/20220914-114413-ladsgroup.json
* 11:29 topranks: rebooting cr2-eqdfw to complete upgrade
* 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34721 and previous config saved to /var/cache/conftool/dbconfig/20220914-112907-ladsgroup.json
* 11:14 topranks: Shutting down internet transit and peering on cr2-eqdfw in advance of upgrade reboot
* 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34719 and previous config saved to /var/cache/conftool/dbconfig/20220914-111400-ladsgroup.json
* 11:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
* 11:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
* 11:01 topranks: Prepping to upgrade JunOS on cr2-eqdfw. Adjusting OSPF costs to force traffic via alternate POPs.
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34717 and previous config saved to /var/cache/conftool/dbconfig/20220914-103810-root.json
* 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:24 kharlan@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:831969{{!}}BlockMetrics: Update to new event schema version (T306018)]] (duration: 03m 48s)
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34715 and previous config saved to /var/cache/conftool/dbconfig/20220914-102305-root.json
* 10:18 moritzm: import routinator 0.11.3-1bullseye to thirdparty/routinator
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34714 and previous config saved to /var/cache/conftool/dbconfig/20220914-100800-root.json
* 10:00 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 09:59 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 09:58 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:57 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:57 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 09:53 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 09:53 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34713 and previous config saved to /var/cache/conftool/dbconfig/20220914-095255-root.json
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34712 and previous config saved to /var/cache/conftool/dbconfig/20220914-093750-root.json
* 09:27 moritzm: installing zlib/libxslt security updates on buster
* 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34711 and previous config saved to /var/cache/conftool/dbconfig/20220914-092620-ladsgroup.json
* 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34710 and previous config saved to /var/cache/conftool/dbconfig/20220914-092558-ladsgroup.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34709 and previous config saved to /var/cache/conftool/dbconfig/20220914-092245-root.json
* 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34708 and previous config saved to /var/cache/conftool/dbconfig/20220914-091052-ladsgroup.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34707 and previous config saved to /var/cache/conftool/dbconfig/20220914-090740-root.json
* 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 09:05 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34706 and previous config saved to /var/cache/conftool/dbconfig/20220914-085545-ladsgroup.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34705 and previous config saved to /var/cache/conftool/dbconfig/20220914-085235-root.json
* 08:50 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-test
* 08:49 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-test
* 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34704 and previous config saved to /var/cache/conftool/dbconfig/20220914-084039-ladsgroup.json
* 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart on A:wdqs-test
* 08:38 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart on A:wdqs-test
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
* 08:32 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]] (duration: 06m 51s)
* 08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:25 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]]
* 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1024.eqiad.wmnet with reason: down
* 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1024.eqiad.wmnet with reason: down
* 08:02 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es5 [[phab:T317739|T317739]] (duration: 03m 38s)
* 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34703 and previous config saved to /var/cache/conftool/dbconfig/20220914-075722-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34702 and previous config saved to /var/cache/conftool/dbconfig/20220914-075550-marostegui.json
* 07:55 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T317739|T317739]]
* 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:50 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es5 [[phab:T317739|T317739]] (duration: 04m 13s)
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1023 with weight 0 [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34701 and previous config saved to /var/cache/conftool/dbconfig/20220914-074617-marostegui.json
* 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T317739|T317739]]
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T317739|T317739]]
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34700 and previous config saved to /var/cache/conftool/dbconfig/20220914-074248-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34699 and previous config saved to /var/cache/conftool/dbconfig/20220914-072743-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34698 and previous config saved to /var/cache/conftool/dbconfig/20220914-071238-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34697 and previous config saved to /var/cache/conftool/dbconfig/20220914-065733-root.json
* 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34696 and previous config saved to /var/cache/conftool/dbconfig/20220914-064330-ladsgroup.json
* 06:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34695 and previous config saved to /var/cache/conftool/dbconfig/20220914-064309-ladsgroup.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34694 and previous config saved to /var/cache/conftool/dbconfig/20220914-064228-root.json
* 06:38 elukey: restart kafka on kafka-logging2003 to pick up the new PKI TLS settings
* 06:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
* 06:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
* 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34693 and previous config saved to /var/cache/conftool/dbconfig/20220914-062802-ladsgroup.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34692 and previous config saved to /var/cache/conftool/dbconfig/20220914-062723-root.json
* 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34691 and previous config saved to /var/cache/conftool/dbconfig/20220914-061256-ladsgroup.json
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: down
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: down
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34690 and previous config saved to /var/cache/conftool/dbconfig/20220914-060913-root.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 codfw primary [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34689 and previous config saved to /var/cache/conftool/dbconfig/20220914-060807-marostegui.json
* 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34688 and previous config saved to /var/cache/conftool/dbconfig/20220914-055749-ladsgroup.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34687 and previous config saved to /var/cache/conftool/dbconfig/20220914-055156-marostegui.json
* 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T317735|T317735]]
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T317735|T317735]]
* 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34686 and previous config saved to /var/cache/conftool/dbconfig/20220914-052510-ladsgroup.json
* 05:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 05:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34685 and previous config saved to /var/cache/conftool/dbconfig/20220914-052448-ladsgroup.json
* 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34684 and previous config saved to /var/cache/conftool/dbconfig/20220914-050942-ladsgroup.json
* 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34683 and previous config saved to /var/cache/conftool/dbconfig/20220914-045435-ladsgroup.json
* 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34682 and previous config saved to /var/cache/conftool/dbconfig/20220914-043929-ladsgroup.json
* 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34681 and previous config saved to /var/cache/conftool/dbconfig/20220914-035624-ladsgroup.json
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34680 and previous config saved to /var/cache/conftool/dbconfig/20220914-035546-ladsgroup.json
* 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34679 and previous config saved to /var/cache/conftool/dbconfig/20220914-034040-ladsgroup.json
* 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34678 and previous config saved to /var/cache/conftool/dbconfig/20220914-033921-ladsgroup.json
* 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34677 and previous config saved to /var/cache/conftool/dbconfig/20220914-032533-ladsgroup.json
* 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34676 and previous config saved to /var/cache/conftool/dbconfig/20220914-032415-ladsgroup.json
* 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34675 and previous config saved to /var/cache/conftool/dbconfig/20220914-031027-ladsgroup.json
* 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34674 and previous config saved to /var/cache/conftool/dbconfig/20220914-030908-ladsgroup.json
* 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34673 and previous config saved to /var/cache/conftool/dbconfig/20220914-025402-ladsgroup.json
* 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34672 and previous config saved to /var/cache/conftool/dbconfig/20220914-013204-ladsgroup.json
* 01:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 01:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34671 and previous config saved to /var/cache/conftool/dbconfig/20220914-013143-ladsgroup.json
* 01:24 eileen: civicrm upgraded from {{Gerrit|d91b4a2c}} to {{Gerrit|e82d9cd0}}
* 01:18 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34670 and previous config saved to /var/cache/conftool/dbconfig/20220914-011637-ladsgroup.json
* 01:14 ejegg: disabled delete_deleted_contacts job (will take effect when current job ends)
* 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34669 and previous config saved to /var/cache/conftool/dbconfig/20220914-010130-ladsgroup.json
* 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34668 and previous config saved to /var/cache/conftool/dbconfig/20220914-004624-ladsgroup.json


== July 7 ==
== 2022-09-13 ==
* 23:54 jgage: kafka brokers 1018 & 1021 were demoted; i have triggered a leader election and they are leaders again
* 23:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34667 and previous config saved to /var/cache/conftool/dbconfig/20220913-234607-ladsgroup.json
* 23:05 logmsgbot: catrope Synchronized visualeditor-default.dblist: Enable VE by default on labswiki (duration: 00m 12s)
* 23:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 21:56 hoo: Restarted hhvm on mw1003 "Fatal error: Function already defined: wmfLoadInitialiseSettings in /srv/mediawiki/wmf-config/CommonSettings.php on line 187"
* 23:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 21:16 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/resourceloader/ResourceLoader.php: T104769 (duration: 00m 13s)
* 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34666 and previous config saved to /var/cache/conftool/dbconfig/20220913-234546-ladsgroup.json
* 20:53 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf13
* 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34665 and previous config saved to /var/cache/conftool/dbconfig/20220913-233039-ladsgroup.json
* 20:00 logmsgbot: twentyafterfour Finished scap: testwiki to php-1.26wmf13 and rebuild l10n cache (duration: 39m 41s)
* 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34664 and previous config saved to /var/cache/conftool/dbconfig/20220913-231533-ladsgroup.json
* 19:47 gwicke: restarted cassandra on restbase1005
* 23:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 19:20 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf13 and rebuild l10n cache
* 23:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 19:15 moritzm: installed PHP security updates on all trusty hosts
* 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34663 and previous config saved to /var/cache/conftool/dbconfig/20220913-231257-ladsgroup.json
* 18:58 ejegg: updated payments from a17ee221db0dbde70c92e24fc188379b6dbad613 to ec34ebf61e5962f66b807abdcb519ff323d41e8e
* 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34662 and previous config saved to /var/cache/conftool/dbconfig/20220913-230317-ladsgroup.json
* 18:08 twentyafterfour: restarted apache2 on iridium (phab hotfix)
* 23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 17:10 robh: OTRS update appears to be functioning normally. As such, ending maintenance window.
* 23:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 17:06 robh: otrs is now using the new sha256 cert
* 23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34661 and previous config saved to /var/cache/conftool/dbconfig/20220913-230255-ladsgroup.json
* 17:00 robh: starting otrs maint window
* 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34660 and previous config saved to /var/cache/conftool/dbconfig/20220913-230026-ladsgroup.json
* 16:58 _joe_: restarted HHVM on mw1026, near to OOM
* 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P34659 and previous config saved to /var/cache/conftool/dbconfig/20220913-225750-ladsgroup.json
* 16:47 twentyafterfour: applied hotfix for phabricator bug: https://secure.phabricator.com/D13544
* 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P34658 and previous config saved to /var/cache/conftool/dbconfig/20220913-224749-ladsgroup.json
* 16:36 mutante: protactinium - manual iptables rules replaced by puppet/ferm rules
* 22:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P34657 and previous config saved to /var/cache/conftool/dbconfig/20220913-224244-ladsgroup.json
* 16:11 logmsgbot: thcipriani Synchronized php-1.26wmf12/extensions/ContentTranslation/extension.json: Remove default value for ContentTranslationCampaigns (duration: 00m 12s)
* 22:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P34656 and previous config saved to /var/cache/conftool/dbconfig/20220913-223241-ladsgroup.json
* 15:33 jynus: manually editing table mediawiki.ipblocks to fully solve a former software bug
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34655 and previous config saved to /var/cache/conftool/dbconfig/20220913-223025-ladsgroup.json
* 15:12 Jeff_Green: ptr records for frack/codfw and authdns-update
* 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 15:10 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable ContentTranslation in enwiki [[gerrit:222991]] (duration: 00m 13s)
* 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 14:21 jynus: dropping optin_survey_old table from enwiki
* 22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34654 and previous config saved to /var/cache/conftool/dbconfig/20220913-222738-ladsgroup.json
* 13:23 akosiaris: restarting gitblit on antimony
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:31 mobrovac: restbase restarted cassandra on rb1005
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:26 godog: restart cassandra on restbase1004, heap exhausted
* 22:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:49 godog: restarted cassandra on restbase1005, mutations through the roof
* 22:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:27 godog: set operations/puppet/cassandra git submodule repo as hidden
* 22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34653 and previous config saved to /var/cache/conftool/dbconfig/20220913-221734-ladsgroup.json
* 06:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul  7 06:11:46 UTC 2015 (duration 11m 45s)
* 22:16 dancy: dancy@deploy1002$ rm /var/lib/deploy-mwdebug/pause
* 05:51 logmsgbot: krinkle Synchronized php-1.26wmf12/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.js: I3e965dda1c4 (duration: 00m 12s)
* 22:15 dancy@deploy1002: Sync cancelled.
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-07 02:27:55+00:00
* 22:15 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 09s)
* 22:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:12 ori: Re-pooled mw1152 at 20:46 UTC, did not log it then.
* 22:14 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:41 springle: upgrade db1041 trusty
* 22:14 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:37 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/CentralAuth/includes/CreateLocalAccountJob.php: https://gerrit.wikimedia.org/r/#/c/223211/ (duration: 00m 13s)
* 22:13 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:12 dancy@deploy1002: Started scap: testing
* 22:12 dancy@deploy1002: Sync cancelled.
* 22:11 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 22:11 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:11 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:10 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:08 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:07 dancy@deploy1002: Started scap: testing
* 22:07 dancy@deploy1002: Sync cancelled.
* 22:07 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:06 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:05 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:03 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:02 dancy@deploy1002: Started scap: testing
* 22:01 dancy@deploy1002: Sync cancelled.
* 22:01 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 22:01 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:58 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:58 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:55 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:55 dancy@deploy1002: Started scap: testing
* 21:55 dancy@deploy1002: Sync cancelled.
* 21:54 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 21:54 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:50 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:50 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:48 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:47 dancy@deploy1002: Started scap: testing
* 21:37 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ MW_CONFIG_BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restric
* 21:36 dancy@deploy1002: Started scap: testing
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:16 dancy: dancy@deploy1002$ touch /var/lib/deploy-mwdebug/pause
* 21:16 dancy@deploy1002: Sync cancelled.
* 21:15 dancy@deploy1002: dancy: testing [[phab:T299648|T299648]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:14 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:14 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:10 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:04 dancy@deploy1002: Started scap: testing [[phab:T299648|T299648]]
* 20:25 cjming: end of UTC late backport window
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 cjming@deploy1002: Finished scap: Backport for [[gerrit:831223{{!}}add tagline and update wordmark in ptwikinews (T313174)]] (duration: 05m 50s)
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 cjming@deploy1002: cjming and aishik: Backport for [[gerrit:831223{{!}}add tagline and update wordmark in ptwikinews (T313174)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:09 cjming@deploy1002: Started scap: Backport for [[gerrit:831223{{!}}add tagline and update wordmark in ptwikinews (T313174)]]
* 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34652 and previous config saved to /var/cache/conftool/dbconfig/20220913-200344-ladsgroup.json
* 20:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 20:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34651 and previous config saved to /var/cache/conftool/dbconfig/20220913-200214-ladsgroup.json
* 20:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34650 and previous config saved to /var/cache/conftool/dbconfig/20220913-200152-ladsgroup.json
* 19:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P34649 and previous config saved to /var/cache/conftool/dbconfig/20220913-194645-ladsgroup.json
* 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P34648 and previous config saved to /var/cache/conftool/dbconfig/20220913-193139-ladsgroup.json
* 19:19 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34647 and previous config saved to /var/cache/conftool/dbconfig/20220913-191632-ladsgroup.json
* 19:01 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 18:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34646 and previous config saved to /var/cache/conftool/dbconfig/20220913-183259-ladsgroup.json
* 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34645 and previous config saved to /var/cache/conftool/dbconfig/20220913-183238-ladsgroup.json
* 18:31 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:831941{{!}}InitialiseSettings-labs.php: Set $wgPhonosPath (T317417)]] (duration: 03m 45s)
* 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:28 TheresNoTime: deploying a beta cluster only config change, [[phab:T317417|T317417]]
* 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P34644 and previous config saved to /var/cache/conftool/dbconfig/20220913-181731-ladsgroup.json
* 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P34643 and previous config saved to /var/cache/conftool/dbconfig/20220913-180225-ladsgroup.json
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34642 and previous config saved to /var/cache/conftool/dbconfig/20220913-174718-ladsgroup.json
* 17:43 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Upgrade wmf-netbox plugin - volans@cumin1001
* 17:41 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Upgrade wmf-netbox plugin - volans@cumin1001
* 17:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 17:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34640 and previous config saved to /var/cache/conftool/dbconfig/20220913-173721-ladsgroup.json
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34639 and previous config saved to /var/cache/conftool/dbconfig/20220913-173254-ladsgroup.json
* 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P34638 and previous config saved to /var/cache/conftool/dbconfig/20220913-172215-ladsgroup.json
* 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34637 and previous config saved to /var/cache/conftool/dbconfig/20220913-171747-ladsgroup.json
* 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P34636 and previous config saved to /var/cache/conftool/dbconfig/20220913-170708-ladsgroup.json
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34635 and previous config saved to /var/cache/conftool/dbconfig/20220913-170241-ladsgroup.json
* 16:56 ejegg: updated fundraising CiviCRM from {{Gerrit|efbbcb57}} to {{Gerrit|d91b4a2c}}
* 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34634 and previous config saved to /var/cache/conftool/dbconfig/20220913-165202-ladsgroup.json
* 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34633 and previous config saved to /var/cache/conftool/dbconfig/20220913-165117-ladsgroup.json
* 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34632 and previous config saved to /var/cache/conftool/dbconfig/20220913-165056-ladsgroup.json
* 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34631 and previous config saved to /var/cache/conftool/dbconfig/20220913-164734-ladsgroup.json
* 16:37 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:36 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:36 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 16:36 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P34630 and previous config saved to /var/cache/conftool/dbconfig/20220913-163549-ladsgroup.json
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P34629 and previous config saved to /var/cache/conftool/dbconfig/20220913-162043-ladsgroup.json
* 16:13 godog: add 200G to prometheus/eqiad instance ops
* 16:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 16:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Maintenance
* 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Maintenance
* 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
* 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34628 and previous config saved to /var/cache/conftool/dbconfig/20220913-160536-ladsgroup.json
* 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: down
* 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: down
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34626 and previous config saved to /var/cache/conftool/dbconfig/20220913-154810-root.json
* 15:42 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@031604d]: Automatically drop historical partitions of subgraph analysis (duration: 02m 07s)
* 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34625 and previous config saved to /var/cache/conftool/dbconfig/20220913-154151-ladsgroup.json
* 15:40 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@031604d]: Automatically drop historical partitions of subgraph analysis
* 15:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P34624 and previous config saved to /var/cache/conftool/dbconfig/20220913-152644-ladsgroup.json
* 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:14 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28  refs [[phab:T314190|T314190]] (duration: 04m 31s)
* 15:13 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.0 - volans@cumin1001
* 15:12 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.0 - volans@cumin1001
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P34623 and previous config saved to /var/cache/conftool/dbconfig/20220913-151138-ladsgroup.json
* 15:10 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28  refs [[phab:T314190|T314190]]
* 15:08 dancy@deploy1002: deploy-promote aborted:  (duration: 00m 02s)
* 14:59 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 04m 43s)
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34622 and previous config saved to /var/cache/conftool/dbconfig/20220913-145631-ladsgroup.json
* 14:54 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:47 dancy@deploy1002: deploy-promote aborted:  (duration: 01m 03s)
* 14:47 dancy@deploy1002: prep aborted:  (duration: 00m 12s)
* 14:46 moritzm: restarting FPM/Apache on mediawiki canaries
* 14:44 moritzm: installing libxslt security updates on buster
* 14:18 topranks: Core router upgrade in codfw complete - maintenance closed.
* 14:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt
* 14:12 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt
* 14:07 topranks: re-activating Transit on IX BGP on cr2-codfw
* 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34621 and previous config saved to /var/cache/conftool/dbconfig/20220913-135729-ladsgroup.json
* 13:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34620 and previous config saved to /var/cache/conftool/dbconfig/20220913-135707-ladsgroup.json
* 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P34619 and previous config saved to /var/cache/conftool/dbconfig/20220913-134201-ladsgroup.json
* 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34618 and previous config saved to /var/cache/conftool/dbconfig/20220913-133339-ladsgroup.json
* 13:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
* 13:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
* 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34617 and previous config saved to /var/cache/conftool/dbconfig/20220913-133317-ladsgroup.json
* 13:33 Lucas_WMDE: UTC afternoon backport+config window done
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P34616 and previous config saved to /var/cache/conftool/dbconfig/20220913-132654-ladsgroup.json
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:826234{{!}}testwiki: Add mediawiki.edit_attempt stream (T309013)]] (2/2) (duration: 03m 33s)
* 13:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:826234{{!}}testwiki: Add mediawiki.edit_attempt stream (T309013)]] (1/2) (duration: 03m 39s)
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 Emperor: set thanos ring replicas to 3.85 [[phab:T311690|T311690]]
* 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P34615 and previous config saved to /var/cache/conftool/dbconfig/20220913-131811-ladsgroup.json
* 13:14 topranks: Flipping back to RE0 on cr2-codfw (last disruptive switch)
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:824685{{!}}Remove $wgWMESearchRelevancePages]] (unused) (duration: 03m 53s)
* 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34614 and previous config saved to /var/cache/conftool/dbconfig/20220913-131148-ladsgroup.json
* 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
* 13:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
* 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
* 13:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
* 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
* 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
* 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
* 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
* 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P34613 and previous config saved to /var/cache/conftool/dbconfig/20220913-130304-ladsgroup.json
* 12:59 topranks: Switching active RE back to RE1 on cr1-codfw as firmware hadn't been loaded while it was master
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34612 and previous config saved to /var/cache/conftool/dbconfig/20220913-125745-ladsgroup.json
* 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34611 and previous config saved to /var/cache/conftool/dbconfig/20220913-125723-ladsgroup.json
* 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34610 and previous config saved to /var/cache/conftool/dbconfig/20220913-124758-ladsgroup.json
* 12:46 topranks: forcing non-graceful RE switchover on cr2-codfw as part of upgrade
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P34609 and previous config saved to /var/cache/conftool/dbconfig/20220913-124217-ladsgroup.json
* 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P34608 and previous config saved to /var/cache/conftool/dbconfig/20220913-122710-ladsgroup.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34607 and previous config saved to /var/cache/conftool/dbconfig/20220913-122415-root.json
* 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34606 and previous config saved to /var/cache/conftool/dbconfig/20220913-121204-ladsgroup.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34605 and previous config saved to /var/cache/conftool/dbconfig/20220913-120910-root.json
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34604 and previous config saved to /var/cache/conftool/dbconfig/20220913-120653-ladsgroup.json
* 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34603 and previous config saved to /var/cache/conftool/dbconfig/20220913-120632-ladsgroup.json
* 11:58 topranks: Disabling transit and ixp BGP on cr2-codfw in advance of software upgrade
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34602 and previous config saved to /var/cache/conftool/dbconfig/20220913-115405-root.json
* 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P34601 and previous config saved to /var/cache/conftool/dbconfig/20220913-115125-ladsgroup.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34600 and previous config saved to /var/cache/conftool/dbconfig/20220913-113900-root.json
* 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P34599 and previous config saved to /var/cache/conftool/dbconfig/20220913-113619-ladsgroup.json
* 11:34 hashar: Upgrading CI Jenkins [[phab:T317418|T317418]]
* 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34598 and previous config saved to /var/cache/conftool/dbconfig/20220913-112818-ladsgroup.json
* 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34597 and previous config saved to /var/cache/conftool/dbconfig/20220913-112355-root.json
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34596 and previous config saved to /var/cache/conftool/dbconfig/20220913-112112-ladsgroup.json
* 11:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: router upgrade
* 11:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: router upgrade
* 11:15 topranks: completed cr1-codfw upgrade, will proceed to cr2-codfw shortly
* 11:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt
* 11:14 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt
* 11:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:09 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.1/includes/libs/rdbms/ChronologyProtector.php: Backport: [[gerrit:831847{{!}}rdbms: Bump ChronologyProtector cache key version (T317606)]] (duration: 03m 49s)
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34595 and previous config saved to /var/cache/conftool/dbconfig/20220913-110850-root.json
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34594 and previous config saved to /var/cache/conftool/dbconfig/20220913-110755-ladsgroup.json
* 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34593 and previous config saved to /var/cache/conftool/dbconfig/20220913-110715-root.json
* 11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T317627|T317627]]', diff saved to https://phabricator.wikimedia.org/P34592 and previous config saved to /var/cache/conftool/dbconfig/20220913-105733-root.json
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 codfw primary [[phab:T317627|T317627]]', diff saved to https://phabricator.wikimedia.org/P34591 and previous config saved to /var/cache/conftool/dbconfig/20220913-105642-marostegui.json
* 10:56 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 10:56 marostegui: Starting s2 codfw failover from db2104 to db2107 - [[phab:T317627|T317627]]
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34590 and previous config saved to /var/cache/conftool/dbconfig/20220913-105210-root.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34589 and previous config saved to /var/cache/conftool/dbconfig/20220913-103705-root.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2107 from api [[phab:T317627|T317627]]', diff saved to https://phabricator.wikimedia.org/P34588 and previous config saved to /var/cache/conftool/dbconfig/20220913-103658-marostegui.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 [[phab:T317627|T317627]]', diff saved to https://phabricator.wikimedia.org/P34587 and previous config saved to /var/cache/conftool/dbconfig/20220913-103621-marostegui.json
* 10:35 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 [[phab:T317627|T317627]]
* 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 [[phab:T317627|T317627]]
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34586 and previous config saved to /var/cache/conftool/dbconfig/20220913-102232-ladsgroup.json
* 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34585 and previous config saved to /var/cache/conftool/dbconfig/20220913-102147-root.json
* 10:16 topranks: Flipping master RE on cr1-codfw to backup as part of upgrade
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34584 and previous config saved to /var/cache/conftool/dbconfig/20220913-100642-root.json
* 10:04 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
* 09:52 elukey: restart kafka on kafka-logging2002 to move it to PKI-based TLS certs
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34583 and previous config saved to /var/cache/conftool/dbconfig/20220913-095137-root.json
* 09:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2002.codfw.wmnet with reason: Kafka PKI upgrade
* 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2002.codfw.wmnet with reason: Kafka PKI upgrade
* 09:45 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
* 09:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
* 09:41 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 09:37 hashar: Restarting CI Jenkins on contint2001 (with new systemd service)
* 09:33 hashar: Enabling Puppet on contint2001 for Jenkins systemd change
* 09:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
* 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34582 and previous config saved to /var/cache/conftool/dbconfig/20220913-092904-ladsgroup.json
* 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34581 and previous config saved to /var/cache/conftool/dbconfig/20220913-092826-ladsgroup.json
* 09:25 hashar: Stopped Puppet on contint2001 for a Jenkins systemd change
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 [[phab:T317614|T317614]]', diff saved to https://phabricator.wikimedia.org/P34580 and previous config saved to /var/cache/conftool/dbconfig/20220913-092200-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary [[phab:T317614|T317614]]', diff saved to https://phabricator.wikimedia.org/P34579 and previous config saved to /var/cache/conftool/dbconfig/20220913-092032-root.json
* 09:19 marostegui: Starting s1 codfw failover from db2103 to db2112 - [[phab:T317614|T317614]]
* 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P34578 and previous config saved to /var/cache/conftool/dbconfig/20220913-091320-ladsgroup.json
* 09:11 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:11 volans@cumin1001: START - Cookbook sre.network.cf
* 09:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt with reason: router upgrade
* 09:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt with reason: router upgrade
* 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P34577 and previous config saved to /var/cache/conftool/dbconfig/20220913-085814-ladsgroup.json
* 08:56 topranks: Flipping primary routing engine to RE1 on cr1-codfw (disruptive) as part of upgrade.
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 [[phab:T317614|T317614]]', diff saved to https://phabricator.wikimedia.org/P34576 and previous config saved to /var/cache/conftool/dbconfig/20220913-085456-marostegui.json
* 08:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 [[phab:T317614|T317614]]
* 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 [[phab:T317614|T317614]]
* 08:46 topranks: Disabled LVS/PyBal peerings on cr1-codfw in advance of upgrade to router.
* 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
* 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34575 and previous config saved to /var/cache/conftool/dbconfig/20220913-084307-ladsgroup.json
* 08:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
* 08:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
* 08:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
* 08:27 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 08:27 cmooney@cumin1001: START - Cookbook sre.network.cf
* 08:17 moritzm: roll-restarting apache/FPM on mw canaries to pick up zlib security updates
* 08:15 topranks: de-pooling codfw ahead of core router upgrades at the site
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:18 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28  refs [[phab:T314190|T314190]] (duration: 04m 29s)
* 07:14 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28  refs [[phab:T314190|T314190]]
* 07:11 jhuneidi@deploy1002: deploy-promote aborted:  (duration: 00m 09s)
* 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34574 and previous config saved to /var/cache/conftool/dbconfig/20220913-065457-ladsgroup.json
* 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P34573 and previous config saved to /var/cache/conftool/dbconfig/20220913-063951-ladsgroup.json
* 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34572 and previous config saved to /var/cache/conftool/dbconfig/20220913-063908-ladsgroup.json
* 06:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P34571 and previous config saved to /var/cache/conftool/dbconfig/20220913-062444-ladsgroup.json
* 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34570 and previous config saved to /var/cache/conftool/dbconfig/20220913-060938-ladsgroup.json
* 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34569 and previous config saved to /var/cache/conftool/dbconfig/20220913-045832-ladsgroup.json
* 04:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 04:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34568 and previous config saved to /var/cache/conftool/dbconfig/20220913-045811-ladsgroup.json
* 04:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P34567 and previous config saved to /var/cache/conftool/dbconfig/20220913-044304-ladsgroup.json
* 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P34566 and previous config saved to /var/cache/conftool/dbconfig/20220913-042758-ladsgroup.json
* 04:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34565 and previous config saved to /var/cache/conftool/dbconfig/20220913-041251-ladsgroup.json
* 04:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.27 (duration: 01m 59s)
* 03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 35m 37s)
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34564 and previous config saved to /var/cache/conftool/dbconfig/20220913-022136-ladsgroup.json
* 02:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 02:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34563 and previous config saved to /var/cache/conftool/dbconfig/20220913-022114-ladsgroup.json
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P34562 and previous config saved to /var/cache/conftool/dbconfig/20220913-020608-ladsgroup.json
* 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P34561 and previous config saved to /var/cache/conftool/dbconfig/20220913-015102-ladsgroup.json
* 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34560 and previous config saved to /var/cache/conftool/dbconfig/20220913-013555-ladsgroup.json
* 00:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: syntax error in sudo
* 00:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: syntax error in sudo
* 00:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet with reason: syntax error in sudo
* 00:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2002.codfw.wmnet with reason: syntax error in sudo
* 00:48 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: syntax error in sudo
* 00:48 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: syntax error in sudo
* 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34559 and previous config saved to /var/cache/conftool/dbconfig/20220913-001908-ladsgroup.json
* 00:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34558 and previous config saved to /var/cache/conftool/dbconfig/20220913-001846-ladsgroup.json
* 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34557 and previous config saved to /var/cache/conftool/dbconfig/20220913-000340-ladsgroup.json


== July 6 ==
== 2022-09-12 ==
* 23:50 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 12s)
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34556 and previous config saved to /var/cache/conftool/dbconfig/20220912-234833-ladsgroup.json
* 23:49 logmsgbot: krenair Synchronized w/static/images/project-logos/mrwikisource.png: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 13s)
* 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34555 and previous config saved to /var/cache/conftool/dbconfig/20220912-233327-ladsgroup.json
* 23:35 logmsgbot: krenair Synchronized wmf-config/abusefilter.php: https://gerrit.wikimedia.org/r/#/c/223179/ - should be labs-only (duration: 00m 12s)
* 22:53 mutante: phabricator - disabling MediaWiki extension repositories in Diffusion that have 0 commits - [[phab:T296022|T296022]] - [[phab:T315706|T315706]]
* 23:32 logmsgbot: krenair Synchronized README: https://gerrit.wikimedia.org/r/#/c/222941/ - ... (duration: 00m 13s)
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34554 and previous config saved to /var/cache/conftool/dbconfig/20220912-224006-ladsgroup.json
* 23:27 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221809/ - should be a noop, just doc changes (duration: 00m 13s)
* 22:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 23:25 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221808/ (duration: 00m 13s)
* 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 23:17 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/223185/ (duration: 00m 12s)
* 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220970/ (duration: 00m 14s)
* 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 21:46 gwicke: restarted cassandra instance on restbase1003; was low on memory and constantly writing small chunks
* 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34553 and previous config saved to /var/cache/conftool/dbconfig/20220912-223927-ladsgroup.json
* 21:30 andrewbogott: rebooting labvirt1005, again. Somehow virtualization is turned off again
* 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34552 and previous config saved to /var/cache/conftool/dbconfig/20220912-222420-ladsgroup.json
* 21:12 subbu: deployed parsoid version 87a746e6
* 22:23 mutante: phabricator - disabling repositories: tool-xh-bot, tool-editor-contribution-dashboard, tool-ranker, tool-editor-contribution, tool-mikasa-bot-1, tool-maintun, tool-add-text, tool-wikibookassamese-book.php (none of them had commits) [[phab:T296022|T296022]] - [[phab:T315706|T315706]]
* 21:04 logmsgbot: ori Synchronized php-1.26wmf12/thumb.php: cdc75debaf: Add Content-Length header to thumb.php error responses (duration: 00m 13s)
* 22:20 mutante: phabricator - disabling repository "tool-ranker"
* 21:02 mutante: purging static-bz URL on varnish ...
* 22:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:39 akosiaris: upload php5_5.3.10-1ubuntu3.19-wmf1 on apt.wikimedia.org/precise-wikimedia
* 22:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 gwicke: restart cassandra instance on restbase1005
* 22:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:04 mobrovac: restbase restart cassandra on rb1005
* 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:28 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/223040/ (duration: 00m 12s)
* 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34551 and previous config saved to /var/cache/conftool/dbconfig/20220912-220914-ladsgroup.json
* 19:11 gwicke: reduced compaction throughput from 160 to 100 mb/s across the cassandra cluster via 'nodetool -h <host> setcompactionthroughput 100'
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34550 and previous config saved to /var/cache/conftool/dbconfig/20220912-215407-ladsgroup.json
* 18:51 gwicke: restarted cassandra on restbase1001 with jdk8, see T104888
* 21:07 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 19s)
* 18:22 gwicke: restarted cassandra on restbase1004 with jdk8
* 21:07 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 17:54 Jeff_Green: authdns-update for new rigel A record
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34549 and previous config saved to /var/cache/conftool/dbconfig/20220912-210123-ladsgroup.json
* 17:42 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: increase db2029 traffic to normal levels (duration: 00m 12s)
* 20:57 TheresNoTime: closing UTC late backport window
* 17:37 gwicke: upgraded restbase1005 to jdk8
* 20:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]] (duration: 05m 00s)
* 17:35 gwicke: restarting cassandra instance on restbase1005: out of heap
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:10 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade(2/2) (duration: 00m 11s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:09 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade (duration: 00m 11s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:38 jynus: upgrade and restart of db2029
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:35 ori: depooled mw1152
* 20:51 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:29 logmsgbot: krenair Finished scap: https://gerrit.wikimedia.org/r/#/c/222993/ (duration: 22m 09s)
* 20:51 samtar@deploy1002: Started scap: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]]
* 15:21 _joe_: repooling mw1152
* 20:49 samtar@deploy1002: Finished scap: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]] (duration: 07m 27s)
* 15:20 _joe_: attempting dump-apc on mw1060
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:09 _joe_: depooled the HHVM imagescaler again
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:07 logmsgbot: krenair Started scap: https://gerrit.wikimedia.org/r/#/c/222993/
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:02 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222617/ (duration: 00m 12s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:48 moritzm: installed python security updates on analytics*, lab* and virt*
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34548 and previous config saved to /var/cache/conftool/dbconfig/20220912-204617-ladsgroup.json
* 14:46 moritzm: added python-diskimage-builder 0.1.46-1+wmf1 for jessie-wikimedia on carbon
* 20:42 samtar@deploy1002: samtar and jdlrobson: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:43 _joe_: depooled the HHVM imagescaler, spitting 503s again.
* 20:42 samtar@deploy1002: Started scap: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]]
* 14:18 mobrovac: restbase started thinning out parsoid data (local_group_wikipedia_T_parsoid_dataDVIsgzJSne8k) for >= 22 days
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:07 YuviPanda: restart apache on labcontrol1001 to pick up parser function change
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:57 moritzm: installed python security updates on mw*, es* and db*
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:18 logmsgbot: hoo Synchronized wmf-config/: Enable WikibaseQuality and WikibaseQualityConstraints on wikidata (duration: 00m 13s)
* 20:40 samtar@deploy1002: Finished scap: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]] (duration: 06m 25s)
* 12:15 logmsgbot: hoo Finished scap: Update WikibaseQuality and WikibaseQualityConstraint (duration: 25m 56s)
* 20:39 jhathaway: testing exim config change on mx1001.wikimedia.org
* 11:49 logmsgbot: hoo Started scap: Update WikibaseQuality and WikibaseQualityConstraint
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 hoo: Created the `wbqc_constraints` table on wikidatawiki
* 20:38 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dispatch-be1001.eqiad.wmnet
* 09:02 _joe_: restarted the appserver on mw1059 with hhvm.server.apc.expire_on_sets = true, restarted the heap profiling to confirm my hypothesis on T104769
* 20:34 samtar@deploy1002: samtar and dani: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:31 _joe_: restarted cassandra on rb1004. again.
* 20:34 samtar@deploy1002: Started scap: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]]
* 05:01 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1034, depool db1041 (duration: 00m 12s)
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:00 springle: stash/pull/apply CommonSettings.php on tin, which was left with modifications
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]] (duration: 06m 12s)
* 04:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul  6 04:35:45 UTC 2015 (duration 35m 44s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-06 02:22:12+00:00
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 07s)
* 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34547 and previous config saved to /var/cache/conftool/dbconfig/20220912-203110-ladsgroup.json
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:26 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:26 samtar@deploy1002: Started scap: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]]
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:830982{{!}}Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)]] (duration: 08m 14s)
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34546 and previous config saved to /var/cache/conftool/dbconfig/20220912-202359-ladsgroup.json
* 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:16 samtar@deploy1002: samtar and aishik: Backport for [[gerrit:830982{{!}}Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:830982{{!}}Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)]]
* 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34545 and previous config saved to /var/cache/conftool/dbconfig/20220912-201604-ladsgroup.json
* 20:15 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dispatch-be1001.eqiad.wmnet on all recursors
* 20:14 herron@cumin1001: START - Cookbook sre.dns.wipe-cache dispatch-be1001.eqiad.wmnet on all recursors
* 20:14 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:831167{{!}}Mark spcomwiki and searchcomwiki as closed (T285685)]] (duration: 05m 40s)
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:12 herron@cumin1001: START - Cookbook sre.dns.netbox
* 20:12 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host dispatch-be1001.eqiad.wmnet
* 20:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:09 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:831167{{!}}Mark spcomwiki and searchcomwiki as closed (T285685)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:08 samtar@deploy1002: Started scap: Backport for [[gerrit:831167{{!}}Mark spcomwiki and searchcomwiki as closed (T285685)]]
* 20:07 samtar@deploy1002: backport aborted: (duration: 03m 46s)
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts theemin.codfw.wmnet
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:59 maryum: deployed security patch for [[phab:T314245|T314245]]
* 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.decommission for hosts theemin.codfw.wmnet
* 19:58 mstyles@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/PageTriage/includes/Api/ApiPageTriageAction.php: (no justification provided) (duration: 03m 42s)
* 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34544 and previous config saved to /var/cache/conftool/dbconfig/20220912-195540-ladsgroup.json
* 19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34543 and previous config saved to /var/cache/conftool/dbconfig/20220912-195519-ladsgroup.json
* 19:53 sbassett: Deployed security patch for [[phab:T311337|T311337]]
* 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P34542 and previous config saved to /var/cache/conftool/dbconfig/20220912-194013-ladsgroup.json
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34541 and previous config saved to /var/cache/conftool/dbconfig/20220912-192858-ladsgroup.json
* 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34540 and previous config saved to /var/cache/conftool/dbconfig/20220912-192837-ladsgroup.json
* 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 02m 04s)
* 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
* 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P34539 and previous config saved to /var/cache/conftool/dbconfig/20220912-192506-ladsgroup.json
* 19:20 dancy@deploy1002: Installation of scap version "4.19.0" completed for 561 hosts
* 19:20 dancy@deploy1002: Installing scap version "4.19.0" for 561 hosts
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34538 and previous config saved to /var/cache/conftool/dbconfig/20220912-191330-ladsgroup.json
* 19:12 dancy@deploy1002: Installation of scap version "4.18.0" completed for 561 hosts
* 19:12 dancy@deploy1002: Installing scap version "4.18.0" for 561 hosts
* 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34537 and previous config saved to /var/cache/conftool/dbconfig/20220912-191000-ladsgroup.json
* 19:08 inflatador: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 19:04 ryankemper: [WCQS] Depooled `wcqs100[1,2]` while they catch up on ~1.5 days' worth of lag (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wcqs&viewPanel=8&from=1662910789183&to=1663068616559)
* 19:00 inflatador: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
* 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34536 and previous config saved to /var/cache/conftool/dbconfig/20220912-185823-ladsgroup.json
* 18:56 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14] (wcqs): Deploy 0.3.116 to WCQS (duration: 08m 01s)
* 18:48 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14] (wcqs): Deploy 0.3.116 to WCQS
* 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34535 and previous config saved to /var/cache/conftool/dbconfig/20220912-184317-ladsgroup.json
* 18:37 inflatador: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 18:37 inflatador: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 18:37 inflatador: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 18:22 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 07m 31s)
* 18:14 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
* 18:14 dancy@deploy1002: Installation of scap version "4.16.0" completed for 561 hosts
* 18:13 dancy@deploy1002: Installing scap version "4.16.0" for 561 hosts
* 18:08 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 05m 37s)
* 18:02 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
* 18:01 inflatador: [WDQS Deploy] Tests passing following deploy of `0.3.116` on canary `wdqs1003`; proceeding to rest of fleet
* 17:57 inflatador: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.116`. Pre-deploy tests passing on canary `wdqs1003`
* 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34532 and previous config saved to /var/cache/conftool/dbconfig/20220912-174301-ladsgroup.json
* 17:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 17:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34531 and previous config saved to /var/cache/conftool/dbconfig/20220912-174239-ladsgroup.json
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P34529 and previous config saved to /var/cache/conftool/dbconfig/20220912-172733-ladsgroup.json
* 17:21 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
* 17:21 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P34528 and previous config saved to /var/cache/conftool/dbconfig/20220912-171227-ladsgroup.json
* 17:08 cwhite: rebuilt raid on logstash2027 [[phab:T316996|T316996]]
* 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34527 and previous config saved to /var/cache/conftool/dbconfig/20220912-165720-ladsgroup.json
* 15:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
* 15:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34524 and previous config saved to /var/cache/conftool/dbconfig/20220912-152920-ladsgroup.json
* 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34523 and previous config saved to /var/cache/conftool/dbconfig/20220912-152858-ladsgroup.json
* 15:18 dancy@deploy1002: Installation of scap version "4.18.0" completed for 561 hosts
* 15:17 dancy@deploy1002: Installing scap version "4.18.0" for 561 hosts
* 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P34522 and previous config saved to /var/cache/conftool/dbconfig/20220912-151352-ladsgroup.json
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P34521 and previous config saved to /var/cache/conftool/dbconfig/20220912-145845-ladsgroup.json
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34520 and previous config saved to /var/cache/conftool/dbconfig/20220912-144339-ladsgroup.json
* 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34519 and previous config saved to /var/cache/conftool/dbconfig/20220912-141427-ladsgroup.json
* 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34518 and previous config saved to /var/cache/conftool/dbconfig/20220912-141405-ladsgroup.json
* 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
* 14:02 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wtp[1028-1030]
* 14:02 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:01 volans@cumin1001: START - Cookbook sre.dns.netbox
* 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P34517 and previous config saved to /var/cache/conftool/dbconfig/20220912-135859-ladsgroup.json
* 13:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
* 13:53 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp[1028-1030]
* 13:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
* 13:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P34516 and previous config saved to /var/cache/conftool/dbconfig/20220912-134353-ladsgroup.json
* 13:43 Lucas_WMDE: UTC afternoon backport+config window done
* 13:40 Lucas_WMDE: scap pull on mwdebug1001 to restore good code (confirmed that [[phab:T317520|T317520]] affects production)
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34515 and previous config saved to /var/cache/conftool/dbconfig/20220912-133848-root.json
* 13:35 Lucas_WMDE: manually applying [[gerrit:830691]] on mwdebug1001 to test if [[phab:T317520|T317520]] affects production (expected to cause getExpensiveParserFunctionLimit-related logstash errors)