
Server Admin Log: Difference between revisions

imported>Labslogbot
(l10nupdate@tin ResourceLoader cache refresh completed at Sun Sep 6 04:27:57 UTC 2015 (duration 27m 56s) (logmsgbot))
imported>Stashbot
(pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207'])
 
== 2015-09-06 ==
== 2023-03-30 ==
* 04:27 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Sep  6 04:27:57 UTC 2015 (duration 27m 56s)
* 00:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
* 02:23 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-06 02:23:08+00:00
* 00:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
* 02:19 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 06m 14s)
* 00:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
* 00:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 00:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
* 00:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
* 00:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
* 00:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
* 00:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
* 00:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
* 00:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1225.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
* 00:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 00:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
* 00:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 00:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 00:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1225.mgmt.eqiad.wmnet with reboot policy FORCED


== 2015-09-05 ==
== 2023-03-29 ==
* 23:37 Krinkle: mwscript deleteEqualMessages.php --wiki fywiktionary
* 23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1224.mgmt.eqiad.wmnet with reboot policy FORCED
* 04:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Sep 5 04:31:34 UTC 2015 (duration 31m 33s)
* 23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1223.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:30 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-05 02:30:06+00:00
* 23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on contint2002.wikimedia.org with reason: WIP-known-to-be-debugged-new-host
* 02:27 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 53s)
* 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on contint2002.wikimedia.org with reason: WIP-known-to-be-debugged-new-host
* 23:51 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1224.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1223.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1221.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1222.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:48 mutante: contint2002 - a2dismod mpm_event (ONCE AGAIN this year old issue when applying roles with apache for the first time) - running puppet - now it can actually install PHP 7.3 and start apache  [[phab:T324659|T324659]]
* 23:48 mutante: contint2002 - a2dismod mpm_event (ONCE AGAIN this year old issue when applying roles with apache for the first time) - running puppet - now it can actually install PHP 7.3 and start apache
* 23:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 23:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 23:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1222.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1221.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1220.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1219.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1220.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1219.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1218.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1218.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1216.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1215.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1216.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1214.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1215.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1213.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1214.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:13 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1212.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1213.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 22:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 21:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1212.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 21:54 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit1003']
* 21:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
* 21:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit1003']
* 21:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
* 21:45 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@ada9bb0]: disable auto-versioning of glent uploads (duration: 00m 14s)
* 21:45 ebernhardson@deploy2002: Started deploy [airflow-dags/search@ada9bb0]: disable auto-versioning of glent uploads
* 21:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
* 21:15 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
* 20:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
* 20:29 taavi@deploy2002: Finished scap: Backport for [[gerrit:893839{{!}}Add per-action component-level profiling in statsd using excimer (T225968)]] (duration: 11m 52s)
* 20:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 20:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 20:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 20:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 20:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 20:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 20:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 20:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
* 20:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1073']
* 20:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1073']
* 20:18 taavi@deploy2002: aaron and taavi: Backport for [[gerrit:893839{{!}}Add per-action component-level profiling in statsd using excimer (T225968)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:17 taavi@deploy2002: Started scap: Backport for [[gerrit:893839{{!}}Add per-action component-level profiling in statsd using excimer (T225968)]]
* 20:15 taavi@deploy2002: Finished scap: Backport for [[gerrit:903835{{!}}Update "United States" static page to facilitate synthetic testing of T331681 (T331681)]] (duration: 09m 45s)
* 20:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
* 20:10 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:07 taavi@deploy2002: nray and taavi: Backport for [[gerrit:903835{{!}}Update "United States" static page to facilitate synthetic testing of T331681 (T331681)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:06 volans@cumin1001: START - Cookbook sre.hosts.provision for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:06 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:05 taavi@deploy2002: Started scap: Backport for [[gerrit:903835{{!}}Update "United States" static page to facilitate synthetic testing of T331681 (T331681)]]
* 20:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:50 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
* 19:50 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 19:48 volans@cumin1001: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:20 sukhe: force puppet agent run on A:lvs to additionally confirm nothing broke
* 19:20 sukhe: [enable] puppet on A:lvs to roll out pybal prometheus-client change
* 19:14 sukhe: disable puppet on A:lvs to roll out pybal prometheus-client change
* 18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 18:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1138 [[phab:T333480|T333480]]', diff saved to https://phabricator.wikimedia.org/P45981 and previous config saved to /var/cache/conftool/dbconfig/20230329-185431-ladsgroup.json
* 18:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1160 to s4 primary [[phab:T333480|T333480]]', diff saved to https://phabricator.wikimedia.org/P45980 and previous config saved to /var/cache/conftool/dbconfig/20230329-185125-ladsgroup.json
* 18:50 Amir1: Starting s4 eqiad failover from db1138 to db1160 - [[phab:T333480|T333480]]
* 18:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:45 dduvall@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.2  refs [[phab:T330208|T330208]] (duration: 05m 48s)
* 18:39 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.2 refs [[phab:T330208|T330208]]
* 18:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:38 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@d66d6e0]: bump glent to 0.3.3 (duration: 00m 16s)
* 18:38 ebernhardson@deploy2002: Started deploy [airflow-dags/search@d66d6e0]: bump glent to 0.3.3
* 18:32 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 18:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 18:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1160 with weight 0 [[phab:T333480|T333480]]', diff saved to https://phabricator.wikimedia.org/P45979 and previous config saved to /var/cache/conftool/dbconfig/20230329-182536-ladsgroup.json
* 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T333480|T333480]]
* 18:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T333480|T333480]]
* 18:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
* 18:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:16 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.2  refs [[phab:T330208|T330208]]
* 18:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 17:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: PC maint
* 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: PC maint
* 17:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:43 brett: Re-enable puppet on A:cp - [[phab:T284555|T284555]]
* 17:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
* 17:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 17:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 17:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:29 brett: Disable puppet on A:cp to roll out another [[phab:T284555|T284555]]
* 17:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:18 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
* 17:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:11 brett: Re-enable puppet on A:cp - [[phab:T284555|T284555]]
* 16:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
* 16:44 brett: Disable puppet on A:cp to roll out [[phab:T284555|T284555]]
* 16:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
* 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 16:29 btullis@cumin1001: Added views for new wiki: anpwiki [[phab:T332458|T332458]]
* 16:05 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 16:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 16:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:58 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 15:51 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 15:51 btullis@cumin1001: Added views for new wiki: gucwiki [[phab:T326235|T326235]]
* 15:50 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 15:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 15:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 15:29 elukey@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 15:29 elukey@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 15:28 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:27 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:27 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
* 15:27 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:26 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 15:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:07 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2001.codfw.wmnet with reason: Stop kafka, dist-upgrade
* 15:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2001.codfw.wmnet with reason: Stop kafka, dist-upgrade
* 15:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:01 jgleeson: SmashPig upgraded from {{Gerrit|758a34c1}} to {{Gerrit|240c80a2}}
* 15:01 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4a7a6cc]: prefix hive properties with spark.hive. (duration: 00m 13s)
* 15:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4a7a6cc]: prefix hive properties with spark.hive.
* 14:59 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2005-dev
* 14:58 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2005-dev
* 14:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2005-dev
* 14:57 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2005-dev
* 14:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:49 XioNoX: Remove custom BGP graceful-shutdown on all core routers - [[phab:T320230|T320230]]
* 14:47 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:20 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:20 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:18 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:15 Lucas_WMDE: UTC afternoon backport+config window done
* 14:14 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for [[gerrit:904130{{!}}SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)]] (duration: 07m 30s)
* 14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:904130{{!}}SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:07 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for [[gerrit:904130{{!}}SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)]]
* 14:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:04 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for [[gerrit:904129{{!}}SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)]] (duration: 08m 02s)
* 14:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:00 XioNoX: merge/deploy change in Puppet's modules/network/data/data.yaml - [[phab:T327930|T327930]]
* 13:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:58 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:904129{{!}}SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:56 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for [[gerrit:904129{{!}}SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)]]
* 13:56 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:54 jgiannelos@deploy2002: Finished deploy [restbase/deploy@0d2f12f]: (no justification provided) (duration: 17m 59s)
* 13:54 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 13:51 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 13:46 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:42 elukey: run dist-upgrade on kafka-main2002 to upgrade it to bullseye - [[phab:T332013|T332013]]
* 13:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka, dist-upgrade
* 13:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka, dist-upgrade
* 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript cleanupTitles.php gurwiki # [[phab:T332241|T332241]] (2 of 767 rows updated)
* 13:37 sukhe: enable puppet on A:lvs to test Python 2 deprecation change: [[phab:T321309|T321309]]
* 13:36 jgiannelos@deploy2002: Started deploy [restbase/deploy@0d2f12f]: (no justification provided)
* 13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes.php gurwiki --fix # [[phab:T332241|T332241]] – 0 pages to fix (0 resolvable), 0 links to fix (0 resolvable, 0 deleted)
* 13:30 XioNoX: enable vcp-snmp-statistics on fasw-c-codfw
* 13:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for [[gerrit:889257{{!}}Enabled native gallery editing in Parsoid (T329662)]] (duration: 10m 19s)
* 13:29 sukhe: disable puppet on A:lvs to test Python 2 deprecation change: [[phab:T321309|T321309]]
* 13:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and arlolra: Backport for [[gerrit:889257{{!}}Enabled native gallery editing in Parsoid (T329662)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:19 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for [[gerrit:889257{{!}}Enabled native gallery editing in Parsoid (T329662)]]
* 13:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for [[gerrit:903780{{!}}Enable history page visual diffs on remaining wikis (T314588)]] (duration: 08m 23s)
* 13:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:10 dcausse@deploy2002: Finished deploy [airflow-dags/search@92e9876]: (no justification provided) (duration: 00m 14s)
* 13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for [[gerrit:903780{{!}}Enable history page visual diffs on remaining wikis (T314588)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:09 dcausse@deploy2002: Started deploy [airflow-dags/search@92e9876]: (no justification provided)
* 13:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for [[gerrit:903780{{!}}Enable history page visual diffs on remaining wikis (T314588)]]
* 13:01 XioNoX: test enabling lldp on mr1-ulsfo
* 12:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:55 XioNoX: test enabling lldp on pfw3-codfw
* 12:50 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 12:43 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 12:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 12:22 btullis@cumin1001: Added views for new wiki: gurwiki [[phab:T327841|T327841]]
* 11:57 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 11:55 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 11:55 btullis@cumin1001: Added views for new wiki: shnwikivoyage [[phab:T302798|T302798]]
* 11:55 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 11:54 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 11:54 btullis@cumin1001: Added views for new wiki: guwwiktionary [[phab:T309056|T309056]]
* 11:54 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 11:53 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 11:53 btullis@cumin1001: Added views for new wiki: guwwiki [[phab:T303761|T303761]]
* 11:53 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 11:51 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 11:51 btullis@cumin1001: Added views for new wiki: kcgwiki [[phab:T305280|T305280]]
* 11:51 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 11:18 jgiannelos@deploy2002: deploy aborted: (no justification provided) (duration: 00m 01s)
* 11:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@c265f3f] (beta): (no justification provided)
* 11:12 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing GraphQL - jbond@cumin2002"
* 11:07 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing GraphQL - jbond@cumin2002"
* 10:58 claime: authdns-update successful on all nodes - [[phab:T333120|T333120]]
* 10:57 claime: Running authdns-update
* 10:55 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int,name=codfw
* 10:55 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro
* 10:52 claime: Running puppet on dns-auth - [[phab:T333120|T333120]]
* 10:50 claime: Switching mw-api-int to production - [[phab:T333120|T333120]]
* 10:50 claime: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P<nowiki>{</nowiki>lvs1019*,lvs2009*<nowiki>}</nowiki> and A:lvs ([[phab:T333120|T333120]])
* 10:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P<nowiki>{</nowiki>lvs1019*,lvs2009*<nowiki>}</nowiki> and A:lvs ([[phab:T333120|T333120]])
* 10:46 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P<nowiki>{</nowiki>lvs1019*,lvs2009*<nowiki>}</nowiki> and A:lvs ([[phab:T333120|T333120]])
* 10:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P<nowiki>{</nowiki>lvs1020*,lvs2010*<nowiki>}</nowiki> and A:lvs ([[phab:T333120|T333120]])
* 10:41 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P<nowiki>{</nowiki>lvs1020*,lvs2010*<nowiki>}</nowiki> and A:lvs ([[phab:T333120|T333120]])
* 10:37 claime: Switching mw-api-int to lvs_setup - [[phab:T333120|T333120]]
* 10:21 hnowlan@deploy2002: Finished deploy [restbase/deploy@c265f3f]: Add ckbwiktionary, anpwiki [[phab:T332093|T332093]] [[phab:T332379|T332379]] (duration: 19m 30s)
* 10:02 hnowlan@deploy2002: Started deploy [restbase/deploy@c265f3f]: Add ckbwiktionary, anpwiki [[phab:T332093|T332093]] [[phab:T332379|T332379]]
* 09:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
* 09:58 claime: running puppet on O:kubernetes::worker and O:lvs::balancer - [[phab:T333120|T333120]]
* 09:58 denisse: updating prometheus3001 to bullseye
* 09:57 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
* 09:57 claime: Adding mw-api-int to service_catalog in service_setup - [[phab:T333120|T333120]]
* 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
* 09:54 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
* 09:54 ayounsi@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
* 09:50 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
* 09:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
* 09:50 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
* 09:33 filippo@deploy2002: Finished scap: Backport for [[gerrit:904076{{!}}Revert "Failover statsd to graphite2004"]] (duration: 07m 34s)
* 09:27 filippo@deploy2002: filippo: Backport for [[gerrit:904076{{!}}Revert "Failover statsd to graphite2004"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 09:26 filippo@deploy2002: Started scap: Backport for [[gerrit:904076{{!}}Revert "Failover statsd to graphite2004"]]
* 09:02 elukey: move kafka on kafka-jumbo1001 to PKI TLS certs - [[phab:T296064|T296064]]
* 09:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: restart kafka, upgrade to PKI
* 09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: restart kafka, upgrade to PKI
* 08:03 volans: installed spicerack v6.4.0 on cumin1001
* 07:37 kartik@deploy2002: Finished scap: Backport for [[gerrit:903808{{!}}CX3 Build 0.2.0+20230329 (T333128 T328533 T317995)]] (duration: 12m 35s)
* 07:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2003.codfw.wmnet with reason: Stop kafka, dist-upgrade
* 07:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2003.codfw.wmnet with reason: Stop kafka, dist-upgrade
* 07:31 oblivian@deploy2002: Finished deploy [restbase/deploy@11477d6]: Updating stale nodes, [[phab:T333069|T333069]] (duration: 32m 07s)
* 07:27 volans: installed spicerack v6.4.0 on cumin2002
* 07:26 kartik@deploy2002: kartik: Backport for [[gerrit:903808{{!}}CX3 Build 0.2.0+20230329 (T333128 T328533 T317995)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:25 kartik@deploy2002: Started scap: Backport for [[gerrit:903808{{!}}CX3 Build 0.2.0+20230329 (T333128 T328533 T317995)]]
* 07:07 slyngs: Update Squid logformat (urldownloader[1001-1002,2001-2002,2004].wikimedia.org)
* 06:59 oblivian@deploy2002: Started deploy [restbase/deploy@11477d6]: Updating stale nodes, [[phab:T333069|T333069]]
* 06:47 hashar: Restarted Gerrit
* 06:43 hashar@deploy2002: Finished deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins (duration: 00m 10s)
* 06:43 hashar@deploy2002: Started deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins
* 06:42 hashar: gerrit2002: restarted Gerrit replica instance
* 06:40 hashar@deploy2002: Finished deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins (duration: 00m 06s)
* 06:40 hashar@deploy2002: Started deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins
* 06:38 phedenskog@deploy2002: Finished deploy [performance/navtiming@f6c9fa3]: (no justification provided) (duration: 00m 05s)
* 06:38 phedenskog@deploy2002: Started deploy [performance/navtiming@f6c9fa3]: (no justification provided)
* 06:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
* 06:21 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
* 00:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
* 00:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn
* 00:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2035.codfw.wmnet
* 00:37 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp2035.codfw.wmnet
* 00:30 sukhe: restart pybal on lvs1018 to hopefully resolve flapping BGP session
* 00:06 zabe@deploy2002: Finished scap: Backport for [[gerrit:903803{{!}}throttle: Remove expired throttle]] (duration: 07m 19s)
* 00:00 zabe@deploy2002: zabe: Backport for [[gerrit:903803{{!}}throttle: Remove expired throttle]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet


== 2015-09-04 ==
== 2023-03-28 ==
* 23:52 logmsgbot: mattflaschen@tin Synchronized wmf-config/InitialiseSettings-labs.php: Beta-only change (duration: 00m 12s)
* 23:59 zabe@deploy2002: Started scap: Backport for [[gerrit:903803{{!}}throttle: Remove expired throttle]]
* 23:52 logmsgbot: mattflaschen@tin Synchronized wmf-config/CommonSettings-labs.php: Beta-only change (duration: 00m 11s)
* 23:46 zabe@deploy2002: Finished scap: [[phab:T331831|T331831]] (duration: 06m 50s)
* 22:49 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/Citoid: https://gerrit.wikimedia.org/r/#/c/236218/ and https://gerrit.wikimedia.org/r/#/c/236222/ (duration: 00m 12s)
* 23:39 zabe@deploy2002: Started scap: [[phab:T331831|T331831]]
* 21:55 urandom: bouncing Cassandra on restbase1001 to restore default GC settings
* 23:34 zabe@deploy2002: Finished scap: [[phab:T331831|T331831]] (duration: 07m 01s)
* 18:36 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/ukwikivoyage.png: https://gerrit.wikimedia.org/r/#/c/236063/ (duration: 00m 11s)
* 23:27 zabe@deploy2002: Started scap: [[phab:T331831|T331831]]
* 18:06 logmsgbot: krinkle@tin Synchronized php-1.26wmf21/extensions/WikimediaEvents/modules/ext.wikimediaEvents.statsd.js: Ib98988f67ef (duration: 00m 11s)
* 23:27 zabe: central Kurdish Wiktionary (ckbwiktionary)
* 17:35 MaxSem: Maps: dropped duplicate index on water_polygons
* 22:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:27 jynus: cloning es1 mysql data from es1004 to es1018 [ETA:16h]
* 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:11 paravoid: updating firewall border ACLs and BGP border filters across all cr
* 22:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:42 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1002, es1016; Depool es1004 (duration: 00m 11s)
* 22:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:35 godog: python varnishlog collector + gdb running on cp1052 for debugging T83580
* 22:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for gerrit1003 - pt1979@cumin2002"
* 12:55 moritzm: restarted salt-master on palladium
* 22:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for gerrit1003 - pt1979@cumin2002"
* 12:47 moritzm: uploaded debdeploy 0.0.4 to carbon
* 22:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 10:18 logmsgbot: kartik@tin Synchronized php-1.26wmf21/extensions/ContentTranslation/api/ApiContentTranslationPublish.php: php-1.26wmf21/extensions/ContentTranslation/extension.json T111490:Use the VirtualRESTService to configure CX (duration: 00m 12s)
* 22:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-fr-ca_1.0.3~r61329-1
* 21:44 eileen: civicrm upgraded from {{Gerrit|db3b727e}} to {{Gerrit|183d131d}}
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-fr_0.9.0~r28336-1
* 21:23 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9b31c6b]: correct mw_sql_to_hive.py cli arguments (duration: 00m 13s)
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-es_0.9.1~r60655-1
* 21:22 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9b31c6b]: correct mw_sql_to_hive.py cli arguments
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-ca_0.9.1~r60655-1
* 21:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-ca-it_0.1.1~r57554-1
* 21:06 urandom: updating image_suggestions default table TTL(s) from {{Gerrit|1209600}} to {{Gerrit|1814400}} (seconds) — [[phab:T333319|T333319]]
* 07:50 jynus: cloning es3 mysql data from es1008 to es1019
* 21:05 phedenskog@deploy2002: Finished deploy [performance/navtiming@4d22874]: (no justification provided) (duration: 00m 06s)
* 04:19 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Sep  4 04:19:20 UTC 2015 (duration 19m 19s)
* 21:05 phedenskog@deploy2002: Started deploy [performance/navtiming@4d22874]: (no justification provided)
* 02:26 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-04 02:26:04+00:00
* 21:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:23 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 21s)
* 21:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:56 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: T111439 (duration: 00m 12s)
* 21:03 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:903684{{!}}Only run edit check on main namespace]], [[gerrit:903685{{!}}Change name of the editcheck-needreference tag to editcheck-references]], [[gerrit:903759{{!}}Enable hidden tag for "Edit Check" project on Wikipedias (T324733)]] (duration: 28m 53s)
* 00:11 logmsgbot: krinkle@tin Synchronized php-1.26wmf21/includes/resourceloader/ResourceLoader.php: I24f68e34a9fa4918 (duration: 00m 12s)
* 20:51 urbanecm@deploy2002: urbanecm and matmarex: Backport for [[gerrit:903684{{!}}Only run edit check on main namespace]], [[gerrit:903685{{!}}Change name of the editcheck-needreference tag to editcheck-references]], [[gerrit:903759{{!}}Enable hidden tag for "Edit Check" project on Wikipedias (T324733)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 00:06 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235940/ (duration: 00m 11s)
* 20:34 urbanecm@deploy2002: Started scap: Backport for [[gerrit:903684{{!}}Only run edit check on main namespace]], [[gerrit:903685{{!}}Change name of the editcheck-needreference tag to editcheck-references]], [[gerrit:903759{{!}}Enable hidden tag for "Edit Check" project on Wikipedias (T324733)]]
* 20:27 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6febfd]: increase dynamic partition limit when importing cirrus indexes (duration: 00m 13s)
* 20:27 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6febfd]: increase dynamic partition limit when importing cirrus indexes
* 20:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 20:09 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 20:02 ejegg: payments-wiki upgraded from {{Gerrit|f5ec2677}} to {{Gerrit|b5df483f}}
* 19:29 dduvall@deploy2002: Pruned MediaWiki: 1.40.0-wmf.27 (duration: 02m 11s)
* 19:26 dduvall@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.2  refs [[phab:T330208|T330208]] (duration: 07m 24s)
* 19:19 dduvall@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2  refs [[phab:T330208|T330208]]
* 18:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:37 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@0f1c9e8]: Deploy latest image_suggestions on platform_eng Airflow instance (duration: 00m 20s)
* 18:36 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@0f1c9e8]: Deploy latest image_suggestions on platform_eng Airflow instance
* 18:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new db nodes - pt1979@cumin2002"
* 18:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new db nodes - pt1979@cumin2002"
* 18:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:57 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 17:57 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 17:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 17:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 16:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet,service=ats-be
* 16:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet,service=cdn
* 16:52 volans: uploaded spicerack_6.4.0 to apt.wikimedia.org bullseye-wikimedia (but I'll deploy it to the cumin hosts tomorrow)
* 16:10 jnuche@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.2  refs [[phab:T330208|T330208]] (duration: 49m 52s)
* 16:09 bblack: reboot cp1082 (NIC issues)
* 16:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1082.eqiad.wmnet,service=ats-be
* 16:03 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1082.eqiad.wmnet,service=cdn
* 16:00 inflatador: bking@cumin1001 unban elastic and cloudelastic nodes post maintenance [[phab:T330165|T330165]]
* 15:57 btullis@deploy2002: Finished deploy [analytics/refinery@6554ec0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6554ec0] (duration: 01m 32s)
* 15:20 jnuche@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2  refs [[phab:T330208|T330208]]
* 15:15 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 15:15 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 15:14 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 15:08 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 15:07 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 15:05 jnuche@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.41.0-wmf.2" --no-progress --store-class=LCStoreCDB --threads=30 --lang en  --quiet ' returned non-zero exit status 1. (duration: 00m 03s)
* 15:05 jnuche@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2  refs [[phab:T330208|T330208]]
* 14:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 14:55 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
* 14:53 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=eqiad
* 14:53 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=device-analytics,name=pki
* 14:53 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=device-analytics,name=eqiad
* 14:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=device-analytics
* 14:51 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: eqiad row B switches upgrade done - [[phab:T330165|T330165]]
* 14:48 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
* 14:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor100[12].eqiad.wmnet
* 14:38 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
* 14:32 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: eqiad row B switches upgrade done - [[phab:T330165|T330165]]
* 14:31 sukhe: run authdns-update to revert eqiad depool
* 14:25 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
* 14:25 filippo@cumin1001: conftool action : set/pooled=no; selector: name=THANOS-FE-OLD-FQDN,service=thanos-web
* 14:05 XioNoX: reboot eqiad row B for upgrade - [[phab:T330165|T330165]]
* 13:58 godog: depool thanos-fe1002 - [[phab:T330165|T330165]]
* 13:54 Emperor: depool ms-fe1010 before switch work [[phab:T330165|T330165]]
* 13:53 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
* 13:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 249 hosts with reason: eqiad row B upgrade
* 13:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
* 13:47 akosiaris: depool swift in eqiad for row B upgrade
* 13:47 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
* 13:47 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 13:46 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 249 hosts with reason: eqiad row B upgrade
* 13:45 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:45 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:44 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 13:41 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 13:36 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:34 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=thumbor,name=eqiad
* 13:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1002.eqiad.wmnet
* 13:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
* 13:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row B switches upgrade - [[phab:T330165|T330165]]
* 12:59 XioNoX: depool eqiad for network maintenance - [[phab:T330165|T330165]]
* 12:58 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row B switches upgrade - [[phab:T330165|T330165]]
* 12:57 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:56 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:56 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 12:56 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
* 12:44 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
* 12:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
* 12:43 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
* 12:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
* 12:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
* 12:36 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict1002.eqiad.wmnet with OS bullseye
* 12:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
* 12:34 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
* 12:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage
* 12:21 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage
* 12:20 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:20 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 12:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45295
* 12:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 45295
* 12:09 eoghan@cumin1001: START - Cookbook sre.ganeti.reimage for host aphlict1002.eqiad.wmnet with OS bullseye
* 11:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade
* 11:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade
* 11:56 elukey: dist-upgrade kafka-main1002 to debian bullseye - [[phab:T332013|T332013]]
* 11:51 ladsgroup@deploy2002: Finished scap: Backport for [[gerrit:903549{{!}}api: Mark query as read-only to avoid regex on SQL (T332942)]] (duration: 18m 42s)
* 11:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:37 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:34 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:34 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:903549{{!}}api: Mark query as read-only to avoid regex on SQL (T332942)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:32 ladsgroup@deploy2002: Started scap: Backport for [[gerrit:903549{{!}}api: Mark query as read-only to avoid regex on SQL (T332942)]]
* 11:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:23 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 11:22 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 11:22 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:21 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 11:00 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 10:24 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:16 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 10:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 09:56 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues
* 09:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues
* 09:41 vgutierrez: resetting cp2035 management card - [[phab:T333312|T333312]]
* 09:38 elukey: dist-upgrade kafka-main1001 to bullseye - [[phab:T332013|T332013]]
* 09:36 godog: silence systemdunitfailed alerts for team=wmcs - [[phab:T333315|T333315]]
* 09:35 vgutierrez: depool cp2035 - [[phab:T333312|T333312]]
* 09:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade
* 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade
* 09:12 jbond@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts
* 09:11 jbond@cumin1001: START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts
* 09:11 jbond@cumin1001: END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts
* 09:11 jbond@cumin1001: START - Cookbook sre.idm.logout Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts
* 08:58 vgutierrez: restart ipmiseld on cp2035
* 08:50 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org
* 08:49 ayounsi@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 08:48 AndyRussG: update payments.wiki config {{Gerrit|65bedd4a}} -> {{Gerrit|e31ffd7d}}, payments (automatic updates only) {{Gerrit|a6c6c2b1}} -> {{Gerrit|f5ec2677}}
* 08:45 ayounsi@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:43 ayounsi@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:42 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
* 08:39 ayounsi@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:37 ayounsi@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 08:35 ayounsi@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 08:34 ayounsi@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 08:32 ayounsi@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 08:32 phedenskog@deploy2002: Finished deploy [performance/navtiming@e757bdf]: (no justification provided) (duration: 00m 06s)
* 08:32 phedenskog@deploy2002: Started deploy [performance/navtiming@e757bdf]: (no justification provided)
* 08:31 ayounsi@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 08:29 ayounsi@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 08:25 ayounsi@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 08:21 ayounsi@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 08:14 ayounsi@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:11 oblivian@deploy2002: Finished scap: Backport for [[gerrit:903209{{!}}Failover statsd to graphite2004 (T330165)]] (duration: 08m 48s)
* 08:08 ayounsi@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 08:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 16 hosts with reason: Switch maintenance
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 16 hosts with reason: Switch maintenance
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 21 hosts with reason: Switch maintenance
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 21 hosts with reason: Switch maintenance
* 08:04 oblivian@deploy2002: oblivian and filippo: Backport for [[gerrit:903209{{!}}Failover statsd to graphite2004 (T330165)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance
* 08:03 ayounsi@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance
* 08:02 oblivian@deploy2002: Started scap: Backport for [[gerrit:903209{{!}}Failover statsd to graphite2004 (T330165)]]
* 08:02 ayounsi@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:00 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:00 godog: move graphite reads to codfw - [[phab:T330165|T330165]]
* 07:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 07:56 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:56 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:54 root@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 07:54 root@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:51 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 07:51 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45965 and previous config saved to /var/cache/conftool/dbconfig/20230328-073122-root.json
* 07:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 17806
* 07:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 17806
* 07:20 kartik@deploy2002: Finished scap: Backport for [[gerrit:903003{{!}}Enable Section Translation on some wikis while Content Translation remains in beta (T308834)]] (duration: 12m 05s)
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45964 and previous config saved to /var/cache/conftool/dbconfig/20230328-071617-root.json
* 07:10 kartik@deploy2002: kartik: Backport for [[gerrit:903003{{!}}Enable Section Translation on some wikis while Content Translation remains in beta (T308834)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 07:08 kartik@deploy2002: Started scap: Backport for [[gerrit:903003{{!}}Enable Section Translation on some wikis while Content Translation remains in beta (T308834)]]
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45963 and previous config saved to /var/cache/conftool/dbconfig/20230328-070112-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45962 and previous config saved to /var/cache/conftool/dbconfig/20230328-064607-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45961 and previous config saved to /var/cache/conftool/dbconfig/20230328-063103-root.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45960 and previous config saved to /var/cache/conftool/dbconfig/20230328-061558-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 [[phab:T329481|T329481]]', diff saved to https://phabricator.wikimedia.org/P45959 and previous config saved to /var/cache/conftool/dbconfig/20230328-061441-root.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P45958 and previous config saved to /var/cache/conftool/dbconfig/20230328-060053-root.json
* 05:55 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 05:55 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 05:53 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 05:53 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 05:47 AndyRussG: update payments-wiki {{Gerrit|f5e262d1}} -> {{Gerrit|a6c6c2b1}}
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P45957 and previous config saved to /var/cache/conftool/dbconfig/20230328-054548-root.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P45956 and previous config saved to /var/cache/conftool/dbconfig/20230328-053043-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45955 and previous config saved to /var/cache/conftool/dbconfig/20230328-051539-root.json
* 01:59 krinkle@deploy2002: Synchronized wmf-config/mc.php: {{Gerrit|I44edcd46da45b827d}} (duration: 06m 33s)


== 2015-09-03 ==
== 2023-03-27 ==
* 23:53 logmsgbot: krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/235853/ (duration: 00m 12s)
* 23:47 mutante: people1003 - taking down apache to provoke monitoring alert (inactive instances) and confirm IRC alerting change works
* 23:51 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235843/ (duration: 00m 12s)
* 23:31 zabe: deployed patch for [[phab:T330968|T330968]]
* 23:50 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: https://gerrit.wikimedia.org/r/#/c/235843/ (duration: 00m 12s)
* 23:08 zabe@deploy2002: Finished scap: Backport for [[gerrit:903205{{!}}Rename "Support and Safety" to "Trust and Safety" (T330514)]] (duration: 21m 27s)
* 23:41 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235850/ (duration: 00m 12s)
* 23:00 zabe@deploy2002: zabe: Backport for [[gerrit:903205{{!}}Rename "Support and Safety" to "Trust and Safety" (T330514)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 23:40 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/ukwikivoyage.png: https://gerrit.wikimedia.org/r/#/c/235850/ (duration: 00m 12s)
* 22:48 mutante: stat1005 - kill 18179; run puppet ; stat1007 - kill 3346; run puppet ; stat1006 - kill 23887; run puppet
* 23:37 mutante: mw1224 - killed and restarted defunct hhvm, version is different from the one on mw1225
* 22:47 zabe@deploy2002: Started scap: Backport for [[gerrit:903205{{!}}Rename "Support and Safety" to "Trust and Safety" (T330514)]]
* 23:37 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/235728 (duration: 00m 13s)
* 22:43 mutante: stat1004 - kill 29291; run puppet
* 23:36 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/knwikisource.png: https://gerrit.wikimedia.org/r/#/c/235728/ (duration: 00m 12s)
* 22:43 mutante: apt2001 - kill 3105; run puppet
* 23:32 Krenair: mw1224 has been sending segfault warnings and "Lost parent, LightProcess exiting" to hhvm.log since about 21:17:34
* 22:16 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Meta:WMF Support and Safety" "Meta:WMF Trust and Safety" "Zabe" --reason "per [[:phab:T330514{{!}}T330514]]" # [[phab:T330514|T330514]]
* 23:29 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/CirrusSearch: https://gerrit.wikimedia.org/r/#/c/235905/ (duration: 00m 13s)
* 21:58 maryum: Deploy security fix for [[phab:T326952|T326952]]
* 23:28 logmsgbot: krenair@tin Synchronized php-1.26wmf21/package.json: bd2eb6cc1919c7dab056d5f8fe5b4a164236d78f (duration: 00m 13s)
* 21:58 urandom: power cycling restbase1033 — [[phab:T333243|T333243]]
* 23:02 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235908/ (duration: 00m 13s)
* 21:45 ryankemper: [[phab:T330165|T330165]] Depooled relevant search platform hosts: `sudo -E cumin 'elastic[1055-1056,1074-1079,1085-1086]*,cloudelastic100[2,6]*,wcqs1002*,wdqs[1007,1012]*' 'sudo depool'`
* 21:21 ori: rebuilt HHVM with updated diff from facebook/hhvm PR #6071 (T109540), uploaded to apt as 3.6.5+dfsg1-1+wm5
* 21:24 Amir1: start of watchlist clean up in arwiki ([[phab:T328501|T328501]])
* 21:18 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 21:23 kindrobot: finish UTC late backports
* 19:54 bearND: MobileApps deployed sha1 553c399
* 21:22 kindrobot@deploy2002: Finished scap: Backport for [[gerrit:903326{{!}}Disable VisualEditor from talk namespace]], [[gerrit:903323{{!}}[sysop_itwiki] Add the logo also for vector 2022 (T330279)]] (duration: 08m 26s)
* 19:31 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf21
* 21:15 kindrobot@deploy2002: kindrobot and superpes: Backport for [[gerrit:903326{{!}}Disable VisualEditor from talk namespace]], [[gerrit:903323{{!}}[sysop_itwiki] Add the logo also for vector 2022 (T330279)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 18:13 ottomata: rolling restart of hadoop yarn nodemanagers to pick up Yarn AppMaster port range limitation to apply ferm rules.
* 21:15 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5f0eb44]: (no justification provided) (duration: 00m 13s)
* 18:04 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Add plumbing code for Flow beta feature (unused for now) (duration: 00m 12s)
* 21:14 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5f0eb44]: (no justification provided)
* 18:03 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add plumbing code for Flow beta feature (unused for now) (duration: 00m 12s)
* 21:14 kindrobot@deploy2002: Started scap: Backport for [[gerrit:903326{{!}}Disable VisualEditor from talk namespace]], [[gerrit:903323{{!}}[sysop_itwiki] Add the logo also for vector 2022 (T330279)]]
* 17:39 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235769/ (duration: 00m 12s)
* 21:11 tzatziki: moving Universal Code of Conduct/Enforcement guidelines -> Universal Code of Conduct/Enforcement guidelines/Version 1 on metawiki with `extensions/Translate/scripts/moveTranslatableBundle.php`
* 17:34 mutante: bromine - deleting policy docroot
* 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1022.eqiad.wmnet
* 17:06 jynus: cloning es1006 mysql data into es1015 [ETA:8h]
* 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:30 bblack: updating nginx->1.9.4 on cp1071, cp3033 for prod validation before broader rollout
* 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1022.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 16:30 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: es3 master switchover from es1009 to es1014 (eqiad) (duration: 00m 13s)
* 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1022.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 16:28 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: es3 master switchover from es1009 to es1014 (codfw) (duration: 00m 13s)
* 20:41 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 16:26 mutante: imported jenkins 1.609.3 into APT repo
* 20:36 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1022.eqiad.wmnet
* 16:23 legoktm: fixed content model of Template:Languages@metawiki
* 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1021.eqiad.wmnet
* 16:21 robh: re-enabling puppet on all mw systems
* 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 robh: disabling puppet on all mw systems for apache config update
* 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 16:01 jynus: performing es3 master switchover from es1009 to es1014
* 20:33 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 15:40 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: depool es1006 (duration: 00m 12s)
* 20:31 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 15:17 hashar: stopping nodepool on labnodepool1001.eqiad.wmnet, not ready yet
* 20:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1021.eqiad.wmnet
* 15:15 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: es2 master switchover from es1006 to es1011 (eqiad) (duration: 00m 13s)
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1017.eqiad.wmnet
* 15:14 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: es2 master switchover from es1006 to es1011 (codfw) (duration: 00m 12s)
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s)
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 15:04 logmsgbot: demon@tin Synchronized php-1.26wmf21/extensions/Translate/: (no message) (duration: 00m 15s)
* 20:23 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 14:51 jynus: performing es2 master switchover from es1006 to es1011
* 20:21 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 14:33 paravoid: rebooting msw1-eqiad
* 20:20 kindrobot@deploy2002: Finished scap: Backport for [[gerrit:903322{{!}}Expand list of wikis with language button at top. (T331777)]], [[gerrit:902197{{!}}Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093)]] (duration: 10m 50s)
* 14:28 twentyafterfour: restarted phd (phabricator daemon) to pick up new configuration
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1017.eqiad.wmnet
* 14:25 paravoid: changing IPv6 RA interval/lifetime/virtual-router-only @ eqiad
* 20:11 kindrobot@deploy2002: jdlrobson and kindrobot: Backport for [[gerrit:903322{{!}}Expand list of wikis with language button at top. (T331777)]], [[gerrit:902197{{!}}Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:21 paravoid: rebooting msw1-codfw
* 20:10 kindrobot@deploy2002: Started scap: Backport for [[gerrit:903322{{!}}Expand list of wikis with language button at top. (T331777)]], [[gerrit:902197{{!}}Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093)]]
* 13:17 paravoid: upgrading mr1-esams and mr1-eqiad to newer junos
* 20:01 kindrobot: start UTC late backport window
* 13:13 godog: bounce carbon daemons on graphite1001
* 19:21 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3259099]: bump glent jar to 0.3.2 (duration: 00m 14s)
* 12:42 chasemp: unban elastic1001 and put back in service
* 19:21 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3259099]: bump glent jar to 0.3.2
* 12:24 chasemp: move all shards off of elastic1001
* 19:06 jgleeson: civicrm upgraded from {{Gerrit|09373b9d}} to {{Gerrit|db3b727e}}
* 12:24 chasemp: disable elastic1001 in lvs as we are going to try fw apply round #2
* 16:40 akosiaris@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 11:02 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1028; increase the load of es1010, es1013 and es1017 (duration: 00m 12s)
* 16:40 akosiaris@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:45 jynus: applying schema change for ContentTranslation on x1-master "wikishared"
* 16:39 akosiaris@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:02 godog: reenable puppet on ms-be1*
* 16:39 akosiaris@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:16 jynus: started profiling mysql queries at phabricator. Only a 1% overhead is expected.
* 16:34 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:12 moritzm: updated rsyncd firewall rules (see https://gerrit.wikimedia.org/r/235425 for details)
* 16:34 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:12 godog: stop puppet on ms-be1* after ferm rsync change
* 16:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 08:23 godog: fixup current graphite retention T96662
* 16:33 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 07:26 moritzm: enabled ferm on dbstore* servers in codfw
* 16:25 jgleeson: payments-wiki upgraded from {{Gerrit|36366f64}} to {{Gerrit|f5e262d1}}
* 06:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Sep  3 06:29:35 UTC 2015 (duration 29m 34s)
* 15:55 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@e7f9c7f]: (no justification provided) (duration: 00m 11s)
* 03:09 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-03 03:09:20+00:00
* 15:54 ebysans@deploy2002: Started deploy [airflow-dags/analytics@e7f9c7f]: (no justification provided)
* 03:06 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 32s)
* 15:20 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 02:45 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-09-03 02:45:36+00:00
* 15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 02:39 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 10m 41s)
* 15:20 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 01:32 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 15:19 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 00:36 logmsgbot: ori@tin Synchronized php-1.26wmf21/includes/parser/Preprocessor_Hash.php: Idd1acd903: Decline to cache preprocessor items larger than 1 Mb (duration: 00m 11s)
* 15:17 elukey@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 10s)
* 00:36 logmsgbot: ori@tin Synchronized php-1.26wmf20/includes/parser/Preprocessor_Hash.php: Idd1acd903: Decline to cache preprocessor items larger than 1 Mb (duration: 00m 13s)
* 15:05 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict1002.eqiad.wmnet
* 00:27 RoanKattouw: Deployed patch for T111029
* 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict1002.eqiad.wmnet on all recursors
* 14:56 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache aphlict1002.eqiad.wmnet on all recursors
* 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict1002.eqiad.wmnet - eoghan@cumin1001"
* 14:55 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict1002.eqiad.wmnet - eoghan@cumin1001"
* 14:52 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 14:52 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host aphlict1002.eqiad.wmnet
* 14:48 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:48 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:47 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:47 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:46 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:46 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 14:45 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:45 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:44 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:44 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:43 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:43 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 14:40 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 14:29 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:29 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:29 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:28 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 14:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 14:28 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 14:27 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:16 taavi: taavi@mwmaint2002 ~ $ mwscript namespaceDupes.php --wiki=huwiki  --fix # [[phab:T333083|T333083]]
* 14:15 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:15 taavi@deploy2002: Finished scap: Backport for [[gerrit:903194{{!}}namespaceDupes: Remove extra addQuotes() calls (T333166)]] (duration: 08m 27s)
* 14:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:14 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:14 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:08 taavi@deploy2002: taavi: Backport for [[gerrit:903194{{!}}namespaceDupes: Remove extra addQuotes() calls (T333166)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:06 taavi@deploy2002: Started scap: Backport for [[gerrit:903194{{!}}namespaceDupes: Remove extra addQuotes() calls (T333166)]]
* 13:35 fab@deploy2002: Finished deploy [airflow-dags/research@d2c115d]: (no justification provided) (duration: 00m 21s)
* 13:35 fab@deploy2002: Started deploy [airflow-dags/research@d2c115d]: (no justification provided)
* 13:12 taavi@deploy2002: Finished scap: Backport for [[gerrit:902888{{!}}[huwiki] Add Draft and Draft_talk namespaces (T333083)]] (duration: 08m 45s)
* 13:04 taavi@deploy2002: superpes and taavi: Backport for [[gerrit:902888{{!}}[huwiki] Add Draft and Draft_talk namespaces (T333083)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:03 taavi@deploy2002: Started scap: Backport for [[gerrit:902888{{!}}[huwiki] Add Draft and Draft_talk namespaces (T333083)]]
* 12:42 godog: flip alert* to overlay2 - [[phab:T329939|T329939]]
* 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 10:31 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 10:30 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:28 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 10:28 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 10:10 elukey: dist-upgrade kafka-main1003 manually to bullseye - [[phab:T332013|T332013]]
* 10:03 Emperor: depool ms-fe2009
* 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1003.eqiad.wmnet with reason: stop kafka and dist-upgrade
* 09:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1003.eqiad.wmnet with reason: stop kafka and dist-upgrade
* 09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45295
* 09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45295
* 09:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:39 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
* 08:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for mw-api-int - cgoubert@cumin1001"
* 08:57 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for mw-api-int - cgoubert@cumin1001"
* 08:55 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
* 08:47 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 08:39 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:903186{{!}}EntityUsageTable: Mark query as read-only (T332941)]] (duration: 18m 15s)
* 08:30 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:903186{{!}}EntityUsageTable: Mark query as read-only (T332941)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:28 jynus: restarting bacula at backup1001 [[phab:T331510|T331510]]
* 08:25 urbanecm@deploy2002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|63dd23b5ceaba35c8d9682493dd21d99a20fc8f7}}: [Growth] eswiki: Enable mentorship for 50% of newcomers ([[phab:T332737|T332737]], [[phab:T285235|T285235]]) (duration: 06m 09s)
* 08:21 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:903186{{!}}EntityUsageTable: Mark query as read-only (T332941)]]
* 08:18 urbanecm@deploy2002: Backport cancelled.
* 08:06 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:902734{{!}}GrowthMentors.json: Add a write-only username field (T331444)]] (duration: 07m 52s)
* 08:03 marostegui: Failover m1 from db1164 to db1101 - [[phab:T331510|T331510]]
* 08:00 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:902734{{!}}GrowthMentors.json: Add a write-only username field (T331444)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 07:58 urbanecm@deploy2002: Started scap: Backport for [[gerrit:902734{{!}}GrowthMentors.json: Add a write-only username field (T331444)]]
* 07:55 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:902741{{!}}SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075)]] (duration: 16m 45s)
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45949 and previous config saved to /var/cache/conftool/dbconfig/20230327-075206-root.json
* 07:48 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:902741{{!}}SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:39 jynus: disabling puppet and shutting down bacula at backup1001 [[phab:T331510|T331510]]
* 07:38 urbanecm@deploy2002: Started scap: Backport for [[gerrit:902741{{!}}SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075)]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45948 and previous config saved to /var/cache/conftool/dbconfig/20230327-073701-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45947 and previous config saved to /var/cache/conftool/dbconfig/20230327-072156-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45946 and previous config saved to /var/cache/conftool/dbconfig/20230327-070651-root.json
* 06:51 marostegui: dbmaint s3 eqiad Rename flaggedrevs tables on db1123 ptwikisource [[phab:T332594|T332594]]
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45945 and previous config saved to /var/cache/conftool/dbconfig/20230327-065147-root.json
* 06:40 marostegui: Rename flaggedrevs tables on db1123 ptwikisource [[phab:T332594|T332594]]
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45944 and previous config saved to /var/cache/conftool/dbconfig/20230327-063642-root.json
* 05:40 kart_: Updated cxserver to 2023-03-17-133444-production ([[phab:T332379|T332379]] + build changes)
* 05:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:37 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:28 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:28 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:24 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:23 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 [[phab:T332292|T332292]]', diff saved to https://phabricator.wikimedia.org/P45942 and previous config saved to /var/cache/conftool/dbconfig/20230327-051941-root.json
* 05:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch [[phab:T331510|T331510]]
* 05:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch [[phab:T331510|T331510]]


== 2015-09-02 ==
== 2023-03-25 ==
* 23:58 logmsgbot: andyrussg@tin Synchronized php-1.26wmf20/extensions/CentralNotice/: CentralNotice update (duration: 00m 13s)
* 07:54 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s)
* 23:33 logmsgbot: andyrussg@tin Synchronized php-1.26wmf21/extensions/CentralNotice/: Update CentralNotice (duration: 00m 13s)
* 07:54 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0
* 23:02 logmsgbot: andyrussg@tin Finished scap: Update CentralNotice to 2.6.0 for wmf21 (duration: 48m 18s)
* 00:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
* 22:13 logmsgbot: andyrussg@tin Started scap: Update CentralNotice to 2.6.0 for wmf21
* 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
* 20:27 arlolra: updated Parsoid to version 5f2fae6c
* 00:57 mutante: doc1002 - issue is mismatched UIDs again, most likely. doc-uploader is debmonitor on new host
* 20:08 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf21
* 00:56 mutante: doc1002 - manually running rsync to doc2002 - which failed with status 23 when started by timer
* 20:02 logmsgbot: krinkle@tin Synchronized php-1.26wmf21/resources/src/startup.js: Ie65427caee (duration: 00m 12s)
* 00:09 tzatziki: removing 2 files for legal compliance
* 19:09 mutante: restarted gitblit, stopped counting
* 19:07 paravoid: upgrading mr1-codfw, mr1-ulsfo to newer junos
* 19:01 urandom: bouncing Cassandra on restbase1001 to address bogus icinga process failure alert
* 18:52 legoktm: deployed patch for T110553
* 18:36 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf21
* 18:32 cmjohnson1: replacing disk 10 on db1028
* 18:13 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 17:50 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/VisualEditor/modules/ve-mw/ui/inspectors: https://gerrit.wikimedia.org/r/#/c/235511/ (duration: 00m 12s)
* 17:07 logmsgbot: ori@tin Synchronized php-1.26wmf21/extensions/UniversalLanguageSelector: 78a5908fd9: Updated mediawiki/core Project: mediawiki/extensions/UniversalLanguageSelector (duration: 00m 16s)
* 17:07 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/UniversalLanguageSelector: 2154acc529: Updated mediawiki/core Project: mediawiki/extensions/UniversalLanguageSelector (duration: 00m 13s)
* 16:25 mutante: restarting NTP on lvs2004
* 16:12 jynus: setting BBU auto-learn mode to warn only (disabled if not possible) on all database hosts
* 16:03 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/MultimediaViewer/MultimediaViewer.php: https://gerrit.wikimedia.org/r/#/c/235484/ (duration: 00m 12s)
* 16:01 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: https://gerrit.wikimedia.org/r/#/c/235486/ (duration: 00m 12s)
* 15:58 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/MultimediaViewer/MultimediaViewer.php: https://gerrit.wikimedia.org/r/#/c/235483/ (duration: 00m 13s)
* 15:56 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: https://gerrit.wikimedia.org/r/#/c/235485/ (duration: 00m 12s)
* 15:51 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: T110837 (duration: 00m 13s)
* 15:42 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235482/ (duration: 00m 12s)
* 15:34 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235479/ (duration: 00m 13s)
* 15:19 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: https://gerrit.wikimedia.org/r/#/c/235442/ (duration: 00m 12s)
* 15:14 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: https://gerrit.wikimedia.org/r/#/c/235441/ (duration: 00m 12s)
* 15:07 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234942/ and https://gerrit.wikimedia.org/r/#/c/234944/ (duration: 00m 13s)
* 14:40 Nikerabbit: TTMServer reindex complete
* 11:59 mark: removed tools LV snapshots on labstore1002
* 11:47 mark: kill STOP'ed rsync on labstore1002
* 11:00 jynus: cloning mysql data from es1002 into es1016 [ETA:16h]
* 10:30 moritzm: installed qemu security updates on labvirt*
* 09:41 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1002 (duration: 00m 12s)
* 09:21 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1010, pool es1017 (duration: 00m 13s)
* 09:19 hashar: Merged in "delete 1.26wmf12" https://gerrit.wikimedia.org/r/235347 which was left unmerged in Gerrit but was present on tin /srv/mediawiki-staging confusing people.
* 08:03 bblack: restarting ntp on lvs2004
* 08:01 moritzm: enable ferm on db1069/sanitarium
* 07:50 moritzm: enable ferm on remaining phabricator db hosts
* 04:54 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Sep  2 04:54:37 UTC 2015 (duration 54m 36s)
* 02:52 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-02 02:52:51+00:00
* 02:50 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 09s)
* 02:29 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-09-02 02:29:56+00:00
* 02:26 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 31s)
* 00:33 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235366/ (duration: 00m 13s)


== 2015-09-01 ==
== 2023-03-24 ==
* 23:59 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/221731/ (duration: 00m 13s)
* 23:58 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 23:41 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235285/ (duration: 00m 14s)
* 23:57 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 23:08 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235362/ (duration: 00m 14s)
* 23:50 tzatziki: removing 1 file for legal compliance
* 23:02 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/235361/ (duration: 00m 13s)
* 21:08 mutante: mwmaint1002 ferm rules for rsyncd_access from miscweb removed by puppet after {{Gerrit|I4fe17f397856361}} which reverted a8af0339bde14018e8. manually deleted rsyncd config and stopped rsync service. complete noop on mwmaint2002 which is currently the active mwmaint server. [[phab:T328907|T328907]]
* 22:50 awight: update CRM from 0fc8474338e7a31fdde79287bd667b98cd96a252 to abc34b87ee9d1dbb1176f1929a3d748e1ee5ac7b
* 18:50 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable (duration: 00m 13s)
* 22:18 MaxSem: Maps: creating and populating admin table
* 18:50 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable
* 21:20 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/235177/ (duration: 00m 12s)
* 18:30 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags (duration: 00m 16s)
* 20:54 ori: restarted nutcracker on mw1142
* 18:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags
* 20:33 logmsgbot: twentyafterfour@tin Finished scap: sync 1.26wmf21 (duration: 30m 37s)
* 18:00 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 20s)
* 20:03 logmsgbot: twentyafterfour@tin Started scap: sync 1.26wmf21
* 18:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag
* 19:52 YuviPanda: removed tools20150901132642 from labstore vg on labstore1002
* 17:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 06s)
* 19:36 logmsgbot: ori@tin Synchronized php-1.26wmf20/includes/skins/SkinTemplate.php: cc643a0934: Deprecate unconditional loading of mediawiki.ui.button on all pages (duration: 00m 13s)
* 17:55 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag
* 17:31 urandom: bouncing Cassandra on restbase1001 to apply temporary GC setting
* 15:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 17:28 dcausse: freezing elasticsearch indices before applying ferm rules on master
* 15:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 17:23 logmsgbot: aude@tin Synchronized php-1.26wmf20/extensions/Wikidata: Fix for change dispatcher (duration: 00m 20s)
* 15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 16:45 jynus: performing schema change on testwiki and metawiki
* 15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:12 robh: policy.wikimedia.org dns change happening now
* 15:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 16:00 chasemp: ferm for elastic1003/2/1(master)
* 15:35 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 15:57 logmsgbot: krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/235168/ (duration: 00m 13s)
* 15:09 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:51 YuviPanda: stopped replicate-tools on labstore1002, and cleaned out lockdir
* 14:59 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:47 logmsgbot: reedy@tin Synchronized php-1.26wmf20/extensions/SecurePoll/: Stop cronspam (duration: 00m 13s)
* 14:24 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki wikimaniawiki "2024:Expressions of Interest" "Wikimania:Expressions of Interest" "Zabe" --reason "per request [[:phab:T332917{{!}}T332917]]" # [[phab:T332917|T332917]]
* 15:47 mark: labstore1002: echo 10000 > /sys/block/md123/md/sync_speed_min
* 11:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet
* 15:44 mark: labstore1002: update-initramfs -k all -u
* 11:44 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
* 15:38 mark: labstore1002: mdadm /dev/md/slice51 --add /dev/sd{bh,bg,bf,be,bd,bc}
* 11:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 15:36 moritzm: disabled ferm in analytic1028, needs some more work on possibly dynamic mapreduce ports
* 11:01 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 15:16 mark: labstore1002: mdadm /dev/md/slice15 --re-add /dev/sd{bb,ba,az}
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
* 15:14 mark: labstore1002: mdadm /dev/md/slice15 --re-add /dev/sdaw
* 10:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
* 15:07 mark: labstore1002: mdadm --zero-superblock /dev/sd{aw,bh,bg,bf,be,bd,bc,bb,ba,az}1
* 10:35 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:04 moritzm: enabled ferm in analytic1028 (initial hadoop worker)
* 10:00 marostegui: Upgrade db1204 to mariadb 10.6 [[phab:T330861|T330861]]
* 15:04 mark: labstore1002: mdadm --zero-superblock /dev/sdax1 && mdadm /dev/md/slice15 --re-add /dev/sdax
* 08:57 hashar: Fixed up Gerrit > GitHub replication which broke at 5:00 UTC by updating the GitHub RSA SSH host key [[phab:T332972|T332972]]
* 15:03 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/231465/ - VE for all new enwiki accounts (duration: 00m 13s)
* 05:37 hashar: gerrit: refreshed ssh host key for `github.com`
* 14:58 mark: labstore1002: mdadm /dev/md/slice15 --re-add /dev/sday
* 05:28 hashar: Restarted Gerrit
* 14:58 mark: labstore1002: mdadm --zero-superblock /dev/sday1
* 05:26 hashar: Stopping Gerrit
* 14:53 mark: labstore1002: mdadm --stop /dev/md3
* 05:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) (duration: 00m 10s)
* 14:37 ebernhardson: reset elasticsearch cluster.routing.allocation.disk.high back to 90%
* 05:26 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]])
* 13:38 logmsgbot: krinkle@tin Synchronized w/: Remove rl-test.php (duration: 00m 13s)
* 05:22 hashar: Restarting gerrit replica on gerrit2002.wikimedia.org
* 13:17 moritzm: enabled ferm on db1048
* 05:21 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) (duration: 00m 07s)
* 13:09 moritzm: enabled ferm on labsdb100[467]
* 05:20 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]])
* 12:01 YuviPanda: disable puppet on labsdb1006
* 05:17 hashar: Restarting Gerrit for deploying plugins updates
* 08:58 moritzm: enabled ferm on labsdb1001
* 05:10 ejegg: Standalone SmashPig upgraded from {{Gerrit|3b84e4cb}} to {{Gerrit|50139e82}}
* 08:58 godog: fixup current graphite retention for metrics under "servers" hierarchy T96662
* 05:04 ejegg: payments-wiki upgraded from {{Gerrit|4d0c90b4}} to {{Gerrit|4b0a71fa}}
* 08:51 moritzm: enabled ferm on labsdb1002
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:31 moritzm: enabled ferm on labsdb1003
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:29 godog: repool mw1125 mw1142 after nutcracker failures
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:45 jynus: cloning mysql data from es1010 to es1017 [ETA: 6h]
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:23 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1010 (duration: 00m 12s)
* 07:13 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1007, pool es1013 (duration: 00m 13s)
* 06:36 mutante: uploaded survey2012 to dumps/dataset1001; ownership as it is for survey2011; - T110746 in time for midnight PST
* 05:18 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Sep  1 05:18:09 UTC 2015 (duration 18m 8s)
* 02:28 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-09-01 02:28:30+00:00
* 02:25 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 00s)


== 2015-08-31 ==
== 2023-03-23 ==
* 23:56 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/233665/ (duration: 00m 11s)
* 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:49 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: reenable config changes for cirrus experimental completion api (duration: 00m 12s)
* 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:40 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/EducationProgram: 97ab82eab2: Updated mediawiki/core Project: mediawiki/extensions/EducationProgram  85a7d3932c1a4ad28f1a8dd05704f4e524152349 (duration: 00m 14s)
* 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:27 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf20/extensions/CirrusSearch/: (no message) (duration: 00m 12s)
* 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:25 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: revert update for cirrussearch experimental suggestions api (duration: 00m 12s)
* 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:21 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: update config of cirrussearch experimental suggestions api (duration: 00m 12s)
* 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:45 chasemp: disabled puppet on elastic hosts temporarily to safely roll out fw change.  elastic seems to have not taken it well and I'm holding for green cluster state.
* 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:20 mutante: installing package upgrades on argon
* 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:58 ori: imported pybal_1.08_amd64.changes to jessie-wikimedia
* 22:30 mutante: moscovium - rebooting to finalize distro release upgrade - [[phab:T332952|T332952]]
* 20:44 chasemp: ferm for elastic100[4-7] and adjust ferm to include wikitech source
* 22:20 mutante: moscovium performing apt-get full-upgrade [[phab:T332952|T332952]]
* 20:21 subbu: deployed parsoid version c3e4df5e
* 22:09 mutante: moscovium - when doing an in-place upgrade from buster to bullseye and you replace the string in sources.list, you also need to replace "bullseye-updates" with "bullseye-security" in the security.debian.org lines - that this is needed is called a bug at https://shagain.club/index.php/archives/641/ - [[phab:T327068|T327068]]
* 16:22 godog: depool mw1125 + mw1142 from api, nutcracker client connections exceeded
* 22:00 mutante: moscovium - apt-get full-upgrade ; apt autoremove ; replace buster with bullseye in sources.list ; repeat apt-get upgrade/full-upgrade etc. (https://wiki.debian.org/DebianUpgrade) [[phab:T327068|T327068]]
* 16:06 logmsgbot: thcipriani@tin Finished scap: SWAT: Ask the user to log in if the session is lost [[gerrit:234228]] (duration: 27m 07s)
* 22:00 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc2002.codfw.wmnet with OS bullseye
* 15:59 jynus: restarting hhvm on mw2187
* 21:57 mutante: moscovium - apt-get upgrade (rt.wikimedia.org going into maintenance) [[phab:T327068|T327068]]
* 15:39 logmsgbot: thcipriani@tin Started scap: SWAT: Ask the user to log in if the session is lost [[gerrit:234228]]
* 21:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
* 15:33 mutante: terbium - Could not find dependent Service[nscd] for File[/etc/ldap/ldap.conf]
* 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
* 15:28 logmsgbot: thcipriani@tin Synchronized closed-labs.dblist: SWAT: Creating closed-labs.dblist and closing es.wikipedia.beta.wmflabs.org [[gerrit:234594]] (duration: 00m 13s)
* 21:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
* 15:25 logmsgbot: thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT: Remove files from Commons from search results on wikimediafoundation.org [[gerrit:234040]] (duration: 00m 11s)
* 21:45 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
* 15:25 ottomata: starting varnishkafka instances on frontend caches to produce eventlogging client side events to kafka
* 21:31 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 15:21 logmsgbot: thcipriani@tin Synchronized php-1.26wmf20/extensions/Wikidata: SWAT: Update Wikidata - Fix formatting of client edit summaries [[gerrit:234991]] (duration: 00m 21s)
* 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:16 logmsgbot: thcipriani@tin Synchronized php-1.26wmf20/extensions/UploadWizard/resources/controller/uw.controller.Step.js: SWAT: Keep the uploads sorted in the order they were created in initially [[gerrit:234553]] (duration: 00m 12s)
* 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:43 ebernhardson: elasticsearch cluster.routing.allocation.disk.watermark.high set to 75% to force elastic1022 to reduce its disk usage
* 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:41 urandom: bouncing Cassandra on restbase1001 to apply temporary GC setting
* 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:06 akosiaris: rebooted krypton. was reporting 100% cpu steal time
* 21:25 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 13:40 paravoid: running puppet on newly-installed mc2001
* 21:24 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 13:40 paravoid: restarting hhvm on mw1065
* 20:42 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 11:10 moritzm: restart salt-master on palladium
* 20:42 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 10:45 paravoid: reenabling asw2-a5-eqiad:xe-0/0/36 (T107635)
* 20:35 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 10:36 godog: repool ms-fe1004
* 20:34 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 10:32 godog: repool ms-fe1003 and depool ms-fe1004 for firewall changes
* 20:33 taavi@deploy2002: Finished scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] (duration: 10m 56s)
* 10:19 godog: update graphite retention policy on files with previous retention and older than 30d T96662
* 20:33 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
* 10:18 godog: repool ms-fe1002 and depool ms-fe1003 for firewall changes
* 20:24 taavi@deploy2002: abi and taavi: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 10:05 godog: depool ms-fe1002 to apply firewall changes
* 20:23 taavi@deploy2002: Started scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]]
* 09:55 jynus: cloning es1007 mysql data into es1013 (ETA: 5h30m)
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
* 09:51 godog: repool ms-fe1001
* 19:36 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
* 09:35 godog: depool ms-fe1001 in preparation for ferm changes
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:27 godog: update graphite retention policy on files with previous retention and older than 60d T96662
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 09:25 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1007 for maintenance (duration: 00m 13s)
* 19:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 08:33 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1028, return ES servers back from maintenance (duration: 00m 12s)
* 19:31 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 04:34 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Aug 31 04:34:14 UTC 2015 (duration 34m 13s)
* 19:31 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
* 04:05 bblack: disabled ipv6 autoconf on neon, flushed old dynamic addr
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2002
* 02:32 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-31 02:32:25+00:00
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:29 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 42s)
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 19:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 19:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 19:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2002
* 18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]]
* 17:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 17:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 17:39 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 17:38 mutante: moscovium - systemctl stop rsync
* 17:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 17:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 17:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 17:18 mutante: aphlict1001 - systemctl reset-failed; systemctl start logrotate ; systemctl start logrotate.timer
* 16:59 sukhe: rolling out CR 901333 to A:cp-text [[phab:T313578|T313578]]
* 16:45 sukhe: disable Puppet in A:cp to test and then merge CR 901333
* 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2002.codfw.wmnet with OS bullseye
* 16:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2002.codfw.wmnet with OS bullseye
* 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
* 16:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
* 16:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 16:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 16:01 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc1002.wikimedia.org with OS bullseye
* 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
* 15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
* 15:12 vgutierrez: testing haproxy_2.6.11-1~bpo11+wmf2_amd64.deb in text@ulsfo - [[phab:T332796|T332796]]
* 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc1002.wikimedia.org with OS bullseye
* 14:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
* 14:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host lists1003.wikimedia.org with OS bullseye
* 14:53 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:53 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:51 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
* 14:45 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
* 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1002.wikimedia.org
* 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
* 14:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host lists1003.wikimedia.org with OS bullseye
* 14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1002.wikimedia.org on all recursors
* 14:24 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc1002.wikimedia.org on all recursors
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
* 14:22 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host pybal-test2003.codfw.wmnet with OS bullseye
* 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
* 14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
* 14:16 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 14:15 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 14:15 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 14:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc1002.wikimedia.org
* 14:13 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 14:13 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 14:11 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d] (duration: 01m 32s)
* 14:11 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
* 14:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d]
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d] (duration: 00m 09s)
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d]
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d] (duration: 05m 10s)
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
* 14:03 joal@deploy2002: Started deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d]
* 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
* 13:55 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host pybal-test2003.codfw.wmnet with OS bullseye
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:46 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac] (duration: 01m 28s)
* 13:46 TheresNoTime: close UTC afternoon backport window
* 13:45 samtar@deploy2002: Finished scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] (duration: 07m 46s)
* 13:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac]
* 13:44 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac] (duration: 00m 08s)
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac]
* 13:43 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac] (duration: 13m 06s)
* 13:39 samtar@deploy2002: samtar: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:37 samtar@deploy2002: Started scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]]
* 13:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] (duration: 08m 05s)
* 13:30 joal@deploy2002: Started deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac]
* 13:29 samtar@deploy2002: samtar and sgimeno: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]]
* 13:26 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki ckbwiki --fix` [[phab:T332470|T332470]]
* 13:25 samtar@deploy2002: Finished scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] (duration: 08m 39s)
* 13:18 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:16 samtar@deploy2002: Started scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]]
* 13:15 samtar@deploy2002: Finished scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] (duration: 11m 47s)
* 13:08 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:03 samtar@deploy2002: Started scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]]
* 12:14 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-druid1001.eqiad.wmnet with OS bullseye
* 12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:58 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:57 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2004.codfw.wmnet with OS bullseye
* 11:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
* 11:47 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache upload cluster - [[phab:T332796|T332796]]
* 11:36 btullis@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-druid1001.eqiad.wmnet with OS bullseye
* 11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
* 11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
* 11:26 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc2002.wikimedia.org with OS bullseye
* 11:15 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2004.codfw.wmnet with OS bullseye
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
* 11:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
* 11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 11:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
* 10:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
* 10:44 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc2002.wikimedia.org with OS bullseye
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2002.wikimedia.org
* 10:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2005.codfw.wmnet with OS bullseye
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2002.wikimedia.org on all recursors
* 10:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc2002.wikimedia.org on all recursors
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
* 10:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
* 10:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc2002.wikimedia.org
* 10:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2005.codfw.wmnet with OS bullseye
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
* 09:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
* 09:47 moritzm: uploaded prometheus-druid-exporter 0.8-2 for bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]]
* 08:21 elukey: clean up docker and reboot kubernetes2024 to enable overlay2 - [[phab:T332803|T332803]]
* 08:11 vgutierrez: testing HAProxy 2.6.11 in cp4044 - [[phab:T332796|T332796]]
* 08:08 vgutierrez: fetch haproxy 2.6.11 in apt.wm.o thirdparty/haproxy26 for bullseye & buster
* 08:04 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache text cluster - [[phab:T332796|T332796]]
* 07:54 elukey: clean up docker and reboot kubernetes2023 to enable overlay2 - [[phab:T332803|T332803]]
* 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
* 07:42 elukey: clean up docker on kubernetes1024 (cordon + stop kubelet + docker + clean /var/lib/docker/*) and reboot to enable overlay2 - [[phab:T332803|T332803]]
* 07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
* 07:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45928 and previous config saved to /var/cache/conftool/dbconfig/20230323-072315-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45927 and previous config saved to /var/cache/conftool/dbconfig/20230323-070811-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45926 and previous config saved to /var/cache/conftool/dbconfig/20230323-065306-root.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45925 and previous config saved to /var/cache/conftool/dbconfig/20230323-063800-root.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45924 and previous config saved to /var/cache/conftool/dbconfig/20230323-062255-root.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45923 and previous config saved to /var/cache/conftool/dbconfig/20230323-060750-root.json
* 05:37 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 05:34 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 04:25 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 02:07 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 02:00 mutante: rsyncing ~4GB files for static-codereview.wikimedia.org from old to newer VMs for [[phab:T331896|T331896]] - no automatic sync / deploy for these
* 01:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]"
* 01:03 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]"
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 00:57 denisse@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host doc2002.codfw.wmnet with OS bullseye
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 00:27 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
* 00:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc1003.eqiad.wmnet with OS bullseye
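
The cookbook entries above follow a regular shape (`START - Cookbook <name> ...` and `END (<RESULT>) - Cookbook <name> (exit_code=N) ...`). As a minimal sketch, assuming that shape holds for every cookbook line, they could be parsed as below. The pattern is inferred only from the entries in this log, not from any official format definition, and the names `COOKBOOK_RE`, `parse_cookbook_line`, and `failed_runs` are hypothetical helpers introduced for illustration.

```python
import re

# Pattern inferred from the cookbook entries in this log (an assumption,
# not an official format). Non-cookbook entries simply do not match.
COOKBOOK_RE = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) (?P<who>\S+?): "
    r"(?P<phase>START|END) (?:\((?P<result>[A-Z]+)\) )?- "
    r"Cookbook (?P<name>\S+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))?"
    r"(?: (?P<args>.*))?$"
)


def parse_cookbook_line(line):
    """Return a dict of fields for a cookbook log line, or None otherwise."""
    match = COOKBOOK_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    if entry["exit_code"] is not None:
        entry["exit_code"] = int(entry["exit_code"])
    return entry


def failed_runs(lines):
    """Collect END entries whose exit code is non-zero."""
    parsed = (parse_cookbook_line(line) for line in lines)
    return [
        e for e in parsed
        if e is not None and e["phase"] == "END" and e["exit_code"] not in (None, 0)
    ]
```

Passing the log's lines to `failed_runs` would pick out the `END (FAIL)` and `END (ERROR)` entries, such as the repeated `sre.ganeti.reimage` attempts for doc2002, while ignoring free-form entries and `START` lines.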


== 2015-08-30 ==
== 2023-03-22 ==
* 12:58 godog: lvchange -ay labstore/others on labstore1002
* 23:59 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
* 12:52 godog: start-nfs on labstore1002
* 23:56 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
* 12:31 godog: lvchange -ay labstore/tools on labstore1002
* 23:46 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc1003.eqiad.wmnet with OS bullseye
* 12:30 godog: also disabled puppet on labstore1002 while investigating
* 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
* 12:15 godog: trying to manually assemble missing raid on labstore1002 with mdadm --assemble /dev/md/slice51 --uuid 0747643d:b89b36ff:57156095:c33694fc --verbose
* 23:34 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
* 11:19 YuviPanda: powered labstore1002 back up
* 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:17 YuviPanda: shut down labstore1002, going to powercycle from mgmt
* 23:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 10:34 YuviPanda: disabled backups on labstore1002 to prevent overwriting of good backups on 2001
* 23:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 10:08 YuviPanda: rebooted labstore1002
* 23:32 zabe: zabe@mwmaint2002:~$ mwscript namespaceDupes.php wikimaniawiki --fix # [[phab:T332782|T332782]]
* 04:16 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Aug 30 04:16:17 UTC 2015 (duration 16m 16s)
* 23:31 zabe@deploy2002: Finished scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] (duration: 10m 03s)
* 02:23 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-30 02:23:07+00:00
* 23:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1003.wikimedia.org
* 02:20 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 05m 36s)
* 23:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 23:24 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
* 23:22 zabe@deploy2002: zabe: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:21 zabe@deploy2002: Started scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]]
* 21:15 taavi: UTC late backports complete
* 21:13 taavi@deploy2002: Finished scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] (duration: 07m 29s)
* 21:08 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc1003.eqiad.wmnet
* 21:08 taavi@deploy2002: taavi: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 21:06 taavi@deploy2002: Started scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]]
* 21:05 taavi@deploy2002: Finished scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] (duration: 07m 17s)
* 20:59 taavi@deploy2002: taavi: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:58 taavi@deploy2002: Started scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]]
* 20:54 samtar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:900748{{!}}Enable page tools for anonymous users (T331052)]] (duration: 10m 10s)
* 20:37 akosiaris: uncordon reboot kubernetes1023. It was drained previously for [[phab:T332803|T332803]]
* 20:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] (duration: 11m 47s)
* 20:32 akosiaris: reboot kubernetes1023 for a test once more, ⚓ [[phab:T332803|T332803]]
* 20:32 akosiaris: reboot kubernetes1023 for a test once more
* 20:28 samtar@deploy2002: samtar and nray: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:25 akosiaris: reboot kubernetes1023 for a test
* 20:24 samtar@deploy2002: Started scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]]
* 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] (duration: 09m 57s)
* 20:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists1003.wikimedia.org on all recursors
* 20:15 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache lists1003.wikimedia.org on all recursors
* 20:15 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:15 samtar@deploy2002: kharlan and samtar: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]]
* 20:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.eqiad.wmnet on all recursors
* 20:11 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.eqiad.wmnet on all recursors
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
* 20:10 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
* 20:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] (duration: 07m 22s)
* 20:07 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:07 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.eqiad.wmnet
* 20:07 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 20:07 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1003.wikimedia.org
* 20:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doc1003.wikimedia.org
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
* 20:06 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
* 20:05 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:04 samtar@deploy2002: samtar and matmarex: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:02 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0 (duration: 00m 21s)
* 20:02 samtar@deploy2002: Started scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]]
* 20:02 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0
* 20:01 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:01 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.wikimedia.org
* 18:16 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]
* 18:12 mutante: rsyncing /srv/org/wikimedia/sitemaps files for https://sitemaps.wikimedia.org from old to new machines. most other things are auto-deployed by puppet or puppet running initial scap or automatic rsync; this is not. rsync -av /srv/org/wikimedia/sitemaps/ rsync://miscweb2003.codfw.wmnet/miscapps-srv/org/wikimedia/sitemaps/ [[phab:T331896|T331896]] - but also see [[phab:T332101|T332101]]
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dborch1002.wikimedia.org
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
* 17:38 _joe_: stopping apache on mwdebug1001 to test the new envoy error page
* 17:15 hashar@deploy2002: Synchronized composer.json: build: add local typos check to composer.json # [[phab:T332121|T332121]] (duration: 06m 44s)
* 17:12 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
* 17:09 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 17:06 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 17:06 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 17:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts dborch1002.wikimedia.org
* 17:05 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 17:04 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:45 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided) (duration: 00m 12s)
* 16:45 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided)
* 16:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 16:37 eoghan@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
* 16:37 eoghan@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
* 16:35 vgutierrez: rolling downgrade to HAProxy 2.6.9 in text@esams - [[phab:T332796|T332796]]
* 16:24 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 16:19 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 16:18 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 16:18 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dborch1001.wikimedia.org with OS bullseye
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
* 15:53 moritzm: uploaded druid 0.19.wmf0-2 to bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]]
* 15:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
* 15:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
* 15:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
* 15:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
* 15:39 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:31 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:30 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1001.wikimedia.org with OS bullseye
* 15:27 elukey: `racadm racreset` for kafka-main2004 (no http idrac available for the cookbook, ssh one available)
* 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:26 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 15:25 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:25 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 15:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 15:22 hnowlan: removing java packages from maps hosts
* 15:17 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 15:17 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:13 hnowlan: removing cassandra packages from maps hosts
* 15:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 14:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:21 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45917 and previous config saved to /var/cache/conftool/dbconfig/20230322-141923-root.json
* 14:17 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 14:17 sukhe: enable Puppet on A:wikidough to roll out dnsdist.conf change
* 14:13 sukhe: disable Puppet on A:wikidough to roll out dnsdist.conf change
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45916 and previous config saved to /var/cache/conftool/dbconfig/20230322-140418-root.json
* 14:02 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45915 and previous config saved to /var/cache/conftool/dbconfig/20230322-134913-root.json
* 13:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45914 and previous config saved to /var/cache/conftool/dbconfig/20230322-133409-root.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45913 and previous config saved to /var/cache/conftool/dbconfig/20230322-131904-root.json
* 13:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG (duration: 00m 12s)
* 13:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG
* 13:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45912 and previous config saved to /var/cache/conftool/dbconfig/20230322-130359-root.json
* 13:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:44 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 12:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 12:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:30 marostegui: Poweroff db1121 (lag will show on wikireplicas for s4 section) [[phab:T323961|T323961]]
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool needs to be rebooted [[phab:T323961|T323961]]', diff saved to https://phabricator.wikimedia.org/P45910 and previous config saved to /var/cache/conftool/dbconfig/20230322-112031-root.json
* 11:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 11:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 11:15 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
* 11:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 11:09 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:02 jbond: upgrade prometheus-ipmi-exporter on buster and bullseye
* 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:36 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:34 elukey: `racadm racreset` for kafka-main2005 - http idrac not available (ssh one works fine)
* 10:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:29 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:26 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 10:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1004.eqiad.wmnet with OS bullseye
* 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
* 09:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1004.eqiad.wmnet with OS bullseye
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
* 09:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
* 09:23 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
* 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
* 09:11 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:10 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:02 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
* 08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
* 08:52 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 08:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:25 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:24 XioNoX: deploy measure-$site.wikimedia.org CNAMES
* 08:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 08:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 08:18 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 08:17 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141082
* 07:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141082
* 00:57 zabe@deploy2002: Finished scap: update interwiki cache (duration: 07m 02s)
* 00:50 zabe@deploy2002: Started scap: update interwiki cache
* 00:47 zabe@deploy2002: Finished scap: [[phab:T332115|T332115]] (duration: 06m 56s)
* 00:40 zabe@deploy2002: Started scap: [[phab:T332115|T332115]]
* 00:40 zabe: create Wikipedia Angika (anpwiki) # [[phab:T332115|T332115]]
* 00:38 zabe@deploy2002: Finished scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] (duration: 27m 00s)
* 00:29 zabe@deploy2002: zabe: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 00:11 zabe@deploy2002: Started scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]]


== 2023-03-21 ==
* 23:46 zabe@deploy2002: Finished scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] (duration: 30m 08s)
* 23:35 zabe@deploy2002: zabe: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 23:15 zabe@deploy2002: Started scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]]
* 23:07 zabe@deploy2002: Finished scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]] (duration: 07m 10s)
* 23:00 zabe@deploy2002: Started scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]]
* 22:47 ejegg: payments-wiki upgraded from {{Gerrit|0fd66b1f}} to {{Gerrit|ab0a55a2}}
* 22:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] (duration: 07m 15s)
* 22:04 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:03 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]]
* 21:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 21:21 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 21:02 AndyRussG: update SmashPig  config {{Gerrit|6e651fd4}} -> {{Gerrit|035f602a}}
* 20:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 20:48 taavi: start [[phab:T315510|T315510]] migration script on group2 s7 wikis
* 20:39 taavi@deploy2002: Finished scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] (duration: 09m 01s)
* 20:31 taavi@deploy2002: matmarex and taavi: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:30 taavi@deploy2002: Started scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]]
* 20:20 taavi@deploy2002: Finished scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] (duration: 17m 40s)
* 20:10 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 20:09 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 20:04 taavi@deploy2002: esanders and taavi and matmarex: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:02 taavi@deploy2002: Started scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]]
* 19:52 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 19:43 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 19:41 jhathaway@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host dborch1002.wikimedia.org with OS bullseye
* 19:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 19:09 dancy@deploy2002: Installation of scap version "4.47.1" completed for 587 hosts
* 19:07 dancy@deploy2002: Installing scap version "4.47.1" for 587 hosts
* 19:04 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
* 19:03 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag (duration: 00m 14s)
* 19:03 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag
* 19:01 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
* 18:52 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1002.wikimedia.org with OS bullseye
* 18:38 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 18:36 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]
* 18:00 AndyRussG: update SmashPig config {{Gerrit|59a8b2d2}} -> {{Gerrit|6e651fd}}
* 17:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dborch1002.wikimedia.org
* 17:40 joal@deploy2002: Finished deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] (duration: 00m 11s)
* 17:39 joal@deploy2002: Started deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b]
* 17:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-client1002.eqiad.wmnet
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:53 mutante: sudo cumin -b 4 -s 40 'C:role::cache::text' 'run-puppet-agent'
* 16:50 jbond: copy /usr/bin/prometheus-ipmi-exporter from bullseye to buster
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 16:46 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
* 16:45 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
* 16:43 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 16:43 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
* 16:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:28 jbond: upload prometheus-ipmi-exporter_1.6.1 to bullseye
* 16:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-test-client1002.eqiad.wmnet on all recursors
* 16:15 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-test-client1002.eqiad.wmnet on all recursors
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
* 16:13 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
* 16:10 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
* 16:10 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-client1002.eqiad.wmnet
* 15:57 jynus: running from cumin1001: transfer.py --type=decompress dbprov1003.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s5.2023-03-20--04-00-30.tar.gz db1145.eqiad.wmnet:/srv/sqldata.s5
* 15:53 jhathaway@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.wikimedia.org
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 15:53 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 15:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 15:52 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:42 jbond: stop puppet from deploying this further
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
* 15:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
* 15:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] (duration: 09m 11s)
* 15:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:19 samtar@deploy2002: samtar: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:17 samtar@deploy2002: Started scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]]
* 15:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:10 samtar@deploy2002: Finished scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] (duration: 09m 32s)
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 15:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:02 samtar@deploy2002: samtar: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:02 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 15:00 samtar@deploy2002: Started scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]]
* 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 14:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 14:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=kartotherian,name=maps1005.eqiad.wmnet
* 14:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=maps1005.eqiad.wmnet
* 14:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 14:38 hnowlan: disabling puppet on maps* before merging 760619
* 14:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:27 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:17 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:14 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] (duration: 07m 53s)
* 14:10 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:02 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]]
* 14:00 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:58 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:40 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:33 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:28 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:25 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:21 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:16 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 13:05 elukey: move kafka mirror maker instances to PKI migration settings (new truststores) - [[phab:T319372|T319372]]
* 11:20 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:09 joal: Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00
* 11:08 joal: Kill mediacounts_load oozie job
* 11:07 joal: Unpause mediawiki_history_denormalize airflow job
* 11:06 joal: Kill mediawiki_denormalize oozie job
* 11:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s)
* 11:04 joal@deploy2002: Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b]
* 10:43 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:32 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:24 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s)
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9]
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s)
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9]
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s)
* 10:14 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9]
* 09:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
* 09:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
* 09:25 phedenskog@deploy2002: Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s)
* 09:25 phedenskog@deploy2002: Started deploy [performance/navtiming@d2b97ad]: (no justification provided)
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 08:31 elukey: move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - [[phab:T319372|T319372]]
* 06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
* 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
* 03:57 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s)
* 03:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]] (duration: 52m 38s)
* 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]


== 2015-08-28 ==
== 2023-03-20 ==
* 23:45 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234679/ (duration: 06m 56s)
* 22:00 samtar@deploy2002: Finished scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] (duration: 09m 45s)
* 22:51 logmsgbot: bd808@tin Synchronized wmf-config/CommonSettings-labs.php: Use ffmpeg instead of avconv on labs beta (I250fe33) (duration: 06m 05s)
* 21:52 samtar@deploy2002: jdlrobson and samtar: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:05 ori: disabling puppet on tin for a few minutes to test an ssh-agent-proxy change
* 21:50 samtar@deploy2002: Started scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]]
* 20:04 logmsgbot: catrope@tin Synchronized php-1.26wmf20/resources/src/mediawiki.legacy/shared.css: T110716 (duration: 00m 12s)
* 21:34 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki shwiki --fix` [[phab:T332614|T332614]]
* 18:09 robh: updating ldap-codfw cert
* 21:25 TheresNoTime: closing UTC late backport window, extended
* 17:10 logmsgbot: catrope@tin Synchronized php-1.26wmf20/extensions/Flow/includes/Parsoid/Utils.php: T110676 (duration: 00m 13s)
* 21:22 samtar@deploy2002: Finished scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] (duration: 12m 22s)
* 17:08 urandom: bouncing Cassandra on restbase1001 to apply default (puppet-managed) settings
* 21:11 samtar@deploy2002: samtar and aleksandar: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 16:03 chasemp: ferm for elasticsearch10(0[8-9|1[0-13])
* 21:10 samtar@deploy2002: Started scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]]
* 15:31 awight: updated crm from fc0fcc8f5af262b56392d3f4f5998f8ea08c99a8 to 0fc8474338e7a31fdde79287bd667b98cd96a252
* 21:09 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer (duration: 00m 13s)
* 15:23 chasemp: ferm for elasticsearch10[14-17]
* 21:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer
* 11:09 logmsgbot: aude@tin Synchronized php-1.26wmf20/extensions/Wikidata/Wikidata.php: Sync entry point - updated to work on Jenkins together with ContentTranslation (duration: 00m 12s)
* 21:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] (duration: 08m 34s)
* 10:29 godog: reenable puppet on ms-fe1, ferm changes will go out on monday
* 21:02 samtar@deploy2002: matmarex and samtar: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:48 jynus: Cloning es1001 database into es1012
* 21:00 TheresNoTime: extending UTC late backport window
* 09:45 moritzm: enabled ferm for swift on esams
* 21:00 samtar@deploy2002: Started scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]]
* 09:28 moritzm: enabled ferm on strontium puppetmaster backend
* 20:58 kharlan@deploy2002: Finished scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] (duration: 10m 28s)
* 09:00 moritzm: enabled ferm on rhodium puppetmaster backend
* 20:49 kharlan@deploy2002: kharlan: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmn
* 08:29 moritzm: uploaded debdeploy 0.0.3 to carbon
* 20:47 kharlan@deploy2002: Started scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]]
* 08:23 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1001, increase weight of es1011, pool es1014 for the first time (duration: 00m 13s)
* 19:49 mutante: miscweb1003 - manually edit /srv/deployment/iegreview/iegreview-cache/.config and replace tin.eqiad.wmnet with deployment.eqiad.wmnet (which is an alias for deploy2002.codfw.wmnet) [[phab:T257317|T257317]] [[phab:T332623|T332623]] [[phab:T331896|T331896]]
* 05:59 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Aug 28 05:59:09 UTC 2015 (duration 59m 8s)
* 19:13 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator (duration: 00m 13s)
* 04:58 logmsgbot: ori@tin Synchronized php-1.26wmf20/includes/parser/Parser.php: 754b222daf: Add ParserOutput cache and expiry times to NewPP report (duration: 00m 13s)
* 19:13 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator
* 02:41 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-28 02:41:26+00:00
* 18:56 ejegg: switched back to new PayPal pending transaction resolver
* 02:35 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 10m 47s)
* 18:48 akosiaris@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 28s)
* 01:59 Tim: on ruthenium: started parsoid_vd which was previously killed by oom-killer
* 18:47 akosiaris: emergency rollover of redis password complete
* 01:58 Tim: on ruthenium, reduced parsoid-rt-client concurrency from 16 to 8 since it was OOM and oom-killer was killing random things
* 18:45 akosiaris: re-enable puppet on rdb*, netbox*, ores*, registry*
* 01:37 Tim: on ruthenium restarted parsoid-rt-client and parsoid-vd-client
* 18:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script (duration: 00m 13s)
* 00:24 mutante: powercycled mw2027
* 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script
* 00:19 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/234450/ (duration: 01m 14s)
* 18:42 ejegg: civicrm upgraded from {{Gerrit|3d3606f1}} to {{Gerrit|09373b9d}}
* 00:06 logmsgbot: krenair@tin Synchronized wmf-config/mobile.php: live hack to make previous commit work (duration: 01m 14s)
* 18:32 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 00:05 Krenair: Another codfw host broke: mw2027
* 18:32 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 00:01 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234330/ (duration: 00m 13s)
* 18:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:32 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 18:31 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 18:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 18:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 18:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 18:16 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 18:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 18:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 18:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 18:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 18:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 18:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 18:05 mutante: miscweb1003 - syntax error in httpd config due to "Unknown Authn provider: ldap" - comes from static-rt vhost ([[phab:T331896|T331896]])
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
* 17:59 mutante: when applying apache role for the first time on new hosts we still have the same old conflict: miscweb1003 - manual "a2dismod mpm_event" to be able to let puppet enable mod PHP ([[phab:T196968|T196968]])
* 17:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
* 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
* 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 17:26 akosiaris: disable puppet on rdb*, netbox*, ores*, registry*
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 16:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 16:36 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 16:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 16:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 16:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 16:21 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 15:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:53 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 14:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2552
* 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2552
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 and promote es2027 to es3 master', diff saved to https://phabricator.wikimedia.org/P45896 and previous config saved to /var/cache/conftool/dbconfig/20230320-143951-root.json
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]]
* 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]]
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:17 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:11 TheresNoTime: close UTC afternoon backport window
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
* 14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
* 14:08 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autopatrol' 'autopatrolled'` [[phab:T331762|T331762]]
* 14:06 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:05 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreview' 'autopatrol'` [[phab:T331762|T331762]]
* 14:03 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki slwiki --fix` [[phab:T332351|T332351]]
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'reviewer' 'patrol'` [[phab:T331762|T331762]]
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreviewer' 'autopatrol'` ("nothing to do") [[phab:T331762|T331762]]
* 14:00 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki ptwikisource editor` [[phab:T331762|T331762]]
* 13:58 samtar@deploy2002: Finished scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] (duration: 09m 44s)
* 13:50 samtar@deploy2002: thiemowmde and samtar and zoranzoki21: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:49 samtar@deploy2002: Started scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]]
* 13:47 samtar@deploy2002: Finished scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] (duration: 09m 26s)
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host cuminunpriv1001.eqiad.wmnet with OS bullseye
* 13:39 samtar@deploy2002: aleksandar and samtar: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:38 samtar@deploy2002: Started scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]]
* 13:37 samtar@deploy2002: Finished scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] (duration: 08m 46s)
* 13:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
* 13:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
* 13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
* 13:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
* 13:30 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 30s)
* 13:29 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
* 13:29 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}}
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]]
* 13:28 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:26 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 39s)
* 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
* 13:24 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}}
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] (duration: 11m 36s)
* 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host cuminunpriv1001.eqiad.wmnet with OS bullseye
* 13:17 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 13:17 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 13:14 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:14 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 13:08 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:06 samtar@deploy2002: Started scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]]
* 11:35 krinkle@deploy2002: Synchronized php-1.40.0-wmf.27/includes/libs/rdbms/: (no justification provided) (duration: 15m 28s)
* 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
* 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
* 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141082
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58655
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58655
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2552
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2552
* 09:21 claime: Repooling parse2004 - [[phab:T332119|T332119]]
* 08:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 138915
* 08:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 138915
* 08:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138915
* 08:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138915


== 2015-08-27 ==
== 2023-03-19 ==
* 23:58 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/MobileFrontend/includes/MobileFormatter.php: https://gerrit.wikimedia.org/r/#/c/234331/1 (duration: 00m 12s)
* 18:27 AndyRussG: update config (to re-enable old PayPal orphan slayer job) {{Gerrit|27a5b481}} -> {{Gerrit|6359222d}}
* 23:57 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/MobileFrontend/includes/config/Experimental.php: https://gerrit.wikimedia.org/r/#/c/234331/1 (duration: 00m 14s)
* 16:44 apergos: dumpsdata1005 conversion to primary dumps nfs server done
* 23:55 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/233439/ (duration: 00m 12s)
* 15:12 AndyRussG: update config (to disable paypal_ec pending transaction resolver) {{Gerrit|5dd37c9c}} -> {{Gerrit|3d3606f1}}
* 23:30 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/Gadgets/extension.json: touch (duration: 00m 13s)
* 14:18 apergos: work starting now to swap dumpsdata1005 in for primary nfs server, replacing dumpsdata1003 which will become dumps spare host
* 23:24 logmsgbot: krenair@tin Synchronized php-1.26wmf20/includes/DefaultSettings.php: https://gerrit.wikimedia.org/r/#/c/234328/ (duration: 00m 12s)
* 00:17 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
* 23:24 logmsgbot: krenair@tin Synchronized php-1.26wmf20/includes/registration/ExtensionProcessor.php: https://gerrit.wikimedia.org/r/#/c/234328/ (duration: 00m 12s)
* 00:17 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 23:23 logmsgbot: krenair@tin Synchronized php-1.26wmf20/includes/MWNamespace.php: https://gerrit.wikimedia.org/r/#/c/234328/ (duration: 00m 13s)
* 23:15 logmsgbot: krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/234009/ (duration: 00m 13s)
* 23:04 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233100/ (duration: 00m 12s)
* 20:11 chasemp: ferm setup on elasticsearch10(1[8-9|2[0-3])
* 20:06 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf20
* 19:57 logmsgbot: twentyafterfour@tin Synchronized php-1.26wmf20/includes/media/XMP.php: deploy fix for T89532 on 1.26wmf20 (duration: 00m 13s)
* 18:16 chasemp: setting up ferm on elastic1027-31
* 17:47 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/234320/ (duration: 00m 13s)
* 17:43 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234320/2 (duration: 00m 13s)
* 17:37 urandom: ack'd Cassandra process alert on restbase1001; temporary command args have pushed the class name beyond the limit
* 17:34 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: (no message) (duration: 00m 12s)
* 17:24 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: https://gerrit.wikimedia.org/r/#/c/234320/ (duration: 00m 12s)
* 17:08 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 16:51 moritzm: ferm rules on logstash100[1-3] have been amended to allow grafana to read dashboard configs
* 16:39 bd808: new ferm rules on logstash100[1-3] are blocking grafana from reading dashboard configs.
* 16:22 moritzm: ferm enabled on logstash1003
* 16:18 moritzm: ferm enabled on logstash1002
* 16:16 bd808: ferm enabled on logstash1001
* 16:06 bd808: logstash1001 back up after system reboot; we applied a default drop rule without applying the other iptables changes; will try again
* 15:58 chasemp: rebooting logstash1001.mgmt.eqiad.wmnet for moritz as it is having issues
* 15:47 bblack: killed hung ubuntu mirror rsync commands on carbon, from Jul 10
* 15:45 bd808: logstash1001 not responding over ssh following ferm rules application; moritzm investigating
* 15:30 bd808: Disabled puppet on logstash100[1-3] prior to trying to enable ferm
* 15:11 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable newarticle campaign in itwiki [[gerrit:234223]] (duration: 01m 52s)
* 14:52 bblack: re-imaging lvs200[123]
* 14:47 godog: reenable puppet on ms-be1*
* 14:22 godog: disable puppet on ms-fe1 / ms-be1 in preparation for puppet work
* 14:15 godog: reenable puppet on ms-fe2*
* 13:47 bblack: re-imaging lvs2004 + lvs2005
* 13:29 ottomata: doing rolling restart of kafka brokers to apply auto_create_topics change
* 13:21 godog: enable puppet on ms-be2*
* 13:21 ottomata: stopping kafka on analytics1021, it is no longer a kafka broker.
* 13:09 godog: disable puppet on ms-be2* in preparation for firewall changes
* 13:09 jynus: cloning es1008 into es1014
* 13:04 ottomata: running leader election now that all topics and partitions are rebalanced across new kafka nodes
* 12:46 bblack: re-imaging lvs2006
* 12:45 andrewbogott: re-imaging labnet1001 (I hope)
* 11:33 _joe_: restarted hhvm on mw1143, locked in __lll_lock_wait for stat_cache deadlock
* 11:10 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Pool es1011 for the first time, depool es1008 (duration: 00m 12s)
* 09:27 jynus: installing and configuring servers es1012-es1019
* 06:39 ostriches: tin: dropped useless "gerrit" remote from /srv/mediawiki-staging (uses ssh, lol), pointed {origin,readonly} at the actual repo instead of a redirect.
* 06:00 _joe_: powercycling mw2140, not responding to ping, blank console
* 03:17 awight: deploy config cleanup for paymentswiki
* 02:38 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 10m 44s)
* 02:16 awight: push config change to the payments orphan slayer: explicitly give stomp port to work around strict notice, clean up unused globals. T109911
* 01:32 ejegg: updated payments from 8ba4b5299f195cf48e6809b18a21e2d53f6eec1b to 6ac552f280fb839069d117386c4ecbe9e52f90a8
* 00:31 twentyafterfour: finished phabricator upgrade, everything appears to be working
* 00:24 logmsgbot: aaron@tin Synchronized php-1.26wmf19/extensions/CentralAuth: 47e181adb2898977b146de7398eaa35aebb870e3 (duration: 01m 13s)
* 00:22 logmsgbot: aaron@tin Synchronized php-1.26wmf20/extensions/CentralAuth: 47e181adb2898977b146de7398eaa35aebb870e3 (duration: 01m 13s)
* 00:20 twentyafterfour: taking phabricator offline for scheduled upgrade


== 2015-08-26 ==
== 2023-03-18 ==
* 23:59 Krinkle: mwscript deleteEqualMessages.php --wiki rowiki
* 22:47 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
* 23:57 yurik: git deployed tilerator - had the 4/5 issue - https://phabricator.wikimedia.org/T110434
* 22:47 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 23:46 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234072/ (duration: 01m 12s)
* 14:26 apergos: rsync of xmldata public dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
* 23:37 logmsgbot: krenair@tin Synchronized php-1.26wmf20/maintenance/deleteEqualMessages.php: https://gerrit.wikimedia.org/r/#/c/234038/ (duration: 01m 12s)
* 13:46 apergos: rsync of xmldata private dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
* 23:35 logmsgbot: krenair@tin Synchronized php-1.26wmf19/maintenance/deleteEqualMessages.php: https://gerrit.wikimedia.org/r/#/c/234037/1 (duration: 01m 12s)
* 07:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 23:27 yurik: deployed kartotherian
* 07:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 23:21 jynus: cloning es1005 into es1011, ETA 9 hours
* 02:57 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
* 22:41 ori: armed keyholder on tin
* 02:57 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 22:40 ori: Disabled Puppet on mw1017 for 2hrs and applied I059b0c96c9 for testing.
* 01:21 urandom: powercycling restbase2025 — [[phab:T332462|T332462]]
* 21:55 logmsgbot: krinkle@tin Synchronized php-1.26wmf19/includes/poolcounter/PoolWorkArticleView.php: (no message) (duration: 01m 12s)
* 00:06 AndyRussG: Updating civicrm from {{Gerrit|5dd37c9c}} to {{Gerrit|3d3606f1}}
* 21:48 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1005 (duration: 01m 12s)
* 21:40 logmsgbot: krinkle@tin Synchronized php-1.26wmf20/includes/poolcounter/PoolWorkArticleView.php: (no message) (duration: 01m 12s)
* 21:32 ori: Disabling Puppet on tin again to test an ssh-agent-proxy change
* 20:30 logmsgbot: ori@tin Synchronized README: testing ssh-agent-proxy changes (duration: 00m 13s)
* 20:25 ori: Disabling puppet on tin and hacking some debug logging into ssh-agent-proxy
* 20:24 ori: armed ssh-agent key on mira
* 20:21 logmsgbot: krinkle@tin Synchronized php-1.26wmf20/includes/poolcounter/PoolWorkArticleView.php: (no message) (duration: 00m 03s)
* 20:11 subbu: deployed parsoid version 44d657de
* 19:52 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/Echo/includes/mapper/EventMapper.php: https://gerrit.wikimedia.org/r/#/c/234082/ (duration: 00m 12s)
* 19:47 mutante: sodium - deleting shunted messages older than 7 days
* 19:23 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234042/ (duration: 00m 12s)
* 19:22 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/234024/ (duration: 00m 12s)
* 19:20 logmsgbot: krenair@tin Synchronized multiversion/MWWikiversions.php: https://gerrit.wikimedia.org/r/#/c/232672/ (duration: 00m 12s)
* 18:50 logmsgbot: krinkle@tin Synchronized php-1.26wmf20/maintenance/deleteEqualMessages.php: (no message) (duration: 00m 11s)
* 18:50 logmsgbot: krinkle@tin Synchronized php-1.26wmf19/maintenance/deleteEqualMessages.php: (no message) (duration: 00m 13s)
* 18:38 twentyafterfour: ^ stupid typo. That sync was group1 to 1.26wmf20
* 18:37 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: tig
* 18:31 logmsgbot: ori@tin Synchronized w/404.php: Ided1facc0: Remove auto-redirection from 404 page. (duration: 00m 13s)
* 17:51 ejegg: updated SmashPig from 258f2c917b1ae50b01231927bcd6f58ecaa8940b to fdb053efa617162ac9f695e493c390987a069140
* 17:30 urandom: bouncing Cassandra on restbase1001 to apply temporary GC setting
* 17:12 andrewbogott: ok, /now/ I’m running a dist-upgrade on labcontrol1001, to sort out weird oslo dependencies
* 17:09 chasemp: adding firewall to elasticsearch2[4-6] (3 was just done as a pilot)
* 17:03 andrewbogott: upgraded labnet1002 nova services to Juno
* 16:34 andrewbogott: stopping keystone, updating db, restarting
* 16:18 andrewbogott: switching labcontrol1001 hiera to Juno which will add the cloud-archive repo for Juno.
* 16:11 andrewbogott: backing up labs openstack databases into /home/andrew/openstackdbbackups on db1009
* 16:11 andrewbogott: starting labs openstack update to Juno
* 15:53 moritzm: ferm enabled on elastic1023
* 15:45 godog: repool restbase1009 in pybal
* 15:28 logmsgbot: thcipriani@tin Synchronized php-1.26wmf20/extensions/Wikidata: SWAT: Update Wikidata - wrap usage tracking batch updates in transaction [[gerrit:233970]] (duration: 00m 23s)
* 13:47 andrewbogott: rebooting/reimaging labnet1001
* 13:11 mobrovac: restbase deploying 1dfba85
* 12:54 yurik: git synced kartotherian
* 11:02 jynus: dropping optin_survey_old table on all wikis
* 10:33 godog: reenable puppet on ms-fe/ms-be, base::firewall still not enabled
* 09:58 godog: test-reboot ms-be2001
* 08:17 godog: disable puppet on ms-be/ms-fe in preparation for merging firewall changes
* 07:53 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Aug 26 07:53:31 UTC 2015 (duration 53m 30s)
* 07:01 jynus: restarting mw1239 HHVM, which is unresponsive
* 04:47 logmsgbot: ori@tin Synchronized wmf-config: I73721936: Enable ParsoidBatchAPI everywhere (duration: 00m 13s)
* 03:11 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-26 03:11:29+00:00
* 03:06 logmsgbot: awight@tin Synchronized wmf-config/InitialiseSettings-labs.php: Push labs config to keep in sync with master (duration: 00m 13s)
* 03:05 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 10m 45s)
* 02:37 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf19) at 2015-08-26 02:37:51+00:00
* 02:34 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 29s)
* 02:00 ottomata: kafka topic webrequest_upload has finished rebalancing across new brokers.  starting move of last topic webrequest_text
* 01:50 logmsgbot: mattflaschen@tin Synchronized php-1.26wmf19/extensions/Flow/: Sync Flow for reply fix (duration: 00m 15s)
* 00:28 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/Scribunto/engines/LuaCommon/LuaCommon.php: (no message) (duration: 00m 13s)
* 00:26 logmsgbot: ori@tin Synchronized php-1.26wmf19/extensions/Scribunto/engines/LuaCommon/LuaCommon.php: (no message) (duration: 00m 13s)
* 00:26 Danny_B: 2586dd1c7c obviously broke many pages
* 00:19 logmsgbot: ori@tin Synchronized php-1.26wmf19/extensions/Scribunto/engines/LuaCommon/LuaCommon.php: (no message) (duration: 00m 14s)
* 00:14 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: I79ffa78fa: Collection/OCG: Turn on plain text output format in Book Creator (duration: 00m 12s)
* 00:12 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/Scribunto/engines/LuaCommon/LuaCommon.php: 2586dd1c7c: Updated mediawiki/core Project: mediawiki/extensions/Scribunto (duration: 00m 13s)


== 2015-08-25 ==
== 2023-03-17 ==
* 23:39 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233860/ (duration: 00m 12s)
* 19:53 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching (duration: 00m 13s)
* 23:16 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233872/ (duration: 00m 13s)
* 19:53 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching
* 23:13 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/232963/ (duration: 00m 12s)
* 19:52 bd808: Testing Mastodon account changes. This should post to @wikimedia_sal@botsin.space
* 23:12 logmsgbot: krenair@tin Synchronized wmf-config/extension-list: https://gerrit.wikimedia.org/r/#/c/232963/ (duration: 00m 12s)
* 19:06 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch (duration: 00m 13s)
* 23:10 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/232962/ (duration: 00m 12s)
* 19:06 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch
* 23:10 logmsgbot: krenair@tin Synchronized wmf-config/extension-list: https://gerrit.wikimedia.org/r/#/c/232962/ (duration: 00m 12s)
* 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
* 23:05 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233781/ (duration: 00m 12s)
* 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
* 22:20 cscott: updated Parsoid to version c3b037b0
* 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
* 22:10 ejegg: disabled paypal audit downloader and parser due to them warning of incorrect data
* 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
* 21:16 logmsgbot: ori@tin Synchronized php-1.26wmf19/extensions/AbuseFilter: I15f5b5b6 & I9c23b607 (duration: 00m 13s)
* 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
* 21:13 logmsgbot: ori@tin Synchronized php-1.26wmf19/extensions/Cite/modules/ext.cite.styles.css: 7344e02216: Updated mediawiki/core Project: mediawiki/extensions/Cite (duration: 00m 12s)
* 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
* 21:09 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/AbuseFilter: I15f5b5b6 & I9c23b607 (duration: 00m 14s)
* 18:10 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
* 20:54 tgr: finished OAuth migration
* 18:09 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 20:34 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings.php: make OAuth DB writable again T108648 (duration: 00m 12s)
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
* 20:32 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings.php: change wgMWOAuthCentralWiki mediawikiwiki -> metawiki T108648 (duration: 00m 12s)
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
* 20:24 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings.php: set OAuth to readonly for DB migration T108648 (duration: 00m 13s)
* 17:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
* 20:13 subbu: deployed parsoid version 759916fc
* 17:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
* 19:24 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf20
* 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet
* 19:21 logmsgbot: twentyafterfour@tin Finished scap: testwiki to 1.26wmf20 (duration: 50m 12s)
* 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet
* 18:31 logmsgbot: twentyafterfour@tin Started scap: testwiki to 1.26wmf20
* 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:11 YuviPanda: run authdns-update on radon (ns0.wikimedia.org)
* 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:10 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
* 16:58 Krinkle: mwscript deleteEqualMessages.php --wiki kawiki
* 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
* 16:56 andrewbogott: restarting pdns on labcontrol1001 and labcontrol2001 to handle a nembus reboot
* 15:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:53 Krinkle: mwscript deleteEqualMessages.php --wiki huwiki
* 15:29 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 16:31 Krinkle: mwscript deleteEqualMessages.php --wiki frwiki
* 15:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:17 Krinkle: mwscript deleteEqualMessages.php --wiki frpwiki
* 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 15:50 godog: powercycle ms-be1004, likely xfs
* 14:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 15:44 andrewbogott: dist-upgrade and rebooting nembus in an attempt to resolve this acpi_pad issue
* 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 15:36 Krinkle: mwscript deleteEqualMessages.php --wiki euwiki (T45917)
* 14:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 15:29 Krinkle: mwscript deleteEqualMessages.php --wiki eowiki (T45917)
* 14:54 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 15:07 logmsgbot: krenair@tin Synchronized php-1.26wmf19/extensions/Flow: https://gerrit.wikimedia.org/r/#/c/233718/ (duration: 00m 16s)
* 14:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 13:56 jynus: dropping old tables on s7 - T5493
* 14:13 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:48 jynus: dropping old tables on s6 - T54932
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 12:53 Jeff_Green: authdns-update to change bismuth's IP
* 13:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 11:16 jynus: dropping old tables on s3 - T54932
* 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 10:46 jynus: dropping old tables on s2 - T54932
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 10:05 YuviPanda: restart puppetmaster on labcontrol1001 for https://gerrit.wikimedia.org/r/#/c/233184/
* 13:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 07:35 _joe_: stopping redis, wiping aof, restarting redis on rdb100{1,2} - snapshot saved on rdb1002:/root
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 07:12 _joe_: stopping redis on rdb1003,4, wiping AOF, restarting
* 13:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 06:38 jynus: performing schema change on officewiki, mediawikiwiki and metawiki
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 02:21 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 26s)
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 01:48 ottomata: starting move of kafka partitions for topic webrequest_upload to new brokers. this will take a while!
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 01:44 ottomata: restarting kafka on new brokers kafka1013,1014,1020 to apply increase in num.replica.fetchers
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2004.codfw.wmnet
* 13:21 claime: Depooling parse2004.codfw.wmnet for broken PSU - [[phab:T332119|T332119]]
* 12:06 mutante: systemctl reset-failed on gitlab-runner*
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 11:03 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:02 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl', diff saved to https://phabricator.wikimedia.org/P45887 and previous config saved to /var/cache/conftool/dbconfig/20230317-055643-marostegui.json
* 02:10 ejegg: civicrm upgraded from {{Gerrit|672950d9}} to {{Gerrit|5dd37c9c}}
* 01:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2010.codfw.wmnet
* 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2010.codfw.wmnet
* 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
* 00:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
* 00:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
* 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
* 00:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates
* 00:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates


== 2015-08-24 ==
== 2023-03-16 ==
* 23:46 logmsgbot: mattflaschen@tin Synchronized wmf-config: Remove wgFlowOccupyPages (duration: 00m 12s)
* 23:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
* 23:38 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233636/ (duration: 00m 12s)
* 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
* 22:16 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings-labs.php: change OAuth DB on beta +enable writes (duration: 00m 12s)
* 23:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
* 21:55 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings-labs.php: set beta OAuth to readonly (duration: 00m 13s)
* 23:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
* 21:54 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings-labs.php: set beta OAuth to readonly (duration: 00m 13s)
* 23:31 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb2003.codfw.wmnet with OS bullseye
* 21:42 akosiaris: enabled puppet on maps-test200{1,2,3,4}.codfw.wmnet
* 23:28 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb1003.eqiad.wmnet with OS bullseye
* 20:21 arlolra: updated Parsoid to version 0b2fbae7
* 23:20 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 (duration: 00m 19s)
* 18:58 bblack: reloading primary LVS pybals for BlankPage change ( https://gerrit.wikimedia.org/r/#/c/233053/ ) + ulimit fixup ( https://gerrit.wikimedia.org/r/#/c/233484/ )
* 23:20 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0
* 18:31 bblack: reloading backup LVS pybals for BlankPage change ( https://gerrit.wikimedia.org/r/#/c/233053/ )
* 23:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
* 17:19 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 23:15 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
* 16:23 logmsgbot: bd808@tin Purged l10n cache for 1.26wmf18
* 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
* 16:23 logmsgbot: bd808@tin Purged l10n cache for 1.26wmf17
* 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
* 16:05 andrewbogott: rebooting labnet1001
* 23:01 dzahn@cumin1001: START - Cookbook sre.ganeti.reimage for host miscweb1003.eqiad.wmnet with OS bullseye
* 15:53 _joe_: restarted nutcracker on mw1010, holding a 150 GB deleted logfile
* 23:00 dzahn@cumin2002: START - Cookbook sre.ganeti.reimage for host miscweb2003.codfw.wmnet with OS bullseye
* 15:47 Krenair: running sync-common on mw1010 to bring it up to date after clearing some space
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb1003.eqiad.wmnet
* 15:44 logmsgbot: krenair@tin Purged l10n cache for 1.26wmf16
* 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb2003.codfw.wmnet
* 15:41 logmsgbot: krenair@tin Purged l10n cache for 1.26wmf15
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb1003.eqiad.wmnet on all recursors
* 15:38 logmsgbot: krenair@tin Synchronized php-1.26wmf19/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/233411/1 (duration: 00m 49s)
* 22:39 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache miscweb1003.eqiad.wmnet on all recursors
* 15:37 hashar: stopped and restarted Zuul
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:31 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/232919/ and https://gerrit.wikimedia.org/r/#/c/232915/ (duration: 01m 34s)
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
* 15:29 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/knwikiquote.png: https://gerrit.wikimedia.org/r/#/c/232919/ (duration: 02m 04s)
* 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
* 15:19 Krenair: No space left on mw1010, cannot ping or ssh to mw2180
* 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 logmsgbot: krenair@tin Synchronized docroot/noc/db.php: https://gerrit.wikimedia.org/r/#/c/232920/ (duration: 01m 34s)
* 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host miscweb1003.eqiad.wmnet
* 15:14 hashar: apt-get upgrade on gallium
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb2003.codfw.wmnet on all recursors
* 14:48 andrewbogott: forcing wikitech logouts in order to flush everyone’s service catalog
* 22:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache miscweb2003.codfw.wmnet on all recursors
* 14:18 ottomata: starting to move kafka topic-partitions to new brokers (and off of analytics1021)
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:12 yurik: git deploy synced kartotherian
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
* 13:55 akosiaris: disable puppet on fermium preparing for reinstallation
* 22:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
* 13:55 akosiaris: disable puppet on fermium
* 22:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 12:54 akosiaris: stop etcd on etcd1002.eqiad.wmnet. Already removed from the cluster
* 22:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host miscweb2003.codfw.wmnet
* 11:58 _joe_: stopping etcd on etcd1001
* 22:24 ejegg: civicrm upgraded from {{Gerrit|68fa85cf}} to {{Gerrit|672950d9}}
* 11:50 _joe_: restarting etcd on etcd1001
* 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 09:00 YuviPanda: starting up replicate for tools on labstore1002
* 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 09:00 YuviPanda: cleaning up lockdir on labstore for maps and tools
* 22:04 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 09:00 YuviPanda: others replication on labstore1002 completed successfully
* 21:54 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 08:31 YuviPanda: cleaned up others lockdir for replication on labstore1002 and started it manually
* 20:47 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]]
* 06:43 jynus: reloading dbproxy1003 service
* 20:36 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): blockers hopefully resolved, rolling to all wikis
* 02:21 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 36s)
* 20:35 TheresNoTime: close UTC late backport window
* 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] (duration: 08m 18s)
* 20:28 samtar@deploy2002: samtar and sharvaniharan: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:26 samtar@deploy2002: Started scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]]
* 20:21 brennen@deploy2002: Finished scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] (duration: 09m 06s)
* 20:14 brennen@deploy2002: brennen and jforrester: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:12 brennen@deploy2002: Started scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]]
* 19:28 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s)
* 19:27 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided)
* 18:41 wfan: enable monthlyconvert for cz
* 18:40 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s)
* 18:40 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided)
* 18:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet
* 18:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 18:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet
* 18:03 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet
* 17:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 17:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 17:40 ayounsi@cumin2002: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 17:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 17:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s)
* 16:58 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade.
* 16:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
* 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
* 16:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:31 Emperor: reboot ms-be2067 again to see if the missing drive comes back
* 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
* 15:39 claime: Pooled new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]]
* 15:28 sukhe: enable puppet on R:class = dnsrecursor to merge CR: 898957 [done]
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
* 15:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
* 15:15 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
* 15:15 claime: Pooling new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]]
* 15:13 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
* 15:12 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
* 15:10 sukhe: disable puppet on R:class = dnsrecursor to merge CR: 898957
* 15:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts
* 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 32 hosts
* 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 14:49 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 14:44 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:06 urandom: ALTER-ing image_suggestions.suggestion table — [[phab:T328670|T328670]]
* 13:35 kostajh: UTC afternoon deploys done
* 13:34 kharlan@deploy2002: Finished scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] (duration: 07m 44s)
* 13:28 kharlan@deploy2002: kharlan: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]]
* 13:15 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] (duration: 09m 48s)
* 13:07 kharlan@deploy2002: kharlan: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:05 kharlan@deploy2002: Started scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]]
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
* 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 12:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
* 11:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
* 11:43 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
* 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
* 11:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
* 11:27 hnowlan@puppetmaster1001: conftool action : set/weight=3; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 32 hosts with reason: new_install
* 11:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 32 hosts with reason: new_install
* 11:10 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
* 11:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
* 10:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:38 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
* 10:33 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 10:32 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 10:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:30 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:28 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 to move it to x1', diff saved to https://phabricator.wikimedia.org/P45885 and previous config saved to /var/cache/conftool/dbconfig/20230316-100945-root.json
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1105.eqiad.wmnet
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:49 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:48 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1105.eqiad.wmnet
* 08:40 kostajh: UTC morning deploys (second round) done
* 08:40 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] (duration: 12m 30s)
* 08:29 kharlan@deploy2002: kharlan: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]]
* 08:11 apergos: additional deployments for the UTC morning backport and config training window, running into the next hour, so window re-opened
* 07:36 tgr_: UTC morning deploys done
* 07:34 tgr@deploy2002: Finished scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] (duration: 08m 13s)
* 07:28 tgr@deploy2002: tgr: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:26 tgr@deploy2002: Started scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]]
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105 from dbctl [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45883 and previous config saved to /var/cache/conftool/dbconfig/20230316-062307-root.json
* 06:03 marostegui: Failover m5 from db1106 to db1176 - [[phab:T332155|T332155]]
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]]
* 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]]
* 03:29 ejegg: payments-wiki upgraded from {{Gerrit|1532b107}} to {{Gerrit|0fd66b1f}}


== 2015-08-23 ==
== 2023-03-15 ==
* 16:54 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 22:55 tzatziki: Removing 1 file for legal compliance
* 02:20 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 23s)
* 22:30 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 55s)
* 22:29 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]])
* 22:29 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 28s)
* 22:28 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]])
* 22:08 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str (duration: 00m 14s)
* 22:07 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str
* 21:59 brennen: end of phabricator update window ([[phab:T331915|T331915]])
* 21:47 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 40s)
* 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]])
* 21:46 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 28s)
* 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]])
* 21:26 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]]) (duration: 00m 52s)
* 21:25 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]])
* 21:19 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] (duration: 00m 11s)
* 21:19 milimetric@deploy2002: Started deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893]
* 21:13 mutante: phab* - upgrading PHP packages
* 21:13 mutante: phabricator - maintenance window starting - expect possible downtime
* 21:08 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
* 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
* 20:56 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]]) (duration: 00m 31s)
* 20:55 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]])
* 20:54 brennen: starting phabricator window a touch early with a test deploy to phab2002
* 20:51 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor (duration: 00m 16s)
* 20:51 ebernhardson@deploy2002: Started deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor
* 20:48 TheresNoTime: close UTC late backport window
* 20:48 samtar@deploy2002: Finished scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] (duration: 08m 46s)
* 20:41 samtar@deploy2002: matmarex and samtar and esanders: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:39 samtar@deploy2002: Started scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]]
* 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] (duration: 10m 30s)
* 20:33 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3002.wikimedia.org with OS bullseye
* 20:27 samtar@deploy2002: samtar and tsepothoabala: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:25 samtar@deploy2002: Started scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]]
* 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] (duration: 10m 12s)
* 20:20 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bullseye
* 20:17 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bullseye
* 20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
* 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
* 20:15 samtar@deploy2002: sgimeno and samtar: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]]
* 20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
* 20:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 14s)
* 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries
* 20:11 taavi: deploy patch for [[phab:T331192|T331192]]
* 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
* 20:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
* 20:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
* 19:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
* 19:54 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3002.wikimedia.org with OS bullseye
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
* 19:53 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3001.wikimedia.org with OS bullseye
* 19:50 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 19:49 taavi@deploy2002: Finished scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] (duration: 12m 04s)
* 19:48 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1002.wikimedia.org with OS bullseye
* 19:47 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 19:46 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bullseye
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
* 19:45 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2002.wikimedia.org with OS bullseye
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bullseye
* 19:41 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bullseye
* 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
* 19:39 taavi@deploy2002: taavi: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:37 taavi@deploy2002: Started scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]]
* 19:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
* 19:35 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
* 19:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
* 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
* 19:32 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
* 19:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
* 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
* 19:28 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
* 19:27 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
* 19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
* 19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
* 19:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1001.wikimedia.org with OS bullseye
* 19:16 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2001.wikimedia.org with OS bullseye
* 19:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bullseye
* 19:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3001.wikimedia.org with OS bullseye
* 19:05 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6002.wikimedia.org with OS bullseye
* 19:03 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bullseye
* 18:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
* 18:49 mutante: adding new language prefix anp.wikipedia.org - Angika, an Eastern Indo-Aryan language spoken in some parts of the Indian states of Bihar and Jharkhand, as well as in parts of Nepal. ([[phab:T332115|T332115]])
* 18:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
* 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
* 18:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
* 18:25 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6001.wikimedia.org with OS bullseye
* 18:24 brennen@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] (duration: 06m 08s)
* 18:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 18:19 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5002.wikimedia.org with OS bullseye
* 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]]
* 18:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 05s)
* 18:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries
* 18:06 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group1.
* 18:04 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5001.wikimedia.org with OS bullseye
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
* 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
* 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
* 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.wmnet
* 17:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2006.codfw.wmnet
* 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bullseye
* 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2006.codfw.wmnet
* 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2004.codfw.wmnet
* 17:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2004.codfw.wmnet
* 17:29 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
* 17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
* 17:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
* 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
* 17:12 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5001.wikimedia.org with OS bullseye
* 17:05 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4001.wikimedia.org with OS bullseye
* 16:19 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 16:19 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 16:17 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 16:17 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 16:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
* 16:02 hnowlan: restarted thumbor-instances on thumbor1006
* 16:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 15:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 15:52 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
* 15:49 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
* 15:44 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4002.wikimedia.org with OS bullseye
* 15:34 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
* 15:33 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 15:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 15:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 15:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 14:54 Emperor: depool moss-fe1001 as rate of token denial is too high
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 14:53 claime: Redeploying mw-on-k8s for php7.4 update [[phab:T330270|T330270]]
* 14:52 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 14:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:46 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:41 cgoubert@deploy2002: Started scap: (no justification provided)
* 14:41 claime: Rebuilding mw-on-k8s images - [[phab:T330270|T330270]]
* 14:38 claime: Updating php7.4 production images
* 14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:34 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
* 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
* 14:24 daniel@deploy2002: Finished scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] (duration: 09m 57s)
* 14:22 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 14:22 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 14:22 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=pki
* 14:22 jbond: switch pki to be active/active
* 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 14:20 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 14:19 jbond: update pki to use discovery record
* 14:16 jbond@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=pki
* 14:15 daniel@deploy2002: daniel: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:14 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4002.wikimedia.org with OS bullseye
* 14:14 daniel@deploy2002: Started scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]]
* 14:12 sukhe: [correction] depool _doh4002_ for reimaging to bullseye: [[phab:T321309|T321309]]
* 14:12 sukhe: depool dns4002 for reimaging to bullseye: [[phab:T321309|T321309]]
* 14:00 moritzm: nodejs security updates on buster
* 13:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS bullseye
* 13:50 sukhe: reprepro -C component/pdns-recursor include bullseye-wikimedia pdns-recursor_4.6.2-1+wmf11u1_amd64.changes: [[phab:T321309|T321309]]
* 13:49 moritzm: installing graphite-web security updates
* 13:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
* 13:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:28 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 13:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:27 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 13:27 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 13:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
* 13:26 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:24 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:21 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:20 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:17 taavi@deploy2002: Finished scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] (duration: 09m 01s)
* 13:12 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS bullseye
* 13:10 taavi@deploy2002: matmarex and taavi and esanders: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebu
* 13:08 taavi@deploy2002: Started scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]]
* 13:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 13:07 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:18 marostegui: Failover m5 from db1176 to db1106 - [[phab:T331877|T331877]]
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]]
* 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]]
* 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 11:36 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 11:34 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 11:32 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 11:30 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
* 11:27 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 11:26 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
* 11:20 moritzm: imported packages into thirdparty/ceph-quincy
* 11:16 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 11:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 11:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 11:00 claime: Redirecting test.wikidata.org to mw-on-k8s - [[phab:T331268|T331268]]/25
* 10:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:29 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:28 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:25 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:24 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:23 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:22 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:22 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:21 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:20 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:19 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 10:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 10:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 10:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 10:08 jayme@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 10:08 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 09:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 09:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 09:49 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 09:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 09:45 jayme@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
* 09:39 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
* 09:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 09:26 moritzm: rolling restart of FPM/Apache to pick up gnutls28 security updates
* 09:22 moritzm: installing gnutls28 security updates
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl [[phab:T331875|T331875]]', diff saved to https://phabricator.wikimedia.org/P45872 and previous config saved to /var/cache/conftool/dbconfig/20230315-090515-root.json
* 08:40 hashar@deploy2002: Finished deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]] (duration: 00m 19s)
* 08:40 hashar@deploy2002: Started deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]]
* 08:15 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 07:40 tgr_: UTC morning deploys done
* 07:39 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2067.codfw.wmnet
* 07:36 tgr@deploy2002: Finished scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] (duration: 07m 54s)
* 07:30 tgr@deploy2002: tgr: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:28 tgr@deploy2002: Started scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]]
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45870 and previous config saved to /var/cache/conftool/dbconfig/20230315-062643-root.json
* 06:20 marostegui: Remove pki2001 from m1 grants [[phab:T332018|T332018]]


== 2015-08-22 ==
== 2023-03-14 ==
* 23:08 logmsgbot: krenair@tin Synchronized php-1.26wmf19/extensions/AbuseFilter/maintenance/addMissingLoggingEntries.php: (no message) (duration: 01m 05s)
* 23:29 brennen@deploy2002: Finished scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] (duration: 10m 32s)
* 19:41 YuviPanda: manually remove old snapshots from labstore1002
* 23:20 brennen@deploy2002: brennen and umherirrender: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 17:28 chasemp: tweaking apache on iridium T109941
* 23:19 brennen@deploy2002: Started scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]]
* 16:45 chasemp: scratch that as we have mpm_prefork enabled :)
* 22:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 16:33 chasemp: raising values in mpm_worker.conf for iridium to debug and hopefully head off further crashing
* 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 14:44 twentyafterfour: restarted apache2 on iridium. Segfault again. This time I at least got one clue in the log: "zend_mm_heap corrupted"
* 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 09:18 twentyafterfour: phabricator seems stable now, restarting apache2 on iridium did the trick, unfortunately we didn't learn why
* 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 08:36 twentyafterfour: restarted phd on iridium
* 22:08 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 08:36 twentyafterfour: restarted apache2 on iridium
* 21:38 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 02:20 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 09s)
* 21:38 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 00:26 mutante: deleting blog.sh and blog_pageviews crontab from stat1003
* 21:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 21:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 21:16 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 21:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 21:11 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 21:11 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 20:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 20:47 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 20:43 ejegg: payments-wiki upgraded from {{Gerrit|61c30a4f}} to {{Gerrit|1532b107}}
* 20:35 zabe@deploy2002: Finished scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] (duration: 08m 36s)
* 20:28 zabe@deploy2002: zabe: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:27 zabe@deploy2002: Started scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]]
* 20:04 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 20:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 19:47 topranks: Reboot cloudsw1-b1-codfw to upgrade JunOS version [[phab:T327919|T327919]]
* 19:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
* 19:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
* 19:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 19:30 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): uneventful at group0.  i'm afk for about an hour.
* 19:13 ejegg: civicrm upgraded from {{Gerrit|dbe3b716}} to {{Gerrit|68fa85cf}}
* 18:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS bullseye
* 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
* 18:28 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
* 18:27 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 18:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
* 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 18:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply