You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
imported>Stashbot
(pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2158.codfw.wmnet with OS bullseye)
(280 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2021-08-27 ==
== 2022-06-30 ==
* 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2158.codfw.wmnet with OS bullseye
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:34 eileen: civicrm upgraded from {{Gerrit|3cb5e6dd}} to {{Gerrit|f48fe112}}
* 00:44 legoktm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/PageTriage/: Revert backbone.js and underscore.js updates ([[phab:T289825|T289825]]) (duration: 01m 06s)
* 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage
* 01:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage
* 01:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2158.codfw.wmnet with reason: host reimage
* 01:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2158.codfw.wmnet with reason: host reimage
* 01:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS bullseye
* 00:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2155.codfw.wmnet with OS bullseye
* 00:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2158.codfw.wmnet with OS bullseye
* 00:49 ebernhardson: [[phab:T310924|T310924]] Cleared eqiad chi->omega cross cluster settings and reapplied
* 00:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2157.codfw.wmnet with OS bullseye
* 00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2157.codfw.wmnet with reason: host reimage
* 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2157.codfw.wmnet with reason: host reimage


== 2021-08-26 ==
== 2022-06-29 ==
* 22:06 legoktm: restarted mailman3-web on lists1001 ([[phab:T289798|T289798]])
* 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bullseye
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:55 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2154.codfw.wmnet with OS bullseye
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2157.codfw.wmnet with OS bullseye
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.20
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2154.codfw.wmnet with OS bullseye
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db2155.codfw.wmnet with OS bullseye
* 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 23:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host stat1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host stat1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:30 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to pickup swift-s3 plugin - bking@cumin1001 - [[phab:T309648|T309648]]
* 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|66717bc039f40336144dcc0dfd97ff5331b418e9}}: Install Extension Quiz on ja.wikibooks ([[phab:T289383|T289383]]) (duration: 01m 05s)
* 23:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2155.codfw.wmnet with reason: host reimage
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2155.codfw.wmnet with reason: host reimage
* 18:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
* 22:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bullseye
* 18:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
* 22:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host stat1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cde88918b73628f2eaaff919ddb869b4dc2c93c6}}: Install Extension Quiz on fa.wikibooks ([[phab:T289381|T289381]]) (duration: 01m 07s)
* 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1477.eqiad.wmnet with OS buster
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2154.codfw.wmnet with OS bullseye
* 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d4340e9c18468d14885c8ced87f1e014a3481f2a}}: Finalize Event Platform migration of EchoEmail and EchoInteraction ([[phab:T287210|T287210]]) (duration: 01m 07s)
* 22:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1475.eqiad.wmnet with OS buster
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1496.eqiad.wmnet with OS buster
* 17:30 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 05s)
* 21:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2154.codfw.wmnet with reason: host reimage
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1477.eqiad.wmnet with reason: host reimage
* 17:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
* 21:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2154.codfw.wmnet with reason: host reimage
* 17:26 dancy@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: Backport: [[gerrit:714864{{!}}PageStore: Pass query flags to getPageById() too (T289717 T195069)]] (duration: 01m 05s)
* 21:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1477.eqiad.wmnet with reason: host reimage
* 16:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 21:42 cjming: end of UTC late backport window
* 16:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 21:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS buster
* 15:56 sukhe: ran homer for Gerrit 715007: Set up BGP peering to durum1001 in eqiad
* 21:41 cjming@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/DiscussionTools/modules/dt.ui.NewTopicController.less: Backport: [[gerrit:809691{{!}}New topic hint: Add clear:both (T311597)]] (duration: 03m 27s)
* 15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 21:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 21:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:24 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=plwiki --prune --batch-size=10 --sleep=2 ([[phab:T289249|T289249]])
* 21:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 21:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 21:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1475.eqiad.wmnet with reason: host reimage
* 13:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 21:37 cjming@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/DiscussionTools/modules/dt.ui.NewTopicController.less: Backport: [[gerrit:809690{{!}}New topic hint: Add clear:both (T311597)]] (duration: 03m 24s)
* 12:59 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mw1477.eqiad.wmnet with OS buster
* 12:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 21:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1477.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 21:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1496.eqiad.wmnet with reason: host reimage
* 12:21 sukhe: running puppet initial run on durum1001.eqiad.wmnet - [[phab:T289536|T289536]]
* 21:35 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS bullseye
* 11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1025.eqiad.wmnet with OS buster
* 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:34 cjming@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/DiscussionTools/modules/NewTopicController.js: Backport: [[gerrit:809689{{!}}New topic hint: Avoid error about section editing when opened from diff (T311665)]] (duration: 03m 43s)
* 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:40 Lucas_WMDE: EU backport+config window done
* 21:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1475.eqiad.wmnet with reason: host reimage
* 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1496.eqiad.wmnet with reason: host reimage
* 11:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:714853{{!}}Allow rendering of <nowiki><math>0</math></nowiki> (T288846)]] (duration: 01m 04s)
* 21:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:714854{{!}}Allow rendering of <nowiki><math>0</math></nowiki> (T288846)]] (duration: 01m 05s)
* 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1001.eqiad.wmnet
* 21:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1001.eqiad.wmnet
* 21:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2154.codfw.wmnet with OS bullseye
* 11:20 nikerabbit@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:714770{{!}}Rename wgTranslateBlacklist to wgTranslateDisabledTargetLanguages]] (duration: 01m 05s)
* 21:30 cjming@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/DiscussionTools/modules/NewTopicController.js: Backport: [[gerrit:809688{{!}}New topic hint: Avoid error about section editing when opened from diff (T311665)]] (duration: 03m 35s)
* 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:09 vgutierrez: rolling restart of varnishkafka-statsv - [[phab:T289618|T289618]]
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:07 vgutierrez: disable puppet on cp-text to merge {{Gerrit|I52cf2a573980e33487d1f05f19b192ae7d13d717}} - [[phab:T286038|T286038]]
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 21:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2153.codfw.wmnet with OS bullseye
* 10:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 21:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mw1477.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:36 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 21:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mw1475.eqiad.wmnet with OS buster
* 09:30 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 21:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mw1496.eqiad.wmnet with OS buster
* 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
* 21:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:21 elukey: elukey@kafka-main1001:~$ kafka acls --add --allow-principal User:CN=varnishkafka --producer --topic statsv - [[phab:T286038|T286038]]
* 21:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1496.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:21 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
* 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:20 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:17 elukey: restart varnishkafka-statsv on cp4032 to pick up TLS settings
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
* 21:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 09:13 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
* 21:15 cjming@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:809671{{!}}Stop setting wgBabelCentralApi (T257079)]] (duration: 03m 30s)
* 09:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
* 21:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:52 vgutierrez: restart varnishkafka-statsv on cp4032
* 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:59 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
* 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
* 21:11 cjming@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:809666{{!}}Stop setting wgCentralAuthAutoNew (T257079)]] (duration: 03m 28s)
* 06:48 godog: more weight to ms-be20[62-65] - [[phab:T288458|T288458]]
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 [[phab:T288273|T288273]]', diff saved to https://phabricator.wikimedia.org/P17085 and previous config saved to /var/cache/conftool/dbconfig/20210826-064655-marostegui.json
* 21:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mw1496.mgmt.eqiad.wmnet with reboot policy FORCED
* 06:43 marostegui: Reimage s4 eqiad master (db1138),  expect lag on eqiad [[phab:T288803|T288803]]
* 21:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 06:37 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:06 cjming@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CirrusSearch/includes/MetaStore/MetaSaneitizeJobStore.php: Backport: [[gerrit:809564{{!}}metastore: Remove versioning from saneitize updates]] (duration: 03m 35s)
* 06:33 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 21:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2153.codfw.wmnet with reason: host reimage
* 21:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw1496.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2153.codfw.wmnet with reason: host reimage
* 21:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
* 20:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
* 20:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 20:54 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
* 20:54 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mw1496.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:48 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1007.eqiad.wmnet with OS buster
* 20:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS buster
* 20:45 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:809634{{!}}[wmf-config]: Deploy GDI Survey Wave 2 on ES,FR,PT wikis. (T311643)]] (duration: 03m 25s)
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS buster
* 20:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2153.codfw.wmnet with OS bullseye
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS buster
* 20:37 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 20:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2159.mgmt.codfw.wmnet with reboot policy FORCED
* 20:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2160.mgmt.codfw.wmnet with reboot policy FORCED
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 cjming@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/TargetInitializer.js: Backport: [[gerrit:809550{{!}}Structured task: Add 'cancel' to the list of allowed commands (T311467)]] (duration: 03m 37s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bullseye
* 20:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bullseye
* 20:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bullseye
* 20:21 mutante: restarting docker on all 6 gitlab-runners via cumin [[phab:T311241|T311241]]
* 20:16 mutante: LDAP - mwmaint1002 - added demon to wmf group ([[phab:T311661|T311661]])
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bullseye
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 cjming@deploy1002: Synchronized dblists/desktop-improvements.dblist: Config: [[gerrit:809620{{!}}Add jawiki, zhwikinews to pilot wikis (T311419)]] (duration: 03m 23s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2160.mgmt.codfw.wmnet with reboot policy FORCED
* 20:07 cjming@deploy1002: Synchronized wmf-config/config: Config: [[gerrit:809620{{!}}Add jawiki, zhwikinews to pilot wikis (T311419)]] (duration: 03m 31s)
* 20:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
* 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
* 20:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2159.mgmt.codfw.wmnet with reboot policy FORCED
* 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
* 20:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 20:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2158.mgmt.codfw.wmnet with reboot policy FORCED
* 20:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2157.mgmt.codfw.wmnet with reboot policy FORCED
* 20:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
* 19:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
* 19:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 19:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 19:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bullseye
* 19:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bullseye
* 19:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bullseye
* 19:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2157.mgmt.codfw.wmnet with reboot policy FORCED
* 19:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2158.mgmt.codfw.wmnet with reboot policy FORCED
* 19:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bullseye
* 19:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2156.mgmt.codfw.wmnet with reboot policy FORCED
* 19:33 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 19:32 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 19:32 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 19:32 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 19:31 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 19:31 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 19:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2156.mgmt.codfw.wmnet with reboot policy FORCED
* 19:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2156.mgmt.codfw.wmnet with reboot policy FORCED
* 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2155.mgmt.codfw.wmnet with reboot policy FORCED
* 19:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2156.mgmt.codfw.wmnet with reboot policy FORCED
* 19:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2156.mgmt.codfw.wmnet with reboot policy FORCED
* 19:15 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2156.mgmt.codfw.wmnet with reboot policy FORCED
* 19:15 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host db2156.mgmt.codfw.wmnet with reboot policy FORCED
* 19:11 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2156.mgmt.codfw.wmnet with reboot policy FORCED
* 19:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2155.mgmt.codfw.wmnet with reboot policy FORCED
* 18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2153.mgmt.codfw.wmnet with reboot policy FORCED
* 18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2154.mgmt.codfw.wmnet with reboot policy FORCED
* 18:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2154.mgmt.codfw.wmnet with reboot policy FORCED
* 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1049.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1051.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1048.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1052.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1053.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1050.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2153.mgmt.codfw.wmnet with reboot policy FORCED
* 18:27 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to pickup swift-s3 plugin - bking@cumin1001 - [[phab:T309648|T309648]]
* 18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:24 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P30628 and previous config saved to /var/cache/conftool/dbconfig/20220629-181438-ladsgroup.json
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:12 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 18:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1050.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1053.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1052.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1048.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1049.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1051.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:11 dduvall@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.18  refs [[phab:T308071|T308071]] (duration: 03m 35s)
* 18:09 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:07 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.18  refs [[phab:T308071|T308071]]
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 17:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 17:31 sukhe: running puppet agent on centrallog2002 to finalize [[phab:T310574|T310574]]
* 17:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 17:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P30627 and previous config saved to /var/cache/conftool/dbconfig/20220629-172127-ladsgroup.json
* 17:19 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:18 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P30626 and previous config saved to /var/cache/conftool/dbconfig/20220629-170622-ladsgroup.json
* 17:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:00 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 17:00 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 16:59 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 16:59 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 16:58 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 16:58 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P30625 and previous config saved to /var/cache/conftool/dbconfig/20220629-165117-ladsgroup.json
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P30624 and previous config saved to /var/cache/conftool/dbconfig/20220629-163612-ladsgroup.json
* 16:22 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart to pickup swift-s3 plugin - bking@cumin1001 - [[phab:T309648|T309648]]
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:51 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808941{{!}}Increase weights on the language selector statement boosts (T307869)]] (expected to be a no-op) (duration: 03m 21s)
* 15:25 sukhe: upload anycast-healthchecker 0.8.2-1wm1 to apt.wm.o (bullseye) - [[phab:T310574|T310574]]
* 14:49 dancy@deploy1002: rebuilt and synchronized wikiversions files: Debugging
* 14:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1032.eqiad.wmnet with OS buster
* 14:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1033.eqiad.wmnet with OS buster
* 14:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1029.eqiad.wmnet with OS buster
* 14:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1034.eqiad.wmnet with OS buster
* 14:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1030.eqiad.wmnet with OS buster
* 14:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1028.eqiad.wmnet with OS buster
* 14:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
* 14:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS buster
* 14:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host stat1010.eqiad.wmnet with OS bullseye
* 14:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
* 14:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
* 14:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
* 14:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
* 14:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
* 14:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
* 14:04 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1010.eqiad.wmnet with OS bullseye
* 14:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
* 14:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
* 14:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
* 14:01 sukhe: sudo cumin -b 1 -s 5 'A:wikidough' 'run-puppet-agent -q'
* 14:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
* 14:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
* 14:00 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1031.eqiad.wmnet with OS buster
* 13:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1027.eqiad.wmnet with OS buster
* 13:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS buster
* 13:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS buster
* 13:54 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS buster
* 13:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS buster
* 13:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1030.eqiad.wmnet with OS buster
* 13:50 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:50 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS buster
* 13:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS buster
* 13:47 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
* 13:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
* 13:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|78fe6a15}}: {{Gerrit|9f76648}}: {{Gerrit|897e69c7}}: {{Gerrit|977e57b}}: DiscussionTools config changes ([[phab:T310960|T310960]], [[phab:T298221|T298221]], [[phab:T311023|T311023]]) (duration: 03m 38s)
* 13:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
* 13:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS buster
* 13:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
* 13:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2081.codfw.wmnet
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:16 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart to pickup swift-s3 plugin - bking@cumin1001 - [[phab:T309648|T309648]]
* 13:12 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2081.codfw.wmnet
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2081 from dbctl [[phab:T311475|T311475]]', diff saved to https://phabricator.wikimedia.org/P30622 and previous config saved to /var/cache/conftool/dbconfig/20220629-130741-marostegui.json
* 13:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS buster
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS buster
* 13:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS buster
* 13:02 sukhe: sudo cumin -d 'P<nowiki>{</nowiki>R:Class = bird<nowiki>}</nowiki>' 'disable-puppet "PLEASE DO NOT enable Puppet: deploying [[phab:T310574|T310574]]"'
* 12:54 otto@deploy1002: Finished deploy [analytics/refinery@2f5987d]: (no justification provided) (duration: 00m 37s)
* 12:53 otto@deploy1002: Started deploy [analytics/refinery@2f5987d]: (no justification provided)
* 12:49 otto@deploy1002: Finished deploy [analytics/refinery@2f5987d]: (no justification provided) (duration: 02m 00s)
* 12:47 otto@deploy1002: Started deploy [analytics/refinery@2f5987d]: (no justification provided)
* 12:47 otto@deploy1002: Finished deploy [analytics/refinery@2f5987d]: (no justification provided) (duration: 00m 03s)
* 12:47 otto@deploy1002: Started deploy [analytics/refinery@2f5987d]: (no justification provided)
* 12:34 mforns@deploy1002: Finished deploy [analytics/refinery@2f5987d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2f5987d] (duration: 07m 32s)
* 12:27 mforns@deploy1002: Started deploy [analytics/refinery@2f5987d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2f5987d]
* 12:26 mforns@deploy1002: Finished deploy [analytics/refinery@2f5987d] (thin): Regular analytics weekly train THIN [analytics/refinery@2f5987d] (duration: 00m 07s)
* 12:26 mforns@deploy1002: Started deploy [analytics/refinery@2f5987d] (thin): Regular analytics weekly train THIN [analytics/refinery@2f5987d]
* 12:25 mforns@deploy1002: Finished deploy [analytics/refinery@2f5987d]: Regular analytics weekly train [analytics/refinery@2f5987d] (duration: 01m 08s)
* 12:24 mforns@deploy1002: Started deploy [analytics/refinery@2f5987d]: Regular analytics weekly train [analytics/refinery@2f5987d]
* 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T309311|T309311]])', diff saved to https://phabricator.wikimedia.org/P30621 and previous config saved to /var/cache/conftool/dbconfig/20220629-121722-ladsgroup.json
* 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P30620 and previous config saved to /var/cache/conftool/dbconfig/20220629-120217-ladsgroup.json
* 11:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudnet1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:50 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudnet1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudservices1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudrabbit1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudrabbit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudrabbit1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P30619 and previous config saved to /var/cache/conftool/dbconfig/20220629-114712-ladsgroup.json
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30618 and previous config saved to /var/cache/conftool/dbconfig/20220629-114411-root.json
* 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T309311|T309311]])', diff saved to https://phabricator.wikimedia.org/P30617 and previous config saved to /var/cache/conftool/dbconfig/20220629-113207-ladsgroup.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30616 and previous config saved to /var/cache/conftool/dbconfig/20220629-112907-root.json
* 11:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudrabbit1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudrabbit1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudservices1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudrabbit1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T309311|T309311]])', diff saved to https://phabricator.wikimedia.org/P30615 and previous config saved to /var/cache/conftool/dbconfig/20220629-112054-ladsgroup.json
* 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30614 and previous config saved to /var/cache/conftool/dbconfig/20220629-111403-root.json
* 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T309311|T309311]])', diff saved to https://phabricator.wikimedia.org/P30613 and previous config saved to /var/cache/conftool/dbconfig/20220629-110210-ladsgroup.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30612 and previous config saved to /var/cache/conftool/dbconfig/20220629-105859-root.json
* 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P30610 and previous config saved to /var/cache/conftool/dbconfig/20220629-104705-ladsgroup.json
* 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P30608 and previous config saved to /var/cache/conftool/dbconfig/20220629-103200-ladsgroup.json
* 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T309311|T309311]])', diff saved to https://phabricator.wikimedia.org/P30607 and previous config saved to /var/cache/conftool/dbconfig/20220629-101655-ladsgroup.json
* 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T309311|T309311]])', diff saved to https://phabricator.wikimedia.org/P30606 and previous config saved to /var/cache/conftool/dbconfig/20220629-100341-ladsgroup.json
* 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
* 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
* 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 with some weight to get it warmed up', diff saved to https://phabricator.wikimedia.org/P30605 and previous config saved to /var/cache/conftool/dbconfig/20220629-093826-root.json
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 for on-site maintenance [[phab:T310595|T310595]]', diff saved to https://phabricator.wikimedia.org/P30603 and previous config saved to /var/cache/conftool/dbconfig/20220629-090120-root.json
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 08:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
* 08:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
* 08:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5a583804}}: Add GEMentorProvider to configuration ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:54 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:54 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:54 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db2075.codfw.wmnet
* 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1d1b9cf}}: Remove wgGEMentorDashboardBetaMode (duration: 03m 34s)
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:46 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 07:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|143c3fd}}: {{Gerrit|d5afd97}}: Remove unused GEHomepageSuggestedEditsRequiresOptIn and GEHomepageSuggestedEditsTopicsRequiresOptIn ([[phab:T308209|T308209]], [[phab:T308208|T308208]]) (duration: 03m 22s)
* 07:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2075.codfw.wmnet
* 07:40 marostegui: dbmaint s5@codfw [[phab:T311475|T311475]]
* 07:40 marostegui: dbmaint s@codfw [[phab:T311475|T311475]]
* 07:40 marostegui: dbmaint s1@codfw [[phab:T311475|T311475]]
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2075 from dbctl [[phab:T311591|T311591]]', diff saved to https://phabricator.wikimedia.org/P30602 and previous config saved to /var/cache/conftool/dbconfig/20220629-073919-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2075 [[phab:T311591|T311591]]', diff saved to https://phabricator.wikimedia.org/P30601 and previous config saved to /var/cache/conftool/dbconfig/20220629-073722-root.json
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1002.eqiad.wmnet
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2071 from dbctl', diff saved to https://phabricator.wikimedia.org/P30600 and previous config saved to /var/cache/conftool/dbconfig/20220629-072753-marostegui.json
* 07:24 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts webperf1002.eqiad.wmnet
* 07:17 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db2071.codfw.wmnet
* 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:10 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2071.codfw.wmnet
* 07:05 XioNoX: re-enabled bgp to telia in eqsin
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071 [[phab:T311589|T311589]]', diff saved to https://phabricator.wikimedia.org/P30598 and previous config saved to /var/cache/conftool/dbconfig/20220629-065804-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30597 and previous config saved to /var/cache/conftool/dbconfig/20220629-064655-root.json
* 06:04 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart to pickup swift-s3 plugin - ryankemper@cumin1001 - [[phab:T309648|T309648]]
* 06:02 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart to pickup swift-s3 plugin - ryankemper@cumin1001 - [[phab:T309648|T309648]]
* 05:56 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart to pickup swift-s3 plugin - ryankemper@cumin1001 - [[phab:T309648|T309648]]
* 04:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:37 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgCentralAuthTokenCacheType -> mcrouter [[phab:T278392|T278392]] (duration: 03m 44s)
* 04:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart to pickup swift-s3 plugin - ryankemper@cumin1001 - [[phab:T309648|T309648]]
* 00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2003-dev.codfw.wmnet with OS bullseye


== 2021-08-25 ==
== 2022-06-28 ==
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2003-dev.codfw.wmnet with reason: host reimage
* 23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug
* 23:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2003-dev.codfw.wmnet with reason: host reimage
* 23:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:20 cjming@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/VisualEditor/modules/ve-mw/preinit: Backport: [[gerrit:809308{{!}}Do not grey out page title while loading on Vector 2022 (T310839)]] (duration: 03m 28s)
* 23:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2003-dev.codfw.wmnet with OS bullseye
* 23:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
* 22:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
* 22:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 22:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P30596 and previous config saved to /var/cache/conftool/dbconfig/20220628-213806-ladsgroup.json
* 21:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 21:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P30595 and previous config saved to /var/cache/conftool/dbconfig/20220628-213735-ladsgroup.json
* 21:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 21:30 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 21:25 cjming: end of UTC late backport window
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P30594 and previous config saved to /var/cache/conftool/dbconfig/20220628-212230-ladsgroup.json
* 21:20 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808056{{!}}Enable title above tabs on group 1 and group 0 wikis (1/2) (T310054)]] (duration: 03m 34s)
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:16 cjming@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/VisualEditor: Backport: [[gerrit:809249{{!}}Prevent skinStyles from applying to the Vector 2022 skin. (T310197)]] (duration: 03m 38s)
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:12 cjming@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/VisualEditor: Backport: [[gerrit:809249{{!}}Prevent skinStyles from applying to the Vector 2022 skin. (T310197)]] (duration: 03m 33s)
* 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 21:07 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest2001.codfw.wmnet
* 21:07 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P30592 and previous config saved to /var/cache/conftool/dbconfig/20220628-210725-ladsgroup.json
* 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:04 cjming@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/VisualEditor: Backport: [[gerrit:809245{{!}
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:32 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 17:48 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:32 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 17:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1
* 16:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:00 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-


== 2021-08-24 ==
== 2022-06-21 ==
* 22:05 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 20:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b42e57d75ec6b0536493fa073805a0bcb066aef1}}: zhwikiquote: Disable local upload ([[phab:T311017|T311017]]) (duration: 03m 43s)
* 22:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:10 tgr: running extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php on various wikis per [[phab:T282873|T282873]]#7303828
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a6fd96b15e6e3c068c2faac60208b9722d32af0f}}: Growth features: Promote 9 wikis out of dark mode ([[phab:T287871|T287871]]; [[phab:T287874|T287874]]; [[phab:T287872|T287872]]; [[phab:T287880|T287880]]; [[phab:T287868|T287868]]; [[phab:T287873|T287873]]; [[phab:T287879|T287879]]; [[phab:T287875|T287875]]; [[phab:T287876|T287876]]) (duration: 01m 25s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:22 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (2/2) (duration: 03m 28s)
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:35 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.17 (duration: 01m 48s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.18 (duration: 03m 26s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.20
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (1/2) (duration: 03m 30s)
* 20:18 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.20 (duration: 36m 32s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f70e302e11756d9704acc86c45b3d7aabf31c4d}}: fawiktionary: Enable SandboxLink extension ([[phab:T308505|T308505]]) (duration: 03m 37s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:41 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.20
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:23 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:19 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 19:38 dancy@deploy1002: backport aborted:  (duration: 00m 10s)
* 17:17 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:38 dancy@deploy1002: Installation of scap version "4.9.5" completed for 558 hosts
* 15:26 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics (duration: 02m 17s)
* 19:38 dancy@deploy1002: Installing scap version "4.9.5" for 558 hosts
* 15:23 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics
* 19:22 urandom: replicating Cassandra `system_auth` keyspace to codfw -- [[phab:T307641|T307641]]
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:56 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2` failed due to syntax error, patch here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/807200
* 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:48 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2`
* 14:55 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1001.wikimedia.org
* 14:54 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp1001.wikimedia.org
* 14:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2001.wikimedia.org
* 14:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:12 XioNoX: push pfw policies - [[phab:T289353|T289353]]
* 17:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host elastic1049.eqiad.wmnet
* 12:45 vgutierrez: enable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
* 17:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 12:37 vgutierrez: disable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
* 17:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp2001.wikimedia.org
* 12:33 godog: test patched python3-eventlet on thanos-fe1003 - [[phab:T283714|T283714]]
* 17:14 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 12:30 marostegui: Install 10.4.21 on clouddb1015
* 17:09 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1049.eqiad.wmnet
* 11:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 17:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 11:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 17:01 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 09:08 jbond: upload new statograph version
* 16:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
* 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 08:54 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=dewiki --prune --batch-size=5 --sleep=5 ([[phab:T289249|T289249]])
* 16:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
* 08:51 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=arwiki --prune --batch-size=5 --sleep=5 ([[phab:T289249|T289249]])
* 15:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
* 08:01 godog: temp fix thanos-swift.discovery.wmnet in /etc/hosts to get swift-dispersion-stats to work - [[phab:T283714|T283714]]
* 15:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 07:51 dcausse: repool wdqs1012 [[phab:T289551|T289551]]
* 15:55 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2048.codfw.wmnet
* 07:29 dcausse: restarting blazegraph on wdqs1012
* 15:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
* 07:17 marostegui: Optimize huwiki.flaggedtemplates on db1127
* 15:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
* 07:15 marostegui: Optimize huwiki.flaggedtemplates on db1098:3317
* 15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 06:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 15:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 06:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 15:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (2/2) (duration: 03m 28s)
* 03:51 rzl: rzl@wdqs1012:~$ sudo depool
* 15:37 klausman: restarting pybal on lvs2009
* 03:46 legoktm: wdqs1012 restarted prometheus-blazegraph-exporter-wdqs-blazegraph.service and prometheus-blazegraph-exporter-wdqs-categories.service after apparent exceptions/crashes
* 15:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (1/2) (duration: 03m 51s)
* 00:17 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@da9efa9]: 0.3.83 (duration: 07m 05s)
* 15:30 klausman: Restarting pybal on lvs2010
* 00:10 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.83` on canary `wdqs1003`; proceeding to rest of fleet
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:09 ryankemper@deploy1002: Started deploy [wdqs/wdqs@da9efa9]: 0.3.83
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 00:08 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.83`. Pre-deploy tests passing on canary `wdqs1003`
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2002.codfw.wmnet
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2001.codfw.wmnet
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging-ctrl2002.codfw.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 15:16 klausman@cumin1001: conftool action : help; selector: name=ml-staging2001
* 15:15 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:06 moritzm: installing avahi security updates
* 15:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:01 papaul: PDU swap for rack a2 complete
* 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:24 papaul: on going maintenance on ps1-a2-codfw
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 13:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
* 13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
* 13:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
* 13:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
* 13:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:28 daniel@deploy1002: Synchronized rpc/: Config: [[gerrit:805775{{!}}rpc: Remove unused RunJobs.php (T175146 T243096)]] (duration: 03m 45s)
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
* 13:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 13:05 moritzm: installing Linux 5.10.120-1~bpo10+1 on buster hosts with backports kernel
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 13:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
* 12:56 moritzm: installing haproxy security updates on stretch
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 12:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 12:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
* 12:43 moritzm: installing python-bottle security updates
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 12:25 moritzm: reset logster-csp/logster-badpass-priv on mwlog1002, these were removed from Puppet
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:59 mbsantos: mbsantos@maps2009 imposm-removebackup-import ([[phab:T305845|T305845]])
* 11:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:43 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for testing', diff saved to https://phabricator.wikimedia.org/P29936 and previous config saved to /var/cache/conftool/dbconfig/20220621-114232-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for testing', diff saved to https://phabricator.wikimedia.org/P29935 and previous config saved to /var/cache/conftool/dbconfig/20220621-114216-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for testing', diff saved to https://phabricator.wikimedia.org/P29934 and previous config saved to /var/cache/conftool/dbconfig/20220621-114151-root.json
* 10:57 volans: deleting netbox getstats.GetDeviceStats job results - [[phab:T311048|T311048]]
* 10:51 kart_: Updated cxserver to 2022-06-21-035954-production ([[phab:T307970|T307970]])
* 10:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 10:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 10:47 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 10:47 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 10:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 10:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 10:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:31 urbanecm: 09:29:23 Synchronized wmf-config/throttle.php: {{Gerrit|7c9f6a561b2b4b5c5db063bad83bd23e9cbac347}}: Add a throttle rule for a Czech course ([[phab:T310885|T310885]]) (duration: 03m 34s) #manually logging in logmsgbot's absence
* 09:20 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 09:13 marostegui: dbmaint s8@codfw [[phab:T310011|T310011]]
* 08:29 marostegui: Reboot db1120 for kernel upgrade
* 08:14 moritzm: remove EOLed parsoid debs from releases.wikimedia.org [[phab:T309765|T309765]]
* 05:54 marostegui: Reboot db1132 and db1181 for kernel upgrade


== 2021-08-23 ==
== 2022-06-20 ==
* 23:41 ryankemper: [[phab:T285355|T285355]] `helmfile -e staging -i apply` on `/srv/deployment-charts/helmfile.d/services/linkrecommendation/` from `ryankemper@deploy1002`
* 07:14 SandraEbele: Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Reliability and SpecialEntityData metrics).
* 23:40 ryankemper@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 07:14 SandraEbele: killed Oozie wikidata-articleplaceholder_metrics-coord, wikidata-reliability_metrics-coord, and wikidata-specialentitydata_metrics-coord jobs.
* 18:56 tgr: morning deploys done
* 18:56 tgr@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments: Backport: [[gerrit:714158{{!}}Add Link: store when tasks were generated (T284551)]] (duration: 00m 57s)
* 18:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:27 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: [[gerrit:713907{{!}}wmfSetupEtcd only supports array input]] (duration: 00m 57s)
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 dancy@deploy1002: Synchronized wmf-config: Config: [[gerrit:713906{{!}}Use array format to specify etcd server]] (duration: 00m 57s)
* 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: [[gerrit:713704{{!}}Allow protocol for etcd server to be specified]] (duration: 00m 57s)
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:17 ebernhardson@deploy1002: Finished deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow) (duration: 00m 56s)
* 17:16 ebernhardson@deploy1002: Started deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow)
* 16:37 ebernhardson@deploy1002: Finished deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration (duration: 00m 35s)
* 16:37 ebernhardson@deploy1002: Started deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration
* 16:24 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 3/3) (duration: 00m 56s)
* 14:58 urbanecm@deploy1002: Synchronized wmf-config/config/ckbwiki.yaml: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 2/3) (duration: 00m 57s)
* 14:57 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 1/3) (duration: 00m 57s)
* 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=ckbwiki --phab=[[phab:T287867|T287867]] # [[phab:T287867|T287867]]
* 14:53 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=ckbwiki growthexperiments # [[phab:T287867|T287867]]
* 14:29 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 14:00 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 13:57 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:56 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:55 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:713619{{!}}ProductionServices: change rdb* servers in eqiad and codfw (T280582)]] (duration: 00m 57s)
* 11:35 Lucas_WMDE: EU backport+config window done
* 11:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:714334{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480)]] (2/2) (duration: 00m 57s)
* 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714334{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480)]] (1/2) (duration: 00m 58s)
* 11:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:04 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:713860{{!}}Revert "Enable NewUserMessage on hiwiktionary" (T287091)]] (duration: 00m 57s)
* 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
* 10:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
* 09:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714152{{!}}Add extra sleep option between each batch in pruneRevData.php (T289249)]] (duration: 00m 58s)
* 09:55 mbsantos: start re-import OSM planet data into maps1009 eqiad master ([[phab:T288400|T288400]], [[phab:T288897|T288897]])
* 09:53 urbanecm: Deploy security patch for [[phab:T289408|T289408]]
* 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=codfw
* 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 09:01 godog: pooling swift in eqiad - [[phab:T288458|T288458]]
* 07:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714322{{!}}Set request languages rdf output for wikidata to true (T285795)]] (duration: 00m 57s)
* 07:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:28 Amir1: running FlaggedRevs/maintenance/pruneRevData.php on all flaggedrevs wikis
* 07:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714151{{!}}Avoid calling delete() with empty arrays in PruneFRIncludeData (T289249)]] (duration: 00m 59s)
* 07:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
* 07:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE


== 2021-08-21 ==
== 2022-06-19 ==
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 10:14 ayounsi@cumin1001: dbctl commit (dc=all): 'depool', diff saved to https://phabricator.wikimedia.org/P29910 and previous config saved to /var/cache/conftool/dbconfig/20220619-101436-ayounsi.json


== 2021-08-20 ==
== 2022-06-17 ==
* 23:17 legoktm: deployed patch for [[phab:T289385|T289385]]
* 22:05 AndyRussG: update payments-wiki revision {{Gerrit|10304f69}} -> {{Gerrit|ef53c82e}}
* 17:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1141.eqiad.wmnet
* 20:22 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P29908 and previous config saved to /var/cache/conftool/dbconfig/20220617-202240-jynus.json
* 17:01 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1141.eqiad.wmnet
* 20:20 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P29907 and previous config saved to /var/cache/conftool/dbconfig/20220617-202038-jynus.json
* 16:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1140.eqiad.wmnet
* 17:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1021.eqiad.wmnet with OS buster
* 16:56 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1140.eqiad.wmnet
* 17:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 16:56 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1139.eqiad.wmnet
* 17:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 16:54 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1139.eqiad.wmnet
* 16:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1020.eqiad.wmnet with OS buster
* 16:45 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1134.eqiad.wmnet
* 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 16:43 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1134.eqiad.wmnet
* 16:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1019.eqiad.wmnet with OS buster
* 16:38 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1133.eqiad.wmnet
* 16:37 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 16:36 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1133.eqiad.wmnet
* 16:35 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 15:37 jayme: deleting various pods from staging to have them recreated with priorities - [[phab:T289131|T289131]]
* 16:34 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 15:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1129.eqiad.wmnet
* 16:34 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 15:23 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1129.eqiad.wmnet
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 15:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 14:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
* 16:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
* 16:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:54 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 13:48 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:22 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 12:00 jayme: enabled priority admission plugin on k8s staging, rolling restart all pods in kube-system namespace - [[phab:T289131|T289131]]
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 11:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 10:35 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 16:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 09:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1001.eqiad.wmnet
* 16:06 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 09:32 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 16:06 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 09:23 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1001.eqiad.wmnet
* 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
* 08:48 godog: roll depool/pool thanos-fe to apply swift change - [[phab:T288815|T288815]]
* 16:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 08:43 godog: temp depool thanos-fe2003 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/713815
* 15:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 08:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
* 15:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 08:43 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
* 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 07:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
* 15:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 07:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 15:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 07:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 15:46 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS buster
* 07:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 07:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
* 15:39 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 07:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 15:39 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 06:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 06:07 TimStarling: sending election email to 44k people
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 03:15 legoktm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Score/scripts/removeTagline.php: removeTagline: Set explicit pcre.backtrack_limit ([[phab:T289298|T289298]]) (duration: 00m 58s)
* 15:32 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS buster
* 03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 00:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/makeMailingList.php: code that uses said hack (duration: 00m 57s)
* 15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 00:12 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/includes/User/LocalAuth.php: hack for mailout (duration: 00m 58s)
* 15:19 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:18 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 15:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:15 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS buster
* 15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 15:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 15:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 15:02 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 14:59 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 14:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 14:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 12:35 SandraEbele: deployed daily airflow dag for 3 Wikidata metrics.
* 11:54 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@18182aa]: (no justification provided) (duration: 00m 13s)
* 11:54 ebysans@deploy1002: Started deploy [airflow-dags/analytics@18182aa]: (no justification provided)
* 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2012.codfw.wmnet
* 11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2012.codfw.wmnet
* 11:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2011.codfw.wmnet
* 11:40 moritzm: upload cas 6.5.5+wmf11u1 to apt.wikimedia.org [[phab:T305518|T305518]]
* 11:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2011.codfw.wmnet
* 11:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2010.codfw.wmnet
* 11:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 11:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 11:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 11:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 11:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2010.codfw.wmnet
* 11:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1012.eqiad.wmnet
* 11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1012.eqiad.wmnet
* 11:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1010.eqiad.wmnet
* 11:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1010.eqiad.wmnet
* 10:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 10:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 10:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 10:34 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 10:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
* 09:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
* 09:56 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 09:56 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 09:55 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 09:55 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 09:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
* 09:44 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
* 09:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
* 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1004.eqiad.wmnet
* 09:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1004.eqiad.wmnet
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
* 09:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
* 09:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
* 09:24 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
* 09:11 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 09:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 09:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 08:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 08:51 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 08:47 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 08:39 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 08:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
* 08:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
* 08:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
* 08:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
* 07:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 07:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 02:51 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS bullseye
* 02:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:36 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 03m 43s)
* 02:02 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS bullseye
* 01:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS bullseye
* 01:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
* 00:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 00:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye


== 2021-08-19 ==
== 2022-06-16 ==
* 23:15 brennen: ended backport & config window early, as no patches were scheduled and no new attendees for this week
* 23:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 22:42 ejegg: updated payments-wiki from {{Gerrit|0a27dbe9b6}} to {{Gerrit|564daed816}}
* 23:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 21:20 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=huwiki --prune ([[phab:T289249|T289249]])
* 23:38 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
* 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:59 mutante: new Wikipedia languages added to DNS:  blk = https://en.wikipedia.org/wiki/Pa%27O_language  {{!}}  pcm = https://en.wikipedia.org/wiki/Nigerian_Pidgin
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.19
* 22:37 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:07 razzi@deploy1002: Finished deploy [analytics/aqs/deploy@57c253e]: Deploy aqs {{Gerrit|9c062f2}} (duration: 03m 30s)
* 22:33 volans@cumin2002: START - Cookbook sre.dns.netbox
* 19:03 razzi@deploy1002: Started deploy [analytics/aqs/deploy@57c253e]: Deploy aqs {{Gerrit|9c062f2}}
* 21:18 thcipriani@deploy1002: Finished scap: noop test (duration: 04m 07s)
* 18:27 razzi: Beginning aqs deploy process
* 21:14 thcipriani@deploy1002: Started scap: noop test
* 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon2001.codfw.wmnet
* 21:10 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:805433{{!}}CommonSettings: clean up and simplify some code]] (duration: 03m 42s)
* 17:49 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2001.codfw.wmnet
* 21:06 thcipriani@deploy1002: Synchronized multiversion/MWRealm.php: Config: [[gerrit:806249{{!}}MWRealm.php: remove unused getRealmSpecificFilename() (T171115)]] (duration: 03m 35s)
* 17:48 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1001.eqiad.wmnet
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1001.eqiad.wmnet
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1004.eqiad.wmnet
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:01 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1004.eqiad.wmnet
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1003.eqiad.wmnet
* 20:59 thcipriani@deploy1002: Finished scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]] (duration: 16m 57s)
* 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:49 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable Score with Shellbox on most public wikis ([[phab:T257066|T257066]]) (duration: 01m 08s)
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1003.eqiad.wmnet
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1002.eqiad.wmnet
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:31 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1002.eqiad.wmnet
* 20:42 thcipriani@deploy1002: Started scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]]
* 16:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts maps1002.eqiad.wmnet
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1002.eqiad.wmnet
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1001.eqiad.wmnet
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1001.eqiad.wmnet
* 20:27 cjming@deploy1002: Synchronized phpcs.xml: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 27s)
* 16:14 hnowlan: starting decommission of old eqiad maps hardware
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:10 cwhite: remove rotated logstash-plain-* and logstash-json-* logs on logstash collectors
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:00 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 dpifke@deploy1002: Finished deploy [performance/navtiming@f8bf39f]: Deploy CpuBenchmark processor again [[phab:T281243|T281243]] (duration: 00m 06s)
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:52 dpifke@deploy1002: Started deploy [performance/navtiming@f8bf39f]: Deploy CpuBenchmark processor again [[phab:T281243|T281243]]
* 20:23 cjming@deploy1002: Synchronized wmf-config/: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 22s)
* 15:50 Amir1: test2wiki)> delete from flaggedtemplates where ft_rev_id not in (select fp_stable from flaggedpages); ([[phab:T289249|T289249]])
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
* 20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805179{{!}}Turn off TOC A/B test for pilot wikis (T309683)]] (duration: 03m 37s)
* 15:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
* 19:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner2001.codfw.wmnet
* 15:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:39 aokoth@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:25 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:23 aokoth@cumin1001: START - Cookbook sre.dns.netbox
* 15:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps[1001-1004].eqiad.wmnet with reason: Awaiting decommissioning
* 19:03 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner2001.codfw.wmnet
* 15:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps[1001-1004].eqiad.wmnet with reason: Awaiting decommissioning
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab-runner1001.eqiad.wmnet
* 15:04 godog: clean logstash json logs off logstash hosts
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:49 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 14:36 effie: enable puppet on mediawiki and memcached servers for 713842
* 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29904 and previous config saved to /var/cache/conftool/dbconfig/20220616-185520-marostegui.json
* 14:26 effie: disable puppet on mediawiki and memcached servers for 713842
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:49 urbanecm: Start server-side upload for 1 video file ([[phab:T288384|T288384]])
* 18:54 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 13:48 urbanecm: Start server-side upload for 1 video file ([[phab:T288554|T288554]])
* 18:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
* 13:47 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:53 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:47 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:45 urbanecm: Start server-side upload for 1 video file ([[phab:T288628|T288628]])
* 18:50 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 13:44 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 18:49 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 13:44 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 urbanecm: Start server-side upload for 1 video file ([[phab:T289203|T289203]])
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:44 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to all wikis
* 13:34 kormat: reconfiguring replication tree on pc3 [[phab:T284825|T284825]]
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 kormat: reconfiguring replication tree on pc2 [[phab:T284825|T284825]]
* 18:42 brennen@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/CheckUser/src/Hooks.php: Backport: [[gerrit:806246{{!}}Only try to create User object if username is not null (T310747)]] (duration: 03m 23s)
* 13:24 kormat: reconfiguring replication tree on pc1 [[phab:T284825|T284825]]
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29903 and previous config saved to /var/cache/conftool/dbconfig/20220616-184015-marostegui.json
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29902 and previous config saved to /var/cache/conftool/dbconfig/20220616-182510-marostegui.json
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:13 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 13:09 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote new h/w to primary of eqiad pc sections [[phab:T284825|T284825]] (duration: 01m 08s)
* 18:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 12:35 zpapierski@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:12 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:11 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:10 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 12:11 Lucas_WMDE: EU backport+config window done
* 18:10 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 12:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/view/lib/wikibase-termbox/: Backport: [[gerrit:713523{{!}}Update termbox (T236893, T286775)]] (duration: 01m 08s)
* 18:10 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 11:56 zpapierski@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29901 and previous config saved to /var/cache/conftool/dbconfig/20220616-181005-marostegui.json
* 11:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:10 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:59 brennen: end of phabricator deploy
* 11:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:713824{{!}}Revert "Don't set termbox v2 tags yet" (T236893, T286775)]] (duration: 01m 06s)
* 17:46 brennen: starting phabricator deploy, momentary downtime expected while services restart
* 11:40 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Wikibase/view/lib/wikibase-termbox/: Backport: [[gerrit:713513{{!}}Update termbox (T236893, T286775)]] (duration: 01m 08s)
* 17:42 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 11:39 lucaswerkmeister-wmde@deploy1002: sync-file aborted: Backport: [[gerrit:713513{{!}}Update termbox (T236893T286775)]] (duration: 00m 01s)
* 17:42 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 11:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29900 and previous config saved to /var/cache/conftool/dbconfig/20220616-173738-marostegui.json
* 11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:45 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:42 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:36 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:12 twentyafterfour: restart php-fpm on phab1001
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29899 and previous config saved to /var/cache/conftool/dbconfig/20220616-173725-marostegui.json
* 10:02 godog: roll-reload nginx on ms-fe to apply config change
* 17:31 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 08:48 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:31 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 08:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 08:41 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 17:27 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 04:20 effie: pool mw2383 - [[phab:T286463|T286463]]
* 17:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 01:13 ejegg: updated fundraising CiviCRM from {{Gerrit|73f6ec9190}} to {{Gerrit|8ed303f2d1}}
* 17:26 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 00:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29898 and previous config saved to /var/cache/conftool/dbconfig/20220616-172220-marostegui.json
* 00:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29897 and previous config saved to /var/cache/conftool/dbconfig/20220616-170715-marostegui.json
* 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29896 and previous config saved to /var/cache/conftool/dbconfig/20220616-165210-marostegui.json
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29895 and previous config saved to /var/cache/conftool/dbconfig/20220616-161844-marostegui.json
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29894 and previous config saved to /var/cache/conftool/dbconfig/20220616-161835-marostegui.json
* 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29893 and previous config saved to /var/cache/conftool/dbconfig/20220616-160330-marostegui.json
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29892 and previous config saved to /var/cache/conftool/dbconfig/20220616-154825-marostegui.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29891 and previous config saved to /var/cache/conftool/dbconfig/20220616-153320-marostegui.json
* 15:31 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 15:30 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 15:29 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P29890 and previous config saved to /var/cache/conftool/dbconfig/20220616-151434-ladsgroup.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P29889 and previous config saved to /var/cache/conftool/dbconfig/20220616-145931-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29888 and previous config saved to /var/cache/conftool/dbconfig/20220616-145136-marostegui.json
* 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29887 and previous config saved to /var/cache/conftool/dbconfig/20220616-145128-marostegui.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P29886 and previous config saved to /var/cache/conftool/dbconfig/20220616-144427-ladsgroup.json
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29885 and previous config saved to /var/cache/conftool/dbconfig/20220616-143623-marostegui.json
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P29884 and previous config saved to /var/cache/conftool/dbconfig/20220616-142923-ladsgroup.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29883 and previous config saved to /var/cache/conftool/dbconfig/20220616-142118-marostegui.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29882 and previous config saved to /var/cache/conftool/dbconfig/20220616-140613-marostegui.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29881 and previous config saved to /var/cache/conftool/dbconfig/20220616-140453-root.json
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 volans@cumin1001: dbctl commit (dc=all): 'Doesn't have new wikiuser', diff saved to https://phabricator.wikimedia.org/P29880 and previous config saved to /var/cache/conftool/dbconfig/20220616-140107-volans.json
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29879 and previous config saved to /var/cache/conftool/dbconfig/20220616-134950-root.json
* 13:45 sukhe: upload bird2_2.0.7-4.1wm1 to apt.wm.o (buster) - [[phab:T310574|T310574]]
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29878 and previous config saved to /var/cache/conftool/dbconfig/20220616-133446-root.json
* 13:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1089.eqiad.wmnet
* 13:22 jayme@cumin1001: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 13:21 jayme@cumin1001: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29877 and previous config saved to /var/cache/conftool/dbconfig/20220616-131942-root.json
* 13:10 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 13:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29876 and previous config saved to /var/cache/conftool/dbconfig/20220616-130438-root.json
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 13:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29875 and previous config saved to /var/cache/conftool/dbconfig/20220616-123357-marostegui.json
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1008.eqiad.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 for schema change', diff saved to https://phabricator.wikimedia.org/P29874 and previous config saved to /var/cache/conftool/dbconfig/20220616-115924-root.json
* 11:53 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1008.eqiad.wmnet
* 11:53 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1007.eqiad.wmnet
* 11:45 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1007.eqiad.wmnet
* 11:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
* 11:38 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
* 11:35 godog: trim swift logs older than 25d from centrallog hosts - [[phab:T309171|T309171]]
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet
* 11:27 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet
* 11:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
* 11:19 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29873 and previous config saved to /var/cache/conftool/dbconfig/20220616-111632-marostegui.json
* 11:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 11:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
* 11:09 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 11:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 11:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29871 and previous config saved to /var/cache/conftool/dbconfig/20220616-110127-marostegui.json
* 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
* 10:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29870 and previous config saved to /var/cache/conftool/dbconfig/20220616-104622-marostegui.json
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
* 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
* 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
* 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json
* 10:28 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:28 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]?
* 10:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]?
* 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster
* 10:02 elukey: ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]
* 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:32 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster
* 09:21 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29868 and previous config saved to /var/cache/conftool/dbconfig/20220616-091131-marostegui.json
* 09:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:02 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6002.drmrs.wmnet
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 08:45 moritzm: failover ganeti master in drmrs/2 to ganeti6004
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805370{{!}}testwiki: Enable SectionTranslation for 11 Wikipedias (T309384 T310116)]] (duration: 03m 41s)
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 joal: Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure


== 2021-08-18 ==
== 2022-06-15 ==
* 22:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@26480d5]: fully enable imagerec data shipping (duration: 02m 09s)
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29867 and previous config saved to /var/cache/conftool/dbconfig/20220615-224845-marostegui.json
* 22:14 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@26480d5]: fully enable imagerec data shipping
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29866 and previous config saved to /var/cache/conftool/dbconfig/20220615-223339-marostegui.json
* 21:15 jgleeson: civicrm changed from {{Gerrit|66568246a2}} to {{Gerrit|73f6ec9190}}
* 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS buster
* 19:40 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@8d71e72]: configuration for imagerec data shipping (duration: 02m 12s)
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29865 and previous config saved to /var/cache/conftool/dbconfig/20220615-221834-marostegui.json
* 19:38 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@8d71e72]: configuration for imagerec data shipping
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS buster
* 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.19 (duration: 01m 05s)
* 22:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.19
* 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:12 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 18:16 legoktm: Successfully published image docker-registry.discovery.wmnet/nodejs12-devel:0.0.1, docker-registry.discovery.wmnet/nodejs12-slim:0.0.1 ([[phab:T284346|T284346]])
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29864 and previous config saved to /var/cache/conftool/dbconfig/20220615-220329-marostegui.json
* 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|559dd701a5859223afd49aaa33ddab70e8ebe721}}: Enable page previews on German Wikivoyage ([[phab:T264305|T264305]]) (duration: 01m 08s)
* 22:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 18:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|35113b617b3540242ac69a8285c54c70041bc14b}}: Enable DiscussionTools topicsubscription as beta feature on phase 1 wikis ([[phab:T287800|T287800]]) (duration: 01m 25s)
* 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29863 and previous config saved to /var/cache/conftool/dbconfig/20220615-213241-marostegui.json
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 15:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 15:46 ejegg: updated matching gift employers list on payments-wiki
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29862 and previous config saved to /var/cache/conftool/dbconfig/20220615-213233-marostegui.json
* 15:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29861 and previous config saved to /var/cache/conftool/dbconfig/20220615-211728-marostegui.json
* 14:50 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29860 and previous config saved to /var/cache/conftool/dbconfig/20220615-210223-marostegui.json
* 14:26 effie: enable puppet on alert*
* 20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29859 and previous config saved to /var/cache/conftool/dbconfig/20220615-204717-marostegui.json
* 14:11 effie: disable puppet on alerts* to avoid alert flood due to 713494
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:08 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804014{{!}}Remove unused setting wgQuickSurveysUseVue (T285890)]] (duration: 03m 38s)
* 13:57 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:713619{{!}}ProductionServices: change rdb* servers in eqiad and codfw (T280582)]] (duration: 01m 51s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 godog: bounce logstash on logstash100[89]
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:50 hashar@deploy1002: Finished deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]] (duration: 00m 10s)
* 13:24 effie: mw2383 is depooled - [[phab:T286463|T286463]]
* 19:50 hashar@deploy1002: Started deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]]
* 13:01 kormat: Deploying wmfmariadbpy 0.7.2 [[phab:T289139|T289139]]
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29858 and previous config saved to /var/cache/conftool/dbconfig/20220615-194703-marostegui.json
* 13:01 kormat: uploaded wmfmariadbpy 0.7.2 to apt.wm.o
* 19:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 11:38 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 11:36 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29857 and previous config saved to /var/cache/conftool/dbconfig/20220615-194655-marostegui.json
* 11:35 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29856 and previous config saved to /var/cache/conftool/dbconfig/20220615-193150-marostegui.json
* 11:12 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
* 19:31 hashar: wikibugs IRC bot has been restarted by valhallasw \o/ # [[phab:T310734|T310734]]
* 11:03 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29855 and previous config saved to /var/cache/conftool/dbconfig/20220615-191645-marostegui.json
* 10:47 effie: pooling mw2383 - [[phab:T286463|T286463]]
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29854 and previous config saved to /var/cache/conftool/dbconfig/20220615-190140-marostegui.json
* 10:41 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:42 hashar: wikibugs (irc bot for Phabricator/Gerrit) is no more working and would need a restart [[phab:T310734|T310734]]
* 10:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1004.eqiad.wmnet with reason: Awaiting decommissioning
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1004.eqiad.wmnet with reason: Awaiting decommissioning
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29853 and previous config saved to /var/cache/conftool/dbconfig/20220615-182140-marostegui.json
* 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
* 18:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 10:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
* 18:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 09:36 joal@deploy1002: Finished deploy [analytics/refinery@88c6618] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@88c6618] (duration: 05m 48s)
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:30 joal@deploy1002: Started deploy [analytics/refinery@88c6618] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@88c6618]
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:30 joal@deploy1002: Finished deploy [analytics/refinery@88c6618] (thin): Regular analytics weekly train THIN [analytics/refinery@88c6618] (duration: 00m 07s)
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:30 joal@deploy1002: Started deploy [analytics/refinery@88c6618] (thin): Regular analytics weekly train THIN [analytics/refinery@88c6618]
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]] (duration: 03m 43s)
* 09:29 joal@deploy1002: Finished deploy [analytics/refinery@88c6618]: Regular analytics weekly train [analytics/refinery@88c6618] (duration: 32m 29s)
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:57 joal@deploy1002: Started deploy [analytics/refinery@88c6618]: Regular analytics weekly train [analytics/refinery@88c6618]
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:38 marostegui: Drop user2 from s6 - [[phab:T289051|T289051]]
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:03 rzl@cumin2001: conftool action : get/pooled; selector: service=docker-registry
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 00:39 dpifke@deploy1002: Finished deploy [performance/navtiming@88f12a0]: Revert CpuBenchmark again ([[phab:T281243|T281243]]) (duration: 00m 05s)
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:39 dpifke@deploy1002: Started deploy [performance/navtiming@88f12a0]: Revert CpuBenchmark again ([[phab:T281243|T281243]])
* 17:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1015.eqiad.wmnet with OS buster
* 00:38 dpifke@deploy1002: Finished deploy [performance/navtiming@88f12a0]: Re-deploy fixed CpuBenchmark ([[phab:T281243|T281243]]) (duration: 00m 06s)
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:38 dpifke@deploy1002: Started deploy [performance/navtiming@88f12a0]: Re-deploy fixed CpuBenchmark ([[phab:T281243|T281243]])
* 17:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:52 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1014.eqiad.wmnet with OS buster
* 17:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 17:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29851 and previous config saved to /var/cache/conftool/dbconfig/20220615-172738-marostegui.json
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29849 and previous config saved to /var/cache/conftool/dbconfig/20220615-171233-marostegui.json
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29848 and previous config saved to /var/cache/conftool/dbconfig/20220615-165727-marostegui.json
* 16:54 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to group0
* 16:44 jynus: reestarting replication for m3 on db1117, not db2078
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29847 and previous config saved to /var/cache/conftool/dbconfig/20220615-164222-marostegui.json
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:29 brennen: phabricator upgrade finished
* 16:27 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Id8cdb8aef70f6672}} (duration: 03m 41s)
* 16:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host backup1009.eqiad.wmnet
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host backup1009.eqiad.wmnet
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29845 and previous config saved to /var/cache/conftool/dbconfig/20220615-160838-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29844 and previous config saved to /var/cache/conftool/dbconfig/20220615-160830-marostegui.json
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS buster
* 15:56 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:53 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P29843 and previous config saved to /var/cache/conftool/dbconfig/20220615-155325-marostegui.json
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 15:40 mutante: phabricator upgrade in progress
* 15:39 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:39 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20220615-153820-marostegui.json
* 15:35 brennen: starting phabricator deploy, momentary downtime expected while Apache restarts and migrations run
* 15:34 jynus: stopping replication for m3 on db1117, db2078
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29841 and previous config saved to /var/cache/conftool/dbconfig/20220615-152315-marostegui.json
* 15:20 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ms-be1059.eqiad.wmnet with OS bullseye
* 15:20 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenace
* 15:20 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenace
* 15:06 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:05 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:05 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:03 mutante: phabricator maintenance about to start
* 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 15:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 14:59 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 14:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:52 jbond@cumin1001: END (ERROR) - Cookbook sre.pdus.uptime (exit_code=97)
* 14:51 jbond@cumin1001: START - Cookbook sre.pdus.uptime
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29840 and previous config saved to /var/cache/conftool/dbconfig/20220615-145028-marostegui.json
* 14:50 urandom: ALTER-ing replication for codfw (Cassandra) expansion -- [[phab:T307641|T307641]]
* 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29839 and previous config saved to /var/cache/conftool/dbconfig/20220615-145020-marostegui.json
* 14:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:49 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:46 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29838 and previous config saved to /var/cache/conftool/dbconfig/20220615-143515-marostegui.json
* 14:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:30 hnowlan@deploy1002: Synchronized private/PrivateSettings.php: [[phab:T308670|T308670]] credentials to access the similar-users service (duration: 03m 32s)
* 14:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:22 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:21 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29836 and previous config saved to /var/cache/conftool/dbconfig/20220615-142010-marostegui.json
* 14:19 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 14:16 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS buster
* 14:10 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jnuche@deploy1002: Installation of scap version "4.9.4" completed for 558 hosts
* 14:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 14:08 jnuche@deploy1002: Installing scap version "4.9.4" for 558 hosts
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29834 and previous config saved to /var/cache/conftool/dbconfig/20220615-140505-marostegui.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:01 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:58 awight: EU afternoon backport window complete.
* 13:57 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/Translate/src/PageTranslation/DeleteTranslatableBundleSpecialPage.php: Backport: [[gerrit:805749{{!}}Fix deletion of translation pages outside of NS_MAIN namespace (T310440)]] (duration: 00m 32s)
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29833 and previous config saved to /var/cache/conftool/dbconfig/20220615-135508-root.json
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29832 and previous config saved to /var/cache/conftool/dbconfig/20220615-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29831 and previous config saved to /var/cache/conftool/dbconfig/20220615-135458-root.json
* 13:54 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:53 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:49 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29830 and previous config saved to /var/cache/conftool/dbconfig/20220615-134004-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29829 and previous config saved to /var/cache/conftool/dbconfig/20220615-133958-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29828 and previous config saved to /var/cache/conftool/dbconfig/20220615-133954-root.json
* 13:38 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:805745{{!}}Restore internal mechanism to use either back or close button (T310602)]] (duration: 00m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29827 and previous config saved to /var/cache/conftool/dbconfig/20220615-133334-marostegui.json
* 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29826 and previous config saved to /var/cache/conftool/dbconfig/20220615-133326-marostegui.json
* 13:31 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 01m 08s)
* 13:30 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 02m 06s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29825 and previous config saved to /var/cache/conftool/dbconfig/20220615-132500-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29824 and previous config saved to /var/cache/conftool/dbconfig/20220615-132454-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29823 and previous config saved to /var/cache/conftool/dbconfig/20220615-132450-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29822 and previous config saved to /var/cache/conftool/dbconfig/20220615-131820-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29821 and previous config saved to /var/cache/conftool/dbconfig/20220615-130956-root.json
* 13:09 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29820 and previous config saved to /var/cache/conftool/dbconfig/20220615-130951-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29819 and previous config saved to /var/cache/conftool/dbconfig/20220615-130946-root.json
* 13:08 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:04 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 43s)
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29818 and previous config saved to /var/cache/conftool/dbconfig/20220615-130315-marostegui.json
* 13:02 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 12:56 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 58s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:55 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 05s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29817 and previous config saved to /var/cache/conftool/dbconfig/20220615-125452-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29816 and previous config saved to /var/cache/conftool/dbconfig/20220615-125447-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29815 and previous config saved to /var/cache/conftool/dbconfig/20220615-125442-root.json
* 12:51 jbond@deploy1002: Finished deploy [netbox/deploy@7bbf659]: log (duration: 03m 12s)
* 12:48 jbond@deploy1002: Started deploy [netbox/deploy@7bbf659]: log
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29813 and previous config saved to /var/cache/conftool/dbconfig/20220615-124810-marostegui.json
* 12:42 moritzm: failover ganeti master in eqsin to ganeti5001
* 12:42 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:42 volans@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29812 and previous config saved to /var/cache/conftool/dbconfig/20220615-123949-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29811 and previous config saved to /var/cache/conftool/dbconfig/20220615-123943-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29810 and previous config saved to /var/cache/conftool/dbconfig/20220615-123938-root.json
* 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 12:25 kart_: Updated cxserver to 2022-06-15-074244-production ([[phab:T309266|T309266]], [[phab:T310116|T310116]], [[phab:T309384|T309384]], [[phab:T306963|T306963]])
* 12:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:23 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 es1033 es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29808 and previous config saved to /var/cache/conftool/dbconfig/20220615-122123-root.json
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:19 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29807 and previous config saved to /var/cache/conftool/dbconfig/20220615-121620-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29806 and previous config saved to /var/cache/conftool/dbconfig/20220615-121440-marostegui.json
* 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 12:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29805 and previous config saved to /var/cache/conftool/dbconfig/20220615-115935-marostegui.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29804 and previous config saved to /var/cache/conftool/dbconfig/20220615-115452-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29803 and previous config saved to /var/cache/conftool/dbconfig/20220615-115135-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29802 and previous config saved to /var/cache/conftool/dbconfig/20220615-115127-root.json
* 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29801 and previous config saved to /var/cache/conftool/dbconfig/20220615-114950-marostegui.json
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29800 and previous config saved to /var/cache/conftool/dbconfig/20220615-114430-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29799 and previous config saved to /var/cache/conftool/dbconfig/20220615-113948-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29798 and previous config saved to /var/cache/conftool/dbconfig/20220615-113631-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29797 and previous config saved to /var/cache/conftool/dbconfig/20220615-113623-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29796 and previous config saved to /var/cache/conftool/dbconfig/20220615-113445-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29795 and previous config saved to /var/cache/conftool/dbconfig/20220615-112924-marostegui.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29794 and previous config saved to /var/cache/conftool/dbconfig/20220615-112444-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29793 and previous config saved to /var/cache/conftool/dbconfig/20220615-112127-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29792 and previous config saved to /var/cache/conftool/dbconfig/20220615-112119-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29791 and previous config saved to /var/cache/conftool/dbconfig/20220615-111940-marostegui.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29790 and previous config saved to /var/cache/conftool/dbconfig/20220615-110940-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29789 and previous config saved to /var/cache/conftool/dbconfig/20220615-110623-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29788 and previous config saved to /var/cache/conftool/dbconfig/20220615-110616-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29787 and previous config saved to /var/cache/conftool/dbconfig/20220615-110435-marostegui.json
* 10:55 marostegui: dbmaint es3@eqiad [[phab:T310485|T310485]]
* 10:55 marostegui: dbmaint es2@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui: dbmaint es1@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29786 and previous config saved to /var/cache/conftool/dbconfig/20220615-105437-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29784 and previous config saved to /var/cache/conftool/dbconfig/20220615-105119-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29783 and previous config saved to /var/cache/conftool/dbconfig/20220615-105112-root.json
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29782 and previous config saved to /var/cache/conftool/dbconfig/20220615-103933-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29781 and previous config saved to /var/cache/conftool/dbconfig/20220615-103615-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29780 and previous config saved to /var/cache/conftool/dbconfig/20220615-103608-root.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29779 and previous config saved to /var/cache/conftool/dbconfig/20220615-103101-marostegui.json
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29778 and previous config saved to /var/cache/conftool/dbconfig/20220615-103048-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 es1030 es1028 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29777 and previous config saved to /var/cache/conftool/dbconfig/20220615-102929-root.json
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29776 and previous config saved to /var/cache/conftool/dbconfig/20220615-101543-marostegui.json
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29775 and previous config saved to /var/cache/conftool/dbconfig/20220615-100235-marostegui.json
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29774 and previous config saved to /var/cache/conftool/dbconfig/20220615-100037-marostegui.json
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4001.ulsfo.wmnet
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29773 and previous config saved to /var/cache/conftool/dbconfig/20220615-094532-marostegui.json
* 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29772 and previous config saved to /var/cache/conftool/dbconfig/20220615-092706-marostegui.json
* 09:20 marostegui: Reboot sanitarium hosts (db1154, db1155) wiki replicas will have lag
* 09:14 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1059.eqiad.wmnet with OS bullseye
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29771 and previous config saved to /var/cache/conftool/dbconfig/20220615-091257-marostegui.json
* 09:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29770 and previous config saved to /var/cache/conftool/dbconfig/20220615-091249-marostegui.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29769 and previous config saved to /var/cache/conftool/dbconfig/20220615-091201-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29768 and previous config saved to /var/cache/conftool/dbconfig/20220615-085744-marostegui.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29767 and previous config saved to /var/cache/conftool/dbconfig/20220615-085656-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29766 and previous config saved to /var/cache/conftool/dbconfig/20220615-084239-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29765 and previous config saved to /var/cache/conftool/dbconfig/20220615-084151-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29764 and previous config saved to /var/cache/conftool/dbconfig/20220615-084046-marostegui.json
* 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29763 and previous config saved to /var/cache/conftool/dbconfig/20220615-083554-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29762 and previous config saved to /var/cache/conftool/dbconfig/20220615-082734-marostegui.json
* 08:23 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:22 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29761 and previous config saved to /var/cache/conftool/dbconfig/20220615-082050-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29760 and previous config saved to /var/cache/conftool/dbconfig/20220615-081744-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29759 and previous config saved to /var/cache/conftool/dbconfig/20220615-080546-root.json
* 08:03 XioNoX: re-enable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29758 and previous config saved to /var/cache/conftool/dbconfig/20220615-080240-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29757 and previous config saved to /var/cache/conftool/dbconfig/20220615-075042-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29756 and previous config saved to /var/cache/conftool/dbconfig/20220615-075024-marostegui.json
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29755 and previous config saved to /var/cache/conftool/dbconfig/20220615-074736-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29754 and previous config saved to /var/cache/conftool/dbconfig/20220615-073538-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29753 and previous config saved to /var/cache/conftool/dbconfig/20220615-073232-root.json
* 07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29752 and previous config saved to /var/cache/conftool/dbconfig/20220615-072352-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P29751 and previous config saved to /var/cache/conftool/dbconfig/20220615-072034-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29750 and previous config saved to /var/cache/conftool/dbconfig/20220615-071728-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29749 and previous config saved to /var/cache/conftool/dbconfig/20220615-070847-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29748 and previous config saved to /var/cache/conftool/dbconfig/20220615-065342-marostegui.json
* 06:52 XioNoX: disable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29747 and previous config saved to /var/cache/conftool/dbconfig/20220615-063837-marostegui.json
* 06:02 marostegui: Reboot db[2071-2078] [[phab:T310485|T310485]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29746 and previous config saved to /var/cache/conftool/dbconfig/20220615-060153-marostegui.json
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29745 and previous config saved to /var/cache/conftool/dbconfig/20220615-054252-marostegui.json
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1173.eqiad.wmnet with OS bullseye
* 05:17 marostegui: dbmaint es5@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es4@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es3@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es2@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es1@codfw [[phab:T310485|T310485]]
* 05:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:03 marostegui: Reboot dbproxy1016 and dbproxy1021 [[phab:T310484|T310484]]
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 tstarling@deploy1002: Synchronized php-1.39.0-wmf.16/includes/cache/MessageCache.php: (no justification provided) (duration: 03m 36s)
* 02:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:17 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/includes/cache/MessageCache.php: [[phab:T310532|T310532]] (duration: 03m 29s)
* 02:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-08-17 ==
== 2022-06-14 ==
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:52 mutante: gitlab-runner1001/1002 - clean revert not possible, icinga alerting about failed buildkitd service, manually deleting systemd unit and trying to clean up [[phab:T308271|T308271]]
* 23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:49 mutante: gitlab-runner1002 - systemctl restart docker; run-puppet-agent ; systemctl start buildkitd  - fails though [[phab:T308271|T308271]]
* 23:32 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php: [[phab:T288233|T288233]]: Work around cache failure for wikitech (duration: 01m 28s)
* 23:39 mutante: gitlab-runner1001 - systemctl start buildkitd
* 23:05 tzatziki: resetting email for vanished user
* 23:32 mutante: gitlab-runner1001 - restarting docker
* 21:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:08 mutante: disabling puppet in gitlab-runners (via cumin /disable-puppet) before deploying gerrit:791655 to provide gitlab-runners with buildkit and new docker network - [[phab:T308271|T308271]]
* 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:44 urbanecm: Deploy security patch for [[phab:T289063|T289063]]
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 brennen: running scap pull on mw2383
* 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:29 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.16 (duration: 02m 01s)
* 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:20 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.15 (duration: 06m 51s)
* 22:15 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|e3fe6c04c95717f0f914bbfa366f5f827f392b6b}}: phpcs: fix more SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 39s)
* 20:14 brennen: pruning 1.37.0-wmf.15 and .16 ([[phab:T281160|T281160]])
* 22:05 urbanecm@deploy1002: Synchronized w/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 18s)
* 20:06 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.18/includes/block/BlockUser.php: {{Gerrit|d377d4fae704640c81172a6fa94b12b2efdba42c}}: BlockUser: Restore blocking autoblocked IP addresses ([[phab:T287798|T287798]]) (duration: 01m 08s)
* 22:02 urbanecm@deploy1002: Synchronized src/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 32s)
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.19
* 22:00 mutante: wtp1026 - manually running '/usr/bin/sudo -u root -- /usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807'
* 19:02 brennen: 1.37.0-wmf.19 train status: no current blockers, proceeding to group0 ([[phab:T281160|T281160]])
* 21:58 urbanecm@deploy1002: Synchronized rpc/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 31s)
* 17:44 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/includes/: Backport: [[gerrit:713506{{!}}Revert "objectcache: make use of new `modtoken` field in SqlBagOStuff" (T288998)]] (duration: 01m 13s)
* 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 urbanecm: [urbanecm@mw2383 ~]$ scap pull # to clear an icinga alert
* 21:54 urbanecm@deploy1002: Synchronized multiversion/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 29s)
* 17:39 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/: Backport: [[gerrit:713365{{!}}Revert "objectcache: make use of new `modtoken` field in SqlBagOStuff" (T288998)]] (duration: 01m 14s)
* 21:54 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 17:15 bblack: authdns2001,dns[245]001: upgrade gdnsd package to 3.8.0-1~wmf1 (all authdns upgraded after this)
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:07 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 21:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:04 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 21:51 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 38s)
* 17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 21:49 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
* 16:56 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.19 (duration: 38m 24s)
* 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:50 bblack: dns1001: upgrade gdnsd package to 3.8.0-1~wmf1
* 21:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
* 16:25 bblack: dns3001: upgrade gdnsd package to 3.8.0-1~wmf1
* 21:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
* 16:17 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.19
* 21:38 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 16:13 brennen: 1.37.0-wmf.19 train: running scap prep, branched at {{Gerrit|79c9b9e61350b0edd1acccb5e717875ba64cf9c1}}
* 21:32 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 16:08 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 21:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 16:06 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 21:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 16:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 15:55 urbanecm: Deploy a security patch for [[phab:T289064|T289064]]
* 21:10 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 15:37 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:03 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 15:32 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:41 urbanecm@deploy1002: Synchronized docroot/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 41s)
* 14:37 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2013 to primary of pc3 [[phab:T284825|T284825]] (duration: 00m 58s)
* 20:37 urbanecm@deploy1002: Synchronized w/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 15s)
* 14:25 jynus: running a full testwiki media backup on a single thread, single worker [[phab:T262668|T262668]]
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:34 urbanecm@deploy1002: Synchronized multiversion/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 28s)
* 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:20 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2012 to primary of pc2 [[phab:T284825|T284825]] (duration: 00m 59s)
* 20:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 13:53 jynus: rolling restart of minio on backup server
* 20:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 13:51 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:31 urbanecm@deploy1002: Synchronized wmf-config/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 38s)
* 13:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1021.eqiad.wmnet with OS buster
* 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS buster
* 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1018.eqiad.wmnet with OS buster
* 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 20:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1017.eqiad.wmnet with OS buster
* 11:29 phuedx@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Jobs/TallyElectionJob.php: Backport: [[gerrit:713361{{!}}tallyElectionJob: Catch and log exceptions (T288361)]] (duration: 00m 58s)
* 19:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 11:16 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: buster reimage [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17038 and previous config saved to /var/cache/conftool/dbconfig/20210817-111629-mvernon.json
* 19:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 11:15 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 11:01 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: buster reimage [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17037 and previous config saved to /var/cache/conftool/dbconfig/20210817-110125-mvernon.json
* 19:32 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 10:46 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: buster reimage [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17035 and previous config saved to /var/cache/conftool/dbconfig/20210817-104622-mvernon.json
* 19:32 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 10:31 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: buster reimage [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17034 and previous config saved to /var/cache/conftool/dbconfig/20210817-103118-mvernon.json
* 19:16 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
* 10:07 effie: enable puppet on mediawiki hosts
* 19:10 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
* 09:52 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2121.codfw.wmnet with reason: REIMAGE
* 18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 09:50 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2121.codfw.wmnet with reason: REIMAGE
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 09:20 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 depooling: reimage to buster [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17033 and previous config saved to /var/cache/conftool/dbconfig/20210817-092045-mvernon.json
* 18:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 09:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1456.eqiad.wmnet
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 09:16 Emperor: reimaging db2121 to buster [[phab:T288244|T288244]]
* 18:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 09:08 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1456.eqiad.wmnet
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 08:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1276-1279].eqiad.wmnet
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 08:29 effie: disable puppet on mediawiki hosts to merge 712920
* 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1276-1279].eqiad.wmnet
* 18:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 08:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1456.eqiad.wmnet with reason: new setup
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1456.eqiad.wmnet with reason: new setup
* 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:21 mutante: mw2383 - scap pull (still depooled because [[phab:T286463|T286463]] but alerts in Icinga since a while)
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1456.eqiad.wmnet with reason: REIMAGE
* 18:15 ayounsi@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=imagescaler-ro,name=codfw
* 08:18 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:18 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw127[6-9].eqiad.wmnet
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1456.eqiad.wmnet with reason: REIMAGE
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:17 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw127[6-9].eqiad.wmnet
* 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1276-1279].eqiad.wmnet with reason: decom old appservers in eqiad [[phab:T280203|T280203]]
* 18:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
* 08:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1276-1279].eqiad.wmnet with reason: decom old appservers in eqiad [[phab:T280203|T280203]]
* 17:57 brennen@deploy1002: Pruned MediaWiki: 1.39.0-wmf.14 (duration: 01m 53s)
* 08:06 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:00 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw144[7-9].eqiad.wmnet
* 17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.16 (duration: 32m 52s)
* 07:59 mutante: mw1384 - start failed ferm service
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:59 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw1450.eqiad.wmnet
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:52 mutante: mw1451 through mw1455 - fresh hardware pooled the first time as appservers
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw145[1-5].eqiad.wmnet
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw145[1-5].eqiad.wmnet
* 17:25 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
* 07:48 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw145[1-5].eqiad.wmnet
* 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:44 marostegui: Drop aft_feedback tables on x1 [[phab:T250715|T250715]]
* 17:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw1450.eqiad.wmnet
* 17:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[7-9].eqiad.wmnet
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:57 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Entities/Election.php: [[phab:T288924|T288924]] (duration: 00m 57s)
* 17:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.16
* 06:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:13 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): train is blocked - will sync to testwikis and hold there for resolution of [[phab:T310532|T310532]]
* 06:55 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/cli/dump.php: [[phab:T288924|T288924]] (duration: 00m 58s)
* 16:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2053.codfw.wmnet with OS bullseye
* 06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:59 TimStarling: foreachwikiindblist securepollglobal mysql.php --write -- -e 'insert into securepoll_properties (pr_entity,pr_key,pr_value) select el_entity,'\''mobile-jump-url'\'','\''https://vote.m.wikimedia.org/wiki/Special:SecurePoll'\'' from securepoll_elections where el_title='\''DWalden STV Election Test 456'\'' limit 1;'
* 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:37 tstarling@deploy1002: Finished scap: collected SecurePoll maintenance scripts and bug fix (duration: 04m 12s)
* 16:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
* 05:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
* 05:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:12 jnuche@deploy1002: Installation of scap version "4.9.2" completed for 557 hosts
* 05:33 tstarling@deploy1002: Started scap: collected SecurePoll maintenance scripts and bug fix
* 16:11 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 05:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:05 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 05:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 03:11 eileen: civicrm revision changed from {{Gerrit|175a3101f7}} to {{Gerrit|66568246a2}}, config revision is {{Gerrit|7bdc78073d}}
* 15:58 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2053.codfw.wmnet with OS bullseye
* 02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host elastic2053.codfw.wmnet
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host elastic2053.codfw.wmnet
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
* 00:44 eileen: civicrm revision changed from {{Gerrit|ba0c7705bb}} to {{Gerrit|175a3101f7}}, config revision is {{Gerrit|7bdc78073d}}
* 14:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:53 moritzm: failover ganeti master in ulsfo to ganeti4003
* 00:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eccdd3ed3fda1abee9a4c57719afd0d1faae41c3}}: Growth mentor dashboard: Enable on testwiki ([[phab:T278920|T278920]]) (duration: 00m 59s)
* 14:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|596058b5e4d906d40e620fe5b01f37c484f5a8c1}}: Add new throttle rule + remove expired one ([[phab:T310625|T310625]]) (duration: 03m 38s)
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:33 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
* 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4003.ulsfo.wmnet
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4003.ulsfo.wmnet
* 14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2012.codfw.wmnet with OS buster
* 14:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2010.codfw.wmnet with OS buster
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2009.codfw.wmnet with OS buster
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 14:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2011.codfw.wmnet with OS buster
* 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2008.codfw.wmnet with OS buster
* 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
* 14:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2007.codfw.wmnet with OS buster
* 14:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2006.codfw.wmnet with OS buster
* 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
* 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 14:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2005.codfw.wmnet with OS buster
* 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29741 and previous config saved to /var/cache/conftool/dbconfig/20220614-132654-marostegui.json
* 13:13 urbanecm: UTC afternoon B&C window done
* 13:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1692de09bf04c724cf416679405d4b6485550d40}}: Disable DiscussionTools visualenhancements feature in production (duration: 03m 25s)
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29740 and previous config saved to /var/cache/conftool/dbconfig/20220614-131149-marostegui.json
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7f2dc7296f0c25d00e45651c50c3e45733cc63b3}}: Make new topic tool available as opt-out almost everywhere (phrase 4; [[phab:T310392|T310392]]) (duration: 03m 45s)
* 13:06 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29739 and previous config saved to /var/cache/conftool/dbconfig/20220614-125644-marostegui.json
* 12:56 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS buster
* 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS buster
* 12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS buster
* 12:42 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS buster
* 12:41 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS buster
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29738 and previous config saved to /var/cache/conftool/dbconfig/20220614-124139-marostegui.json
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS buster
* 12:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2004.codfw.wmnet with OS buster
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS buster
* 12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS buster
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29737 and previous config saved to /var/cache/conftool/dbconfig/20220614-120323-marostegui.json
* 12:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29735 and previous config saved to /var/cache/conftool/dbconfig/20220614-115020-marostegui.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P29734 and previous config saved to /var/cache/conftool/dbconfig/20220614-113515-marostegui.json
* 11:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1173.eqiad.wmnet with OS bullseye
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29732 and previous config saved to /var/cache/conftool/dbconfig/20220614-110945-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29731 and previous config saved to /var/cache/conftool/dbconfig/20220614-110504-marostegui.json
* 11:02 moritzm: rebalancing ganeti cluster in esams [[phab:T308238|T308238]]
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
* 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29730 and previous config saved to /var/cache/conftool/dbconfig/20220614-105441-root.json
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 10:44 joal@deploy1002: Finished deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency (duration: 00m 09s)
* 10:44 joal@deploy1002: Started deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29729 and previous config saved to /var/cache/conftool/dbconfig/20220614-104021-marostegui.json
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29728 and previous config saved to /var/cache/conftool/dbconfig/20220614-103937-root.json
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:30 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
* 10:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29727 and previous config saved to /var/cache/conftool/dbconfig/20220614-102433-root.json
* 10:22 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:22 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
* 10:19 marostegui: dbmaint s6@eqiad [[phab:T60674|T60674]]
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29726 and previous config saved to /var/cache/conftool/dbconfig/20220614-101755-marostegui.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29725 and previous config saved to /var/cache/conftool/dbconfig/20220614-100930-root.json
* 10:06 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS buster
* 10:03 moritzm: rename Ganeti group row_A in test cluster to row_A-test
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P29724 and previous config saved to /var/cache/conftool/dbconfig/20220614-100250-marostegui.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P29723 and previous config saved to /var/cache/conftool/dbconfig/20220614-094745-marostegui.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29722 and previous config saved to /var/cache/conftool/dbconfig/20220614-093240-marostegui.json
* 09:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1058.eqiad.wmnet with OS bullseye
* 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:23 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29721 and previous config saved to /var/cache/conftool/dbconfig/20220614-092330-marostegui.json
* 09:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29720 and previous config saved to /var/cache/conftool/dbconfig/20220614-092322-marostegui.json
* 09:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 09:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 09:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1058.eqiad.wmnet with reason: host reimage
* 09:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
* 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 09:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1058.eqiad.wmnet with reason: host reimage
* 09:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 09:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 09:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
* 09:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
* 09:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 09:08 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29719 and previous config saved to /var/cache/conftool/dbconfig/20220614-090817-marostegui.json
* 09:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
* 09:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 09:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 09:01 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
* 09:00 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1058.eqiad.wmnet with OS bullseye
* 09:00 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 08:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 08:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
* 08:58 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet
* 08:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
* 08:56 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1001.eqiad.wmnet
* 08:56 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
* 08:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2003.codfw.wmnet with OS buster
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29718 and previous config saved to /var/cache/conftool/dbconfig/20220614-085312-marostegui.json
* 08:53 joal@deploy1002: Finished deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63] (duration: 07m 27s)
* 08:51 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:49 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
* 08:48 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 08:48 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:47 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
* 08:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
* 08:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 08:45 joal@deploy1002: Started deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63]
* 08:45 joal@deploy1002: Finished deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63] (duration: 00m 08s)
* 08:44 joal@deploy1002: Started deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63]
* 08:44 joal@deploy1002: Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63] (duration: 04m 45s)
* 08:39 joal@deploy1002: Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63]
* 08:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
* 08:38 godog: reboot centrallog2002 - [[phab:T310483|T310483]]
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29717 and previous config saved to /var/cache/conftool/dbconfig/20220614-083807-marostegui.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29716 and previous config saved to /var/cache/conftool/dbconfig/20220614-082855-marostegui.json
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29715 and previous config saved to /var/cache/conftool/dbconfig/20220614-082847-marostegui.json
* 08:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
* 08:20 marostegui: dbmaint s6@eqiad [[phab:T298560|T298560]]
* 08:18 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
* 08:16 marostegui: dbmaint s6@eqiad [[phab:T309311|T309311]]
* 08:16 joal@deploy1002: Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63] (duration: 31m 09s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29714 and previous config saved to /var/cache/conftool/dbconfig/20220614-081342-marostegui.json
* 08:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS buster
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29713 and previous config saved to /var/cache/conftool/dbconfig/20220614-075837-marostegui.json
* 07:45 joal@deploy1002: Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29712 and previous config saved to /var/cache/conftool/dbconfig/20220614-074331-marostegui.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29711 and previous config saved to /var/cache/conftool/dbconfig/20220614-073322-marostegui.json
* 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 taavi: UTC morning deploys done
* 07:24 marostegui: dbmaint s6@eqiad [[phab:T298563|T298563]]
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804806{{!}}Enable Realtime Preview on cawiki, viwiki, and fawiki (T303961)]] (duration: 03m 20s)
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:16 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:802685{{!}}Update $wgVectorMaxWidthOptions to include action=edit (T307725)]] (duration: 03m 36s)
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:03 marostegui: dbmaint s6@eqiad [[phab:T300381|T300381]]
* 07:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for schema change', diff saved to https://phabricator.wikimedia.org/P29710 and previous config saved to /var/cache/conftool/dbconfig/20220614-065322-root.json
* 06:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:28 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212129|T212129]] (duration: 03m 31s)
* 06:27 marostegui: Reboot dbproxy1012 and dbproxy1015 [[phab:T310484|T310484]]
* 06:24 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/AbuseFilter/includes/ServiceWiring.php: [[phab:T212129|T212129]] (duration: 03m 33s)
* 06:20 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/AbuseFilter/extension.json: [[phab:T212129|T212129]] (duration: 03m 32s)
* 06:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29709 and previous config saved to /var/cache/conftool/dbconfig/20220614-060608-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29707 and previous config saved to /var/cache/conftool/dbconfig/20220614-060155-root.json
* 06:01 marostegui: Starting s6 eqiad failover from db1173 to db1131 - [[phab:T300471|T300471]]
* 05:11 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212129|T212129]] Switch wgMainStash to db-mainstash (duration: 03m 38s)
* 05:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T300471|T300471]]', diff saved to https://phabricator.wikimedia.org/P29706 and previous config saved to /var/cache/conftool/dbconfig/20220614-045224-root.json
* 04:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T300471|T300471]]
* 04:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T300471|T300471]]
* 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29705 and previous config saved to /var/cache/conftool/dbconfig/20220614-024047-ladsgroup.json
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29704 and previous config saved to /var/cache/conftool/dbconfig/20220614-022542-ladsgroup.json
* 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29703 and previous config saved to /var/cache/conftool/dbconfig/20220614-021037-ladsgroup.json
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P29702 and previous config saved to /var/cache/conftool/dbconfig/20220614-015532-ladsgroup.json
* 00:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29701 and previous config saved to /var/cache/conftool/dbconfig/20220614-003608-marostegui.json
* 00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29700 and previous config saved to /var/cache/conftool/dbconfig/20220614-002103-marostegui.json
* 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29699 and previous config saved to /var/cache/conftool/dbconfig/20220614-000558-marostegui.json


== 2021-08-16 ==
== 2022-06-13 ==
* 23:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29698 and previous config saved to /var/cache/conftool/dbconfig/20220613-235053-marostegui.json
* 23:20 urbanecm: Evening B&C window done
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a14868bbdf442eede5711576c4b4da51df0ccd77}}: Enable NewUserMessage on hiwiktionary ([[phab:T287091|T287091]]) (duration: 01m 00s)
* 23:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:15 eileen: civicrm revision changed from {{Gerrit|1e32084622}} to {{Gerrit|ba0c7705bb}}, config revision is {{Gerrit|7bdc78073d}}
* 23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:13 bblack: dns[1235]002: upgrade gdnsd package to 3.8.0-1~wmf1
* 23:45 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T134809|T134809]] g 801836 remove variable wmgDbconfigFromEtcd (duration: 03m 26s)
* 21:31 bblack: authdns1001: upgrade gdnsd package to 3.8.0-1~wmf1
* 23:35 tstarling@deploy1002: Synchronized wmf-config/etcd.php: [[phab:T134809|T134809]] g 799685 codfw master DBs (duration: 03m 36s)
* 21:28 bblack: dns4002: upgrade gdnsd package to 3.8.0-1~wmf1
* 23:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 23:30 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T134809|T134809]] g 799685 codfw master DBs (duration: 03m 30s)
* 20:38 bstorm@cumin1001: Added views for new wiki: labswiki [[phab:T287442|T287442]]
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:37 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:36 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:36 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29697 and previous config saved to /var/cache/conftool/dbconfig/20220613-232537-marostegui.json
* 20:35 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 23:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:35 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 23:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 18:48 dancy: Restarted Jenkins due to stuck jobs.
* 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29696 and previous config saved to /var/cache/conftool/dbconfig/20220613-232529-marostegui.json
* 18:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:16 mutante: gitlab-runner2001 - systemctl reset-failed to clear alert about failed ifup for ens14 which is actually up. race condiation caused by reboot
* 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 23:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29695 and previous config saved to /var/cache/conftool/dbconfig/20220613-231024-marostegui.json
* 17:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1453.eqiad.wmnet with reason: REIMAGE
* 22:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29694 and previous config saved to /var/cache/conftool/dbconfig/20220613-225519-marostegui.json
* 17:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1453.eqiad.wmnet with reason: REIMAGE
* 22:55 AndyRussG: payments-wiki upgraded from {{Gerrit|8c6208c2}} to {{Gerrit|10304f69}}
* 17:34 cmjohnson1: installing new line card in slot1 cr2-eqiad [[phab:T277339|T277339]]
* 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29693 and previous config saved to /var/cache/conftool/dbconfig/20220613-224014-marostegui.json
* 17:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:712965{{!}}Try to use EditStash before re-rendering (T288639)]] (duration: 00m 59s)
* 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29692 and previous config saved to /var/cache/conftool/dbconfig/20220613-221522-marostegui.json
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:25 XioNoX: cr1-eqiad> request chassis fpc offline slot 5 - [[phab:T277339|T277339]]
* 22:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner[2001-2004].codfw.wmnet with reason: maintenance reboot
* 17:17 cmjohnson1: installing new line card in slot1 cr1-eqiad [[phab:T277339|T277339]]
* 22:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner[2001-2004].codfw.wmnet with reason: maintenance reboot
* 17:11 ejegg: updated fundraising CiviCRM from {{Gerrit|f3895dc907}} to {{Gerrit|1e32084622}}
* 21:56 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner[1001-1004].eqiad.wmnet with reason: maintenance reboot
* 17:08 XioNoX: asw2-a-eqiad> request virtual-chassis vc-port set pic-slot 1 member 8 port 1 - [[phab:T288834|T288834]]
* 21:56 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner[1001-1004].eqiad.wmnet with reason: maintenance reboot
* 17:05 XioNoX: asw2-a-eqiad> request virtual-chassis vc-port delete pic-slot 1 member 8 port 1 - [[phab:T288834|T288834]]
* 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 16:37 cwhite: restart logstash on logstash1008
* 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:01 mutante: LDAP - added user tandic to nda group ([[phab:T288527|T288527]])
* 21:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29691 and previous config saved to /var/cache/conftool/dbconfig/20220613-215118-marostegui.json
* 15:37 ryankemper: [WDQS] Re-pooled `codfw`: `ryankemper@puppetmaster1001:~$ sudo -i confctl --quiet --object-type discovery select 'dnsdisc=wdqs,name=codfw' set/pooled=true`
* 21:48 mutante: gitlab-runner* - sequentially pausing, rebooting, resuming one by one
* 14:42 mutante: miscweb - deploying new microsite for Wikidata Query Builder subpage ([[phab:T266703|T266703]])
* 21:44 mutante: gitlab-runner1001 - pause from accepting jobs - rebooting
* 14:41 mutante: mw1455 - works fine after a reimage, unknown why it didnt last time, but ok :)
* 21:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29690 and previous config saved to /var/cache/conftool/dbconfig/20220613-213613-marostegui.json
* 14:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29689 and previous config saved to /var/cache/conftool/dbconfig/20220613-212108-marostegui.json
* 14:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
* 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29688 and previous config saved to /var/cache/conftool/dbconfig/20220613-210603-marostegui.json
* 13:53 mutante: mw1455 - mysteriously showing a bunch of issues in icinga, broken packages, envoy, memcached etc, after recent fresh install, trying another reimage ([[phab:T273915|T273915]])
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711515{{!}}Remove $wmgWikibaseFineGrainedLuaTracking (T288612)]] (duration: 00m 58s)
* 20:29 cjming: end of UTC late backport window
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 20:27 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:805206{{!}}Disable TOC A/B test for beta cluster (T309683)]] (duration: 03m 29s)
* 13:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 20:22 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800857{{!}}ugwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 30s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:19 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
* 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:711514{{!}}Stop setting $wgWBClientSettings['fineGrainedLuaTracking'] (T288612)]] (duration: 00m 58s)
* 20:18 cjming@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ug.svg: Config: [[gerrit:800857{{!}}ugwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 36s)
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:711513{{!}}Remove $wmgWikibaseClientUseTermsTableSearchFields (T288612)]] (beta, 2/2) (duration: 00m 59s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711513{{!}}Remove $wmgWikibaseClientUseTermsTableSearchFields (T288612)]] (prod, 1/2) (duration: 00m 59s)
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:711512{{!}}Stop setting 'useTermsTableSearchFields' Wikibase option (T288612)]] (duration: 00m 59s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29687 and previous config saved to /var/cache/conftool/dbconfig/20220613-201420-marostegui.json
* 12:22 Lucas_WMDE: EU backport+config window done (slightly belatedly)
* 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 12:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Pages/VotePage.php: allow linking by title (duration: 00m 58s)
* 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 12:17 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2099.codfw.wmnet with reason: REIMAGE
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29686 and previous config saved to /var/cache/conftool/dbconfig/20220613-201407-marostegui.json
* 12:15 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:712962{{!}}Support null content in parser tag hook (T288846)]] (hopefully also fixes [[phab:T288790|T288790]]) (duration: 00m 59s)
* 20:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
* 12:15 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2099.codfw.wmnet with reason: REIMAGE
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800856{{!}}crhwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 27s)
* 12:14 kormat: clean up old /root/.my.cnf files [[phab:T150446|T150446]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:712754{{!}}Add extendedconfirmed on zhwiki (T287322)]] + Config: [[gerrit:713255{{!}}Fix extendedconfirmed for bots on zhwiki (T287322)]] (duration: 01m 01s)
* 20:08 cjming@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-crh.svg: Config: [[gerrit:800856{{!}}crhwiki: Add localized mobile wordmark (T309431)]] (duration: 03m 16s)
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P29685 and previous config saved to /var/cache/conftool/dbconfig/20220613-195902-marostegui.json
* 11:26 Lucas_WMDE: namespaceDupes.php for [[phab:T287024|T287024]] finished
* 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P29684 and previous config saved to /var/cache/conftool/dbconfig/20220613-194356-marostegui.json
* 11:22 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes.php hrwiki --fix --add-prefix=[[phab:T287024|T287024]]/ {{!}} tee [[phab:T287024|T287024]].out # [[phab:T287024|T287024]]
* 19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29683 and previous config saved to /var/cache/conftool/dbconfig/20220613-192851-marostegui.json
* 11:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710564{{!}}Add namespace aliases for hr.wiki (T287024)]] (duration: 00m 59s)
* 19:12 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:11 mutante: etherpad - minimal downtime - rebooting etherpad1003
* 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 mutante: gerrit2002 - rebooting
* 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:04 mutante: gitlab2003 - rebooting
* 10:32 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:713225{{!}}Add tags for wikidata edits (T236893)]] (duration: 00m 58s)
* 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29682 and previous config saved to /var/cache/conftool/dbconfig/20220613-190314-marostegui.json
* 09:16 gehel: depooling wdqs codfw to allow catching up on lag
* 19:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 08:49 jynus: replacing s2 with s4 on db2097 [[phab:T287230|T287230]]
* 19:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 08:28 gehel: repool wdqs eqiad (`confctl --quiet --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=true`) - codfw currently overloaded
* 19:01 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
* 07:47 marostegui: Rename aft_feedback tables on db2115, db2131 - [[phab:T250715|T250715]]
* 18:55 mutante: gitlab2002 - rebooting
* 06:41 TimStarling: on votewiki, set voter-privacy option to 1 on all prior elections [[phab:T288924|T288924]]
* 18:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17031 and previous config saved to /var/cache/conftool/dbconfig/20210816-055445-root.json
* 18:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17030 and previous config saved to /var/cache/conftool/dbconfig/20210816-055427-root.json
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29681 and previous config saved to /var/cache/conftool/dbconfig/20220613-184015-marostegui.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17029 and previous config saved to /var/cache/conftool/dbconfig/20210816-053941-root.json
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P29680 and previous config saved to /var/cache/conftool/dbconfig/20220613-182510-marostegui.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17028 and previous config saved to /var/cache/conftool/dbconfig/20210816-053924-root.json
* 18:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17027 and previous config saved to /var/cache/conftool/dbconfig/20210816-052437-root.json
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P29679 and previous config saved to /var/cache/conftool/dbconfig/20220613-181005-marostegui.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17026 and previous config saved to /var/cache/conftool/dbconfig/20210816-052420-root.json
* 17:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1146.eqiad.wmnet with OS buster
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17025 and previous config saved to /var/cache/conftool/dbconfig/20210816-050934-root.json
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29678 and previous config saved to /var/cache/conftool/dbconfig/20220613-175500-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17024 and previous config saved to /var/cache/conftool/dbconfig/20210816-050916-root.json
* 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1145.eqiad.wmnet with OS buster
* 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17023 and previous config saved to /var/cache/conftool/dbconfig/20210816-045430-root.json
* 17:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1143.eqiad.wmnet with OS buster
* 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17022 and previous config saved to /var/cache/conftool/dbconfig/20210816-045413-root.json
* 17:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
* 04:49 marostegui: Upgrade db2088 (s1 and s2) to 10.4.21
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
* 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088 (s1 and s2) to upgrade', diff saved to https://phabricator.wikimedia.org/P17021 and previous config saved to /var/cache/conftool/dbconfig/20210816-044906-marostegui.json
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
* 17:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
* 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
* 17:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1148.eqiad.wmnet with OS buster
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 17:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2002.codfw.wmnet with OS buster
* 17:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
* 17:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1147.eqiad.wmnet with OS buster
* 17:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
* 17:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
* 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29677 and previous config saved to /var/cache/conftool/dbconfig/20220613-171438-marostegui.json
* 17:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 17:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29676 and previous config saved to /var/cache/conftool/dbconfig/20220613-171430-marostegui.json
* 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
* 17:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3001.esams.wmnet with OS bullseye
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
* 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1144.eqiad.wmnet with OS buster
* 17:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS buster
* 17:03 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
* 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P29675 and previous config saved to /var/cache/conftool/dbconfig/20220613-165925-marostegui.json
* 16:58 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
* 16:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS buster
* 16:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1146.eqiad.wmnet with OS buster
* 16:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3001.esams.wmnet with reason: host reimage
* 16:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
* 16:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 16:53 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1145.eqiad.wmnet with OS buster
* 16:50 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3001.esams.wmnet with reason: host reimage
* 16:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 16:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
* 16:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
* 16:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
* 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P29674 and previous config saved to /var/cache/conftool/dbconfig/20220613-164419-marostegui.json
* 16:40 dancy@deploy1002: prep aborted:  (duration: 01m 40s)
* 16:38 dancy@deploy1002: prep aborted:  (duration: 06m 12s)
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
* 16:32 marostegui: dbmaint x2@eqiad upgrade and reboot all x2 db hosts [[phab:T310485|T310485]]
* 16:32 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti3001.esams.wmnet with OS bullseye
* 16:32 dancy@deploy1002: prep aborted:  (duration: 00m 26s)
* 16:31 marostegui: Reboot all codfw parsercache hosts [[phab:T310485|T310485]]
* 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29673 and previous config saved to /var/cache/conftool/dbconfig/20220613-162914-marostegui.json
* 16:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2002.codfw.wmnet with OS buster
* 16:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2001.codfw.wmnet with OS buster
* 16:10 robh: ganeti3001 rebooting and reimaging for firmware updates via [[phab:T308238|T308238]]
* 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:805173{{!}} Bumping portals to master (T128546)]] (duration: 03m 27s)
* 15:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:47 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:805173{{!}} Bumping portals to master (T128546)]] (duration: 03m 35s)
* 15:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
* 15:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2001.codfw.wmnet with OS buster
* 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29672 and previous config saved to /var/cache/conftool/dbconfig/20220613-152900-marostegui.json
* 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29671 and previous config saved to /var/cache/conftool/dbconfig/20220613-152852-marostegui.json
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P29670 and previous config saved to /var/cache/conftool/dbconfig/20220613-151347-marostegui.json
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 15:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
* 15:00 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P29669 and previous config saved to /var/cache/conftool/dbconfig/20220613-145842-marostegui.json
* 14:58 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:54 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29668 and previous config saved to /var/cache/conftool/dbconfig/20220613-144337-marostegui.json
* 14:42 marostegui: Failover m1 and m2 to a different proxy [[phab:T310484|T310484]]
* 14:38 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:34 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29667 and previous config saved to /var/cache/conftool/dbconfig/20220613-141802-marostegui.json
* 14:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29666 and previous config saved to /var/cache/conftool/dbconfig/20220613-141754-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P29665 and previous config saved to /var/cache/conftool/dbconfig/20220613-140249-marostegui.json
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:00 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:00 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 13:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dumpsdata1007.eqiad.wmnet
* 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1057.eqiad.wmnet with OS bullseye
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P29663 and previous config saved to /var/cache/conftool/dbconfig/20220613-134744-marostegui.json
* 13:45 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
* 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 13:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29662 and previous config saved to /var/cache/conftool/dbconfig/20220613-133239-marostegui.json
* 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dse-k8s-worker[1001-1004].eqiad.wmnet with reason: reboots
* 13:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on dse-k8s-worker[1001-1004].eqiad.wmnet with reason: reboots
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
* 13:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
* 13:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
* 13:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 13:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1057.eqiad.wmnet with reason: host reimage
* 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
* 13:24 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
* 13:22 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.15/extensions/GrowthExperiments/modules/ext.growthExperiments.DataStore/NewcomerTasksStore.js: {{Gerrit|67a5352b0bf9f6aa160cc93a42ca22a02aad883a}}: NewcomerTasksStore: update quality gate config when the task queue is set ([[phab:T309768|T309768]]) (duration: 03m 41s)
* 13:22 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1057.eqiad.wmnet with reason: host reimage
* 13:21 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
* 13:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
* 13:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
* 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: reboots
* 13:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
* 13:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: reboots
* 13:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1138 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29660 and previous config saved to /var/cache/conftool/dbconfig/20220613-130512-marostegui.json
* 13:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 13:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29659 and previous config saved to /var/cache/conftool/dbconfig/20220613-130504-marostegui.json
* 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
* 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
* 12:54 marostegui@cumin1001: dbctl commit