
Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2014.codfw.wmnet with OS bullseye)
imported>Stashbot
(taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" | mwscript purgeList.php --wiki enwiki # T314712)
 
(144 intermediate revisions by 2 users not shown)
== 2022-03-11 ==
== 2022-08-07 ==
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2014.codfw.wmnet with OS bullseye
* 19:58 taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" {{!}} mwscript purgeList.php --wiki enwiki # [[phab:T314712|T314712]]
* 15:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
* 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32305 and previous config saved to /var/cache/conftool/dbconfig/20220807-135204-ladsgroup.json
* 15:42 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
* 13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 15:39 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 15:38 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32304 and previous config saved to /var/cache/conftool/dbconfig/20220807-135143-ladsgroup.json
* 15:37 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32303 and previous config saved to /var/cache/conftool/dbconfig/20220807-133637-ladsgroup.json
* 15:36 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32302 and previous config saved to /var/cache/conftool/dbconfig/20220807-132131-ladsgroup.json
* 15:36 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32301 and previous config saved to /var/cache/conftool/dbconfig/20220807-130625-ladsgroup.json
* 15:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32300 and previous config saved to /var/cache/conftool/dbconfig/20220807-120610-ladsgroup.json
* 15:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 15:33 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 15:27 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2014.codfw.wmnet with OS bullseye
* 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32299 and previous config saved to /var/cache/conftool/dbconfig/20220807-120549-ladsgroup.json
* 15:07 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host kubernetes2013.codfw.wmnet with OS bullseye
* 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32298 and previous config saved to /var/cache/conftool/dbconfig/20220807-115043-ladsgroup.json
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22374 and previous config saved to /var/cache/conftool/dbconfig/20220311-150702-root.json
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32297 and previous config saved to /var/cache/conftool/dbconfig/20220807-113537-ladsgroup.json
* 15:02 XioNoX: cr1/2-eqiad AVOID-PATHS as-path TI "6762 .*"
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32296 and previous config saved to /var/cache/conftool/dbconfig/20220807-112031-ladsgroup.json
* 15:02 XioNoX: cr2-esams AVOID-PATHS as-path TI "6762 .*" <- rolled back
* 14:57 XioNoX: cr2-esams AVOID-PATHS as-path TI "6762 .*"
* 14:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22373 and previous config saved to /var/cache/conftool/dbconfig/20220311-145159-root.json
* 14:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22372 and previous config saved to /var/cache/conftool/dbconfig/20220311-143652-root.json
* 14:35 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2013.codfw.wmnet with OS bullseye
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22371 and previous config saved to /var/cache/conftool/dbconfig/20220311-142147-root.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22370 and previous config saved to /var/cache/conftool/dbconfig/20220311-140641-root.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P22369 and previous config saved to /var/cache/conftool/dbconfig/20220311-140549-marostegui.json
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22368 and previous config saved to /var/cache/conftool/dbconfig/20220311-135137-root.json
* 13:49 marostegui: dbmaint on s8@eqiad [[phab:T300775|T300775]]
* 13:49 marostegui: dbmaint on s1@eqiad [[phab:T298294|T298294]]
* 13:43 jelto: update pcc facts
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22367 and previous config saved to /var/cache/conftool/dbconfig/20220311-133633-root.json
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P22366 and previous config saved to /var/cache/conftool/dbconfig/20220311-133407-marostegui.json
* 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cumin2001.codfw.wmnet
* 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cumin2001.codfw.wmnet
* 11:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2012.codfw.wmnet with OS bullseye
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 11:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 11:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
* 11:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
* 10:59 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
* 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2110.codfw.wmnet with OS bullseye
* 10:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2012.codfw.wmnet with OS bullseye
* 10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2011.codfw.wmnet with OS bullseye
* 10:39 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
* 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: host reimage
* 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: host reimage
* 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
* 10:25 vgutierrez: disable certspotter - [[phab:T303593|T303593]]
* 10:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
* 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 10:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2110.codfw.wmnet with OS bullseye
* 10:16 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2011.codfw.wmnet with OS bullseye
* 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 10:03 dcausse: manually installed jvmquake to wdqs1010 (test machine) from https://people.wikimedia.org/~jmm/jvmquake/
* 09:54 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:47 vgutierrez: stopping certspotter on alert1001
* 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 09:36 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:35 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 09:15 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:00 jayme: kubernetes2011:~# systemctl restart rsyslog.service - [[phab:T289766|T289766]]
* 08:52 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 08:51 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1017.eqiad.wmnet
* 08:43 dcausse: restarting blazegraph on wdqs1012 (jvm stuck for 5 hours)
* 08:42 jynus: upgrade and restart db2139
* 08:41 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1017.eqiad.wmnet
* 08:30 jynus: upgrade and restart db1145
* 08:23 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1017.eqiad.wmnet
* 08:21 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1017.eqiad.wmnet
* 08:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22364 and previous config saved to /var/cache/conftool/dbconfig/20220311-063921-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22363 and previous config saved to /var/cache/conftool/dbconfig/20220311-062417-root.json
* 06:13 marostegui: Reboot dbproxy1014 [[phab:T303174|T303174]]
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22362 and previous config saved to /var/cache/conftool/dbconfig/20220311-060913-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22361 and previous config saved to /var/cache/conftool/dbconfig/20220311-055409-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P22360 and previous config saved to /var/cache/conftool/dbconfig/20220311-054514-marostegui.json
* 02:54 eileen: revision changed from {{Gerrit|9fb68b24}} to {{Gerrit|252269c8}}
* 01:56 eileen: civicrm revision changed from {{Gerrit|8501c38c}} to {{Gerrit|9fb68b24}}
* 01:31 eileen: civicrm changed from {{Gerrit|4cb2bdbc}} to {{Gerrit|8501c38c}}
* 00:33 TimStarling: on mwmaint1002 running populateGlobalEditCount.php
* 00:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 00:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
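
The entries above share a visible shape: a bullet, an `HH:MM` timestamp, an actor with an optional `@host` suffix, a colon, and a free-text message. The following is a minimal sketch of how such a line could be split into fields, assuming that shape holds. The field names (`time`, `actor`, `host`, `message`) and the `parse_entry` helper are illustrative only and are not part of any Wikitech tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape (inferred from the log lines, not from a spec):
#   "* HH:MM actor[@host]: message"
ENTRY_RE = re.compile(
    r"^\*\s+(?P<time>\d{2}:\d{2})\s+"
    r"(?P<actor>[^\s@:]+)(?:@(?P<host>[^\s:]+))?:\s+"
    r"(?P<message>.*)$"
)


class Entry(NamedTuple):
    time: str             # "HH:MM" as written in the log
    actor: str            # user or bot name before the optional "@host"
    host: Optional[str]   # host the action was logged from, if present
    message: str          # everything after the first ": "


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed fields, or None for headings and other non-entry lines."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["time"], m["actor"], m["host"], m["message"])
```

Under these assumptions, a line such as `* 13:43 jelto: update pcc facts` parses with `host` set to `None`, while `* 15:39 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply` yields `deploy1002` as the host. Day headings such as `== 2022-03-11 ==` return `None`.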


== 2022-03-10 ==
== 2022-08-06 ==
* 23:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32295 and previous config saved to /var/cache/conftool/dbconfig/20220806-175916-ladsgroup.json
* 23:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 17:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 23:08 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 23:07 rzl@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:42 tstarling@deploy1002: Finished scap: global_edit_count gerrit 769561 (duration: 15m 12s)
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:27 tstarling@deploy1002: Started scap: global_edit_count gerrit 769561
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/User/CentralAuthUser.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/Hooks/Handlers/UserEditCountUpdateHookHandler.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:23 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthServices.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 03:02 krinkle@deploy1002: Synchronized w/: {{Gerrit|I9067d47fab0324}} (duration: 03m 25s)
* 22:22 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/ServiceWiring.php: global_edit_count gerrit 769561 (duration: 00m 48s)
* 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:21 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthEditCounter.php: global_edit_count gerrit 769561 (duration: 00m 48s)
* 03:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:08 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 02:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:05 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 22:04 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 22:04 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:02 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:41 rzl: UTC late B&C training window done
* 21:39 rzl@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:769779{{!}}CommonSettings: Update comment about Image Suggestions API (T294362)]] (duration: 00m 48s)
* 21:34 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/modules/controller.js: Backport: [[gerrit:769559{{!}}Fix highlighting of comments when reloading (T303261)]] (duration: 00m 47s)
* 21:33 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw: Backport: [[gerrit:769558{{!}}Preserve classes on media wrapper links (T292657 T303469)]] (duration: 00m 49s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:18 cstone: update Donation Interface revision changed from {{Gerrit|ca37a93e}} to {{Gerrit|5db12b21}}
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:13 rzl@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:766307{{!}}Remove centralauth-oversight from the config (T302675)]] (duration: 00m 49s)
* 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22356 and previous config saved to /var/cache/conftool/dbconfig/20220310-205114-marostegui.json
* 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22355 and previous config saved to /var/cache/conftool/dbconfig/20220310-203608-marostegui.json
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22354 and previous config saved to /var/cache/conftool/dbconfig/20220310-202103-marostegui.json
* 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22353 and previous config saved to /var/cache/conftool/dbconfig/20220310-200558-marostegui.json
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:47 volans: installed spicerack v2.3.2 on the cumin hosts
* 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:46 volans@cumin2002: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 19:46 volans@cumin2002: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.25 refs [[phab:T300201|T300201]]
* 19:44 volans: uploaded spicerack_2.3.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 19:33 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 19:32 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 19:32 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 19:31 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 19:29 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 19:29 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 19:06 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 19:06 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22352 and previous config saved to /var/cache/conftool/dbconfig/20220310-190544-marostegui.json
* 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22351 and previous config saved to /var/cache/conftool/dbconfig/20220310-190530-marostegui.json
* 19:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 19:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 19:02 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 19:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 19:00 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 18:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 18:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 18:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 18:57 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 18:56 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22350 and previous config saved to /var/cache/conftool/dbconfig/20220310-185025-marostegui.json
* 18:46 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 18:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 18:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 18:41 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:40 moritzm: restarting thumbor to pick up tiff security updates
* 18:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 18:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 18:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 18:36 moritzm: installing tiff security updates
* 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22349 and previous config saved to /var/cache/conftool/dbconfig/20220310-183520-marostegui.json
* 18:33 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 18:30 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 18:29 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 18:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 18:27 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 18:26 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22348 and previous config saved to /var/cache/conftool/dbconfig/20220310-182015-marostegui.json
* 18:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:19 razzi: cumin 'C:elasticsearch' 'systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service'
* 18:15 razzi: systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service on elastic2042 for [[phab:T300295|T300295]]
* 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:11 moritzm: installing cyrus-sasl2 security updates
* 18:08 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 17:51 herron: repool thanos-fe1001
* 17:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:43 herron: depooling thanos-fe1001 for envoy upgrade
* 17:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:41 dancy@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:761965{{!}}wmf-config: Use __DIR__ instead of "$IP/../wmf-config" (T45956)]] (duration: 00m 50s)
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22347 and previous config saved to /var/cache/conftool/dbconfig/20220310-172001-marostegui.json
* 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22346 and previous config saved to /var/cache/conftool/dbconfig/20220310-171953-marostegui.json
* 17:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22345 and previous config saved to /var/cache/conftool/dbconfig/20220310-170448-marostegui.json
* 16:57 damilare: civicrm change revision from {{Gerrit|9b5aafbc}} to {{Gerrit|4cb2bdbc}}
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22344 and previous config saved to /var/cache/conftool/dbconfig/20220310-165014-ladsgroup.json
* 16:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
* 16:50 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
* 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22343 and previous config saved to /var/cache/conftool/dbconfig/20220310-164943-marostegui.json
* 16:49 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on D<nowiki>{</nowiki>cumin1001.mgmt<nowiki>}</nowiki> with reason: Testing alertmanager downtime
* 16:49 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on D<nowiki>{</nowiki>cumin1001.mgmt<nowiki>}</nowiki> with reason: Testing alertmanager downtime
* 16:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Testing alertmanager downtime
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22342 and previous config saved to /var/cache/conftool/dbconfig/20220310-163509-ladsgroup.json
* 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22341 and previous config saved to /var/cache/conftool/dbconfig/20220310-163438-marostegui.json
* 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
* 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
* 16:30 sukhe: depool doh1002 for testing eBPF
* 16:21 volans: uploaded spicerack_2.3.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22340 and previous config saved to /var/cache/conftool/dbconfig/20220310-162004-ladsgroup.json
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22339 and previous config saved to /var/cache/conftool/dbconfig/20220310-160457-ladsgroup.json
* 15:57 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:56 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1121.eqiad.wmnet with OS bullseye
* 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
* 15:37 moritzm: rolling restart of thumbor to pick up expat security updates
* 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22338 and previous config saved to /var/cache/conftool/dbconfig/20220310-153428-marostegui.json
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22337 and previous config saved to /var/cache/conftool/dbconfig/20220310-153424-marostegui.json
* 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22336 and previous config saved to /var/cache/conftool/dbconfig/20220310-153416-marostegui.json
* 15:33 sukhe: upload certspotter 0.10-1wm1 to apt.wm.o - [[phab:T204993|T204993]]
* 15:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1121.eqiad.wmnet with OS bullseye
* 15:21 moritzm: installing expat security updates on stretch
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22335 and previous config saved to /var/cache/conftool/dbconfig/20220310-151923-marostegui.json
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22334 and previous config saved to /var/cache/conftool/dbconfig/20220310-151910-marostegui.json
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22333 and previous config saved to /var/cache/conftool/dbconfig/20220310-150839-ladsgroup.json
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22332 and previous config saved to /var/cache/conftool/dbconfig/20220310-150803-ladsgroup.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22331 and previous config saved to /var/cache/conftool/dbconfig/20220310-150417-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22330 and previous config saved to /var/cache/conftool/dbconfig/20220310-150405-marostegui.json
* 14:55 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:54 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22329 and previous config saved to /var/cache/conftool/dbconfig/20220310-145258-ladsgroup.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22328 and previous config saved to /var/cache/conftool/dbconfig/20220310-144911-marostegui.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22327 and previous config saved to /var/cache/conftool/dbconfig/20220310-144900-marostegui.json
* 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22326 and previous config saved to /var/cache/conftool/dbconfig/20220310-144222-marostegui.json
* 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22325 and previous config saved to /var/cache/conftool/dbconfig/20220310-144214-marostegui.json
* 14:41 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
* 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22324 and previous config saved to /var/cache/conftool/dbconfig/20220310-143753-ladsgroup.json
* 14:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22323 and previous config saved to /var/cache/conftool/dbconfig/20220310-142709-marostegui.json
* 14:26 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:25 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22322 and previous config saved to /var/cache/conftool/dbconfig/20220310-142248-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22321 and previous config saved to /var/cache/conftool/dbconfig/20220310-141204-marostegui.json
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:08 akosiaris: repool ores in eqiad in discovery records
* 14:06 urbanecm: UTC afternoon B&C done
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22320 and previous config saved to /var/cache/conftool/dbconfig/20220310-135659-marostegui.json
* 13:55 akosiaris: depool ores in eqiad from discovery records to initiate reboot of rdb1011
* 13:55 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
* 13:51 akosiaris: repool ores in codfw in discovery records
* 13:50 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22319 and previous config saved to /var/cache/conftool/dbconfig/20220310-135047-marostegui.json
* 13:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22318 and previous config saved to /var/cache/conftool/dbconfig/20220310-135039-marostegui.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22317 and previous config saved to /var/cache/conftool/dbconfig/20220310-134807-marostegui.json
* 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22316 and previous config saved to /var/cache/conftool/dbconfig/20220310-134759-marostegui.json
* 13:43 akosiaris: reboot rdb2007 for upgrades
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22315 and previous config saved to /var/cache/conftool/dbconfig/20220310-133534-marostegui.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22314 and previous config saved to /var/cache/conftool/dbconfig/20220310-133254-marostegui.json
* 13:27 akosiaris: depool ores in codfw from discovery records to initiate reboot of rdb2007
* 13:26 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
* 13:22 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22313 and previous config saved to /var/cache/conftool/dbconfig/20220310-132234-ladsgroup.json
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:20 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22311 and previous config saved to /var/cache/conftool/dbconfig/20220310-132029-marostegui.json
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22310 and previous config saved to /var/cache/conftool/dbconfig/20220310-131748-marostegui.json
* 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22309 and previous config saved to /var/cache/conftool/dbconfig/20220310-131214-ladsgroup.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22308 and previous config saved to /var/cache/conftool/dbconfig/20220310-130523-marostegui.json
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22307 and previous config saved to /var/cache/conftool/dbconfig/20220310-130243-marostegui.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22306 and previous config saved to /var/cache/conftool/dbconfig/20220310-125909-marostegui.json
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22305 and previous config saved to /var/cache/conftool/dbconfig/20220310-125901-marostegui.json
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22304 and previous config saved to /var/cache/conftool/dbconfig/20220310-125709-ladsgroup.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22303 and previous config saved to /var/cache/conftool/dbconfig/20220310-124355-marostegui.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22302 and previous config saved to /var/cache/conftool/dbconfig/20220310-124204-ladsgroup.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22301 and previous config saved to /var/cache/conftool/dbconfig/20220310-122850-marostegui.json
* 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22300 and previous config saved to /var/cache/conftool/dbconfig/20220310-122659-ladsgroup.json
* 12:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1141.eqiad.wmnet with OS bullseye
* 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
* 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22299 and previous config saved to /var/cache/conftool/dbconfig/20220310-121344-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22298 and previous config saved to /var/cache/conftool/dbconfig/20220310-120228-marostegui.json
* 12:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22297 and previous config saved to /var/cache/conftool/dbconfig/20220310-120221-marostegui.json
* 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
* 11:58 marostegui: Failover m1 master
* 11:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
* 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
* 11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22296 and previous config saved to /var/cache/conftool/dbconfig/20220310-114715-marostegui.json
* 11:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1141.eqiad.wmnet with OS bullseye
* 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22294 and previous config saved to /var/cache/conftool/dbconfig/20220310-113638-ladsgroup.json
* 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22293 and previous config saved to /var/cache/conftool/dbconfig/20220310-113210-marostegui.json
* 11:29 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b681376]: (no justification provided) (duration: 00m 07s)
* 11:29 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b681376]: (no justification provided)
* 11:26 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 11:26 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 11:25 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1093.eqiad.wmnet
* 11:24 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:24 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 11:18 volans: rolled out python3-wmflib v1.1.2 to the entire fleet (buster+ only)
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22292 and previous config saved to /var/cache/conftool/dbconfig/20220310-111705-marostegui.json
* 11:16 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1093.eqiad.wmnet
* 11:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22291 and previous config saved to /var/cache/conftool/dbconfig/20220310-111330-marostegui.json
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22290 and previous config saved to /var/cache/conftool/dbconfig/20220310-111320-marostegui.json
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22289 and previous config saved to /var/cache/conftool/dbconfig/20220310-111313-marostegui.json
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:10 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 14 hosts with reason: Maintenance
* 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 14 hosts with reason: Maintenance
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22287 and previous config saved to /var/cache/conftool/dbconfig/20220310-110253-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22286 and previous config saved to /var/cache/conftool/dbconfig/20220310-105807-marostegui.json
* 10:48 jbond: re-enable puppet fleet wide
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22285 and previous config saved to /var/cache/conftool/dbconfig/20220310-104748-marostegui.json
* 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 10:44 akosiaris: reboot rdb2009 for upgrades
* 10:44 jbond: disable puppet fleet wide
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22284 and previous config saved to /var/cache/conftool/dbconfig/20220310-104302-marostegui.json
* 10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22283 and previous config saved to /var/cache/conftool/dbconfig/20220310-103243-marostegui.json
* 10:30 moritzm: failover ganeti master for drmrs/B13 to ganeti6004
* 10:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
* 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22282 and previous config saved to /var/cache/conftool/dbconfig/20220310-102757-marostegui.json
* 10:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
* 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22281 and previous config saved to /var/cache/conftool/dbconfig/20220310-101738-marostegui.json
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22280 and previous config saved to /var/cache/conftool/dbconfig/20220310-101133-marostegui.json
* 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22279 and previous config saved to /var/cache/conftool/dbconfig/20220310-101125-marostegui.json
* 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22278 and previous config saved to /var/cache/conftool/dbconfig/20220310-095620-marostegui.json
* 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22277 and previous config saved to /var/cache/conftool/dbconfig/20220310-094115-marostegui.json
* 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22276 and previous config saved to /var/cache/conftool/dbconfig/20220310-092742-marostegui.json
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22275 and previous config saved to /var/cache/conftool/dbconfig/20220310-092735-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22274 and previous config saved to /var/cache/conftool/dbconfig/20220310-092610-marostegui.json
* 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22273 and previous config saved to /var/cache/conftool/dbconfig/20220310-091807-marostegui.json
* 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22272 and previous config saved to /var/cache/conftool/dbconfig/20220310-091759-marostegui.json
* 09:16 moritzm: failover ganeti master for drmrs/B12 to ganeti6003
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22271 and previous config saved to /var/cache/conftool/dbconfig/20220310-091230-marostegui.json
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22270 and previous config saved to /var/cache/conftool/dbconfig/20220310-090254-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22269 and previous config saved to /var/cache/conftool/dbconfig/20220310-085724-marostegui.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22268 and previous config saved to /var/cache/conftool/dbconfig/20220310-084749-marostegui.json
* 08:43 apergos: UTC morning backport and config window completed
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22267 and previous config saved to /var/cache/conftool/dbconfig/20220310-084219-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22266 and previous config saved to /var/cache/conftool/dbconfig/20220310-084139-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22265 and previous config saved to /var/cache/conftool/dbconfig/20220310-083732-root.json
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22264 and previous config saved to /var/cache/conftool/dbconfig/20220310-083244-marostegui.json
* 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22263 and previous config saved to /var/cache/conftool/dbconfig/20220310-082737-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22262 and previous config saved to /var/cache/conftool/dbconfig/20220310-082642-marostegui.json
* 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22261 and previous config saved to /var/cache/conftool/dbconfig/20220310-082634-marostegui.json
* 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:24 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 2: [[gerrit:769656{{!}}SectionTranslation: Also add languages to target (T298237)]] (duration: 00m 49s)
* 08:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22260 and previous config saved to /var/cache/conftool/dbconfig/20220310-082234-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22259 and previous config saved to /var/cache/conftool/dbconfig/20220310-082227-root.json
* 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22258 and previous config saved to /var/cache/conftool/dbconfig/20220310-082223-marostegui.json
* 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:19 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 1: [[gerrit:769386{{!}}Enable SectionTranslation on Javanese, Tagalog, Mongolian, Telugu WPs (T298237)]] (duration: 00m 50s)
* 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) for reboot', diff saved to https://phabricator.wikimedia.org/P22256 and previous config saved to /var/cache/conftool/dbconfig/20220310-081244-marostegui.json
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22255 and previous config saved to /var/cache/conftool/dbconfig/20220310-081129-marostegui.json
* 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22254 and previous config saved to /var/cache/conftool/dbconfig/20220310-080718-marostegui.json
* 08:03 marostegui: Reboot dbproxy1017, 1016 [[phab:T303174|T303174]]
* 08:00 marostegui: Reboot dbproxy1012, 1015, 1016 [[phab:T303174|T303174]]
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22253 and previous config saved to /var/cache/conftool/dbconfig/20220310-075623-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22252 and previous config saved to /var/cache/conftool/dbconfig/20220310-075213-marostegui.json
* 07:43 marostegui: Reboot dbproxy2001, 2002, 2003, 2004 [[phab:T303174|T303174]]
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22251 and previous config saved to /var/cache/conftool/dbconfig/20220310-074118-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22250 and previous config saved to /var/cache/conftool/dbconfig/20220310-073708-marostegui.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22249 and previous config saved to /var/cache/conftool/dbconfig/20220310-073523-marostegui.json
* 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22248 and previous config saved to /var/cache/conftool/dbconfig/20220310-073022-marostegui.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22247 and previous config saved to /var/cache/conftool/dbconfig/20220310-072124-marostegui.json
* 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22246 and previous config saved to /var/cache/conftool/dbconfig/20220310-072019-marostegui.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22245 and previous config saved to /var/cache/conftool/dbconfig/20220310-071516-marostegui.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22244 and previous config saved to /var/cache/conftool/dbconfig/20220310-070514-marostegui.json
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1132.eqiad.wmnet with OS bullseye
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22243 and previous config saved to /var/cache/conftool/dbconfig/20220310-070011-marostegui.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22242 and previous config saved to /var/cache/conftool/dbconfig/20220310-065009-marostegui.json
* 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22241 and previous config saved to /var/cache/conftool/dbconfig/20220310-064506-marostegui.json
* 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22240 and previous config saved to /var/cache/conftool/dbconfig/20220310-063858-marostegui.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22239 and previous config saved to /var/cache/conftool/dbconfig/20220310-063850-marostegui.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22238 and previous config saved to /var/cache/conftool/dbconfig/20220310-063503-marostegui.json
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1132.eqiad.wmnet with OS bullseye
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22237 and previous config saved to /var/cache/conftool/dbconfig/20220310-063017-marostegui.json
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22236 and previous config saved to /var/cache/conftool/dbconfig/20220310-062345-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22235 and previous config saved to /var/cache/conftool/dbconfig/20220310-060840-marostegui.json
* 06:07 marostegui: dbmaint on s3@eqiad [[phab:T272512|T272512]]
* 06:05 marostegui: dbmaint on s7@eqiad [[phab:T272512|T272512]]
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22234 and previous config saved to /var/cache/conftool/dbconfig/20220310-055335-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22233 and previous config saved to /var/cache/conftool/dbconfig/20220310-054701-marostegui.json
* 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 05:46 marostegui: dbmaint on s5@eqiad [[phab:T272512|T272512]]
* 05:46 marostegui: dbmaint on s4@eqiad [[phab:T272512|T272512]]
* 05:46 marostegui: dbmaint on pc3@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on pc2@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on pc1@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on s2@eqiad [[phab:T272512|T272512]]
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22232 and previous config saved to /var/cache/conftool/dbconfig/20220310-053950-marostegui.json
* 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 00:26 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@7975c27]: (no justification provided) (duration: 00m 08s)
* 00:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics@7975c27]: (no justification provided)
* 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2022-03-09 ==
== 2022-08-05 ==
* 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:20 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly (duration: 02m 01s)
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:18 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly
* 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:08 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bullseye
* 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bullseye
* 23:09 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25 refs [[phab:T300201|T300201]] (duration: 00m 49s)
* 16:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 23:08 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 23:08 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 23:08 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25 refs [[phab:T300201|T300201]]
* 16:37 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 23:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1047.eqiad.wmnet
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
* 22:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1047.eqiad.wmnet
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
* 22:54 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
* 22:35 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:26 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
* 22:35 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1193.eqiad.wmnet with OS bullseye
* 22:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22229 and previous config saved to /var/cache/conftool/dbconfig/20220309-223130-marostegui.json
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db1192.eqiad.wmnet with OS bullseye
* 22:15 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22228 and previous config saved to /var/cache/conftool/dbconfig/20220309-221555-marostegui.json
* 16:12 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@8489923]: [[phab:T304954|T304954]]: Automate imagesuggestion imports (duration: 02m 03s)
* 22:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22226 and previous config saved to /var/cache/conftool/dbconfig/20220309-220020-marostegui.json
* 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
* 21:57 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/Gadgets: [[phab:T303455|T303455]] (duration: 00m 50s)
* 16:11 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :) (duration: 06m 09s)
* 21:54 volans: uploaded python3-wmflib_1.1.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:10 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@8489923]: [[phab:T304954|T304954]]: Automate imagesuggestion imports
* 21:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
* 21:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:07 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
* 21:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22225 and previous config saved to /var/cache/conftool/dbconfig/20220309-214445-marostegui.json
* 16:05 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :)
* 21:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 16:04 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine (duration: 34m 38s)
* 21:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 16:03 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:55 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
* 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:52 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1191.eqiad.wmnet with OS bullseye
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:51 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
* 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:42 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS bullseye
* 21:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 15:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
* 20:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
* 20:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 15:30 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine
* 20:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 15:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
* 20:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 15:25 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 15:24 jbond: upload trapperkeeper-metrics-clojure to puppet7 component
* 20:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:22 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
* 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:19 jbond: upload puppetlabs-http-client-clojur to puppet7 component
* 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:14 dancy@deploy1002: Finished scap: Backport for [[gerrit:820653]] scap gitignore: ignore all files under the `scap` directory (duration: 04m 41s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:11 jbond: upload jolokia to puppet7 component
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:09 dancy@deploy1002: Started scap: Backport for [[gerrit:820653]] scap gitignore: ignore all files under the `scap` directory
* 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:09 jbond: upload test-chuck-clojure to puppet7 component
* 19:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:05 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:04 jbond: upload test-check-clojure to puppet7 component
* 19:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:57 jbond: upload nippy-clojure to puppet7 component
* 19:21 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24  refs [[phab:T300201|T300201]] (duration: 00m 50s)
* 14:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
* 19:20 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24  refs [[phab:T300201|T300201]]
* 14:43 jbond: upload fressian to puppet7 component
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:40 jbond: upload test-generative-clojure to puppet7 component
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:34 jbond: upload data-generators-clojure to puppet7 component
* 19:07 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]] (duration: 00m 49s)
* 14:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]]
* 14:23 jbond: upload encore-clojure to puppet7 component
* 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22222 and previous config saved to /var/cache/conftool/dbconfig/20220309-182355-marostegui.json
* 14:17 jbond: upload truss-clojure to puppet7 component
* 18:23 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 14:13 jbond: upload structured-logging-clojure to puppet7 component
* 18:23 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 14:06 jbond: upload murphy-clojure to puppet7 component
* 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22221 and previous config saved to /var/cache/conftool/dbconfig/20220309-182316-marostegui.json
* 13:57 jbond: upload logstash-logback-encoder-7.2 to puppet7 component
* 18:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22220 and previous config saved to /var/cache/conftool/dbconfig/20220309-180741-marostegui.json
* 13:49 jbond: upload kitchensink-clojure to puppet7 component
* 17:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22219 and previous config saved to /var/cache/conftool/dbconfig/20220309-175205-marostegui.json
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts with fragile power supply ([[phab:T314559|T314559]] [[phab:T314628|T314628]])', diff saved to https://phabricator.wikimedia.org/P32292 and previous config saved to /var/cache/conftool/dbconfig/20220805-132709-ladsgroup.json
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 17:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 sukhe: repool codfw
* 17:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 13:02 jbond: upload honeysql-clojure to puppet7 component
* 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:53 _joe_: progressive repool of services in codfw
* 17:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22217 and previous config saved to /var/cache/conftool/dbconfig/20220309-173630-marostegui.json
* 12:24 moritzm: installing nano bugfix updates from bullseye point release
* 17:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 17:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on D3 ([[phab:T310146|T310146]])', diff saved to https://phabricator.wikimedia.org/P32291 and previous config saved to /var/cache/conftool/dbconfig/20220805-113729-ladsgroup.json
* 17:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C6 ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32290 and previous config saved to /var/cache/conftool/dbconfig/20220805-113555-ladsgroup.json
* 17:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C5 ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32289 and previous config saved to /var/cache/conftool/dbconfig/20220805-113436-ladsgroup.json
* 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 17:29 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WebAuthn/: [[phab:T303404|T303404]] (duration: 00m 53s)
* 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 17:28 reedy@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/WebAuthn/: [[phab:T303404|T303404]] (duration: 00m 51s)
* 10:12 Amir1: dbmaint at s4@codfw ([[phab:T312863|T312863]])
* 17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2008.codfw.wmnet with OS bullseye
* 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 17:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
* 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 17:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
* 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 16:56 akosiaris: reboot rdb[2008,2010].codfw.wmnet,rdb[1010,1012].eqiad.wmnet for upgrades
* 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:49 akosiaris: reboot rdb2008 for upgrades
* 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:45 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2008.codfw.wmnet with OS bullseye
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
* 16:22 moritzm: installing 5.10.103 kernels on bullseye hosts
* 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host karapace1001.eqiad.wmnet
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gerrit2002.wikimedia.org
* 16:00 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.remove-downtime for gerrit2002.wikimedia.org
* 15:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/includes/parser/Sanitizer.php: {{Gerrit|31189c6aa4dc880a9eebe6824dbc031e9109384f}}: Ensure that the recognizedTagData static cache is properly initialized ([[phab:T303360|T303360]]) (duration: 00m 51s)
* 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
* 15:56 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
* 15:56 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host karapace1001.eqiad.wmnet
* 00:18 mutante: restarting gerrit for config change - removing old replica [[phab:T313250|T313250]]
* 15:33 jbond: deploy gerrit:740818 to add more general rate limits for crawling cached and upload pages
* 15:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2007.codfw.wmnet with OS bullseye
* 15:28 volans: uploaded spicerack_2.3.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 15:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
* 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:06 taavi: UTC afternoon deploys done
* 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:06 awight@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw/ui/styles/pages/ve.ui.MWParameterPage.css: Backport: [[gerrit:769297{{!}}Fix missing padding on inline descriptions (T303386)]] (duration: 00m 49s)
* 15:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 6 hosts with reason: Maintenance
* 15:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 6 hosts with reason: Maintenance
* 15:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 15:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 15:05 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22215 and previous config saved to /var/cache/conftool/dbconfig/20220309-150523-marostegui.json
* 15:03 awight@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/VisualEditor/modules/ve-mw/ui/styles/pages/ve.ui.MWParameterPage.css: Backport: [[gerrit:769296{{!}}Fix missing padding on inline descriptions (T303386)]] (duration: 00m 49s)
* 15:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2007.codfw.wmnet with OS bullseye
* 15:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:58 taavi@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Gadgets/extension.json: Backport: [[gerrit:769433{{!}}wmf.24 HACK: Add forward class alias for Gadget (T303391)]] (2/2) (duration: 00m 49s)
* 14:57 taavi@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Gadgets/includes: Backport: [[gerrit:769433{{!}}wmf.24 HACK: Add forward class alias for Gadget (T303391)]] (1/2) (duration: 00m 50s)
* 14:55 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1001.eqiad.wmnet with reason: Release v0.4.0 to reimaged cumin1001 - volans@cumin1001
* 14:54 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin1001.eqiad.wmnet with reason: Release v0.4.0 to reimaged cumin1001 - volans@cumin1001
* 14:49 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22213 and previous config saved to /var/cache/conftool/dbconfig/20220309-144948-marostegui.json
* 14:34 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22212 and previous config saved to /var/cache/conftool/dbconfig/20220309-143413-marostegui.json
* 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:27 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:766882{{!}}Add IPInfo viewing rights for certain groups (T296499)]] (no-op on prod) (duration: 00m 50s)
* 14:18 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22211 and previous config saved to /var/cache/conftool/dbconfig/20220309-141837-marostegui.json
* 14:13 damilare: civicrm revision changed from {{Gerrit|cb0605ed}} to {{Gerrit|9b5aafbc}}
* 14:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22210 and previous config saved to /var/cache/conftool/dbconfig/20220309-140158-marostegui.json
* 14:01 marostegui: Failover m5 from db1132 to db1107 - [[phab:T302190|T302190]]
* 14:01 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:01 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:01 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 14:01 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 13:59 btullis: restarting pybal on lvs1019 [[phab:T301458|T301458]]
* 13:51 btullis: restarting pybal on lvs102 [[phab:T301458|T301458]]
* 13:47 marostegui: dbmaint on s8@eqiad [[phab:T272512|T272512]]
* 13:46 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22209 and previous config saved to /var/cache/conftool/dbconfig/20220309-134631-marostegui.json
* 13:45 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 13:45 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 13:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22208 and previous config saved to /var/cache/conftool/dbconfig/20220309-134552-marostegui.json
* 13:42 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:42 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22207 and previous config saved to /var/cache/conftool/dbconfig/20220309-134235-marostegui.json
* 13:30 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22206 and previous config saved to /var/cache/conftool/dbconfig/20220309-133017-marostegui.json
* 13:27 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22205 and previous config saved to /var/cache/conftool/dbconfig/20220309-132700-marostegui.json
* 13:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22204 and previous config saved to /var/cache/conftool/dbconfig/20220309-131442-marostegui.json
* 13:11 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22203 and previous config saved to /var/cache/conftool/dbconfig/20220309-131124-marostegui.json
* 12:59 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22202 and previous config saved to /var/cache/conftool/dbconfig/20220309-125907-marostegui.json
* 12:56 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: just a test
* 12:56 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: just a test
* 12:55 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22201 and previous config saved to /var/cache/conftool/dbconfig/20220309-125549-marostegui.json
* 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1001.eqiad.wmnet with OS bullseye
* 12:26 btullis@cumin2002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
* 12:25 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22200 and previous config saved to /var/cache/conftool/dbconfig/20220309-122536-marostegui.json
* 12:25 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:24 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 12:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 12:05 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22199 and previous config saved to /var/cache/conftool/dbconfig/20220309-120554-marostegui.json
* 11:50 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22198 and previous config saved to /var/cache/conftool/dbconfig/20220309-115019-marostegui.json
* 11:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:43 awight: sketchy EU deployment complete.
* 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:42 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:767498{{!}}Syntax highlighting color scheme update on all wikis except enwiki (T280024)]] (duration: 00m 50s)
* 11:41 btullis@cumin2002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:37 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:767499{{!}}Bracket matching on all wikis except enwiki (T280023)]] (duration: 00m 49s)
* 11:34 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22197 and previous config saved to /var/cache/conftool/dbconfig/20220309-113442-marostegui.json
* 11:32 awight@deploy1002: Synchronized wmf-config/: Config: [[gerrit:767512{{!}}VE template expanded sidebar and inline descriptions on all wikis except enwiki (T286991)]] (duration: 00m 51s)
* 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1001.eqiad.wmnet with reason: host reimage
* 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1001.eqiad.wmnet with reason: host reimage
* 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:19 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22195 and previous config saved to /var/cache/conftool/dbconfig/20220309-111907-marostegui.json
* 11:17 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:767508{{!}}VE template back and delete button on all wikis except enwiki (T286990)]] (duration: 00m 50s)
* 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1001.eqiad.wmnet with OS bullseye
* 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:11 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:767510{{!}}Template search improvements to all wikis except enwiki (T286990)]] (duration: 00m 51s)
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1016.eqiad.wmnet
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:51 btullis@cumin2002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
* 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
* 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 10:39 btullis@cumin2002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 10:32 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22194 and previous config saved to /var/cache/conftool/dbconfig/20220309-103226-marostegui.json
* 10:31 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 10:31 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 10:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22193 and previous config saved to /var/cache/conftool/dbconfig/20220309-103146-marostegui.json
* 10:29 marostegui: dbmaint on s6@eqiad [[phab:T272512|T272512]]
* 10:29 marostegui: dbmaint on s3@eqiad [[phab:T298295|T298295]]
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 10:16 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22192 and previous config saved to /var/cache/conftool/dbconfig/20220309-101610-marostegui.json
* 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 10:08 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:769400{{!}}reenable DPL on nowikimedia]] (duration: 00m 51s)
* 10:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22191 and previous config saved to /var/cache/conftool/dbconfig/20220309-100036-marostegui.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2147', diff saved to https://phabricator.wikimedia.org/P22190 and previous config saved to /var/cache/conftool/dbconfig/20220309-094704-marostegui.json
* 09:45 marostegui: dbmaint on s7@eqiad [[phab:T298295|T298295]]
* 09:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22189 and previous config saved to /var/cache/conftool/dbconfig/20220309-094501-marostegui.json
* 09:31 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22188 and previous config saved to /var/cache/conftool/dbconfig/20220309-093119-marostegui.json
* 09:30 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:30 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22187 and previous config saved to /var/cache/conftool/dbconfig/20220309-092731-marostegui.json
* 09:26 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:26 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:23 marostegui: dbmaint on s2@eqiad [[phab:T298295|T298295]]
* 09:18 marostegui: dbmaint on s1@eqiad [[phab:T298295|T298295]]
* 09:16 marostegui: dbmaint on s4@eqiad [[phab:T298295|T298295]]
* 09:07 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 09:07 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 09:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22186 and previous config saved to /var/cache/conftool/dbconfig/20220309-090737-marostegui.json
* 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 08:53 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
* 08:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22184 and previous config saved to /var/cache/conftool/dbconfig/20220309-085201-marostegui.json
* 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 08:49 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
* 08:46 XioNoX: Redirect one of Microsoft's range to codfw - [[phab:T282861|T282861]]
* 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 08:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
* 08:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 08:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22183 and previous config saved to /var/cache/conftool/dbconfig/20220309-083626-marostegui.json
* 08:21 marostegui: dbmaint on s3@eqiad [[phab:T300380|T300380]]
* 08:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22182 and previous config saved to /var/cache/conftool/dbconfig/20220309-082051-marostegui.json
* 08:11 marostegui: dbmaint on s7@eqiad [[phab:T300380|T300380]]
* 08:03 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22181 and previous config saved to /var/cache/conftool/dbconfig/20220309-080307-marostegui.json
* 08:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 08:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 40%: After schema change', diff saved to https://phabricator.wikimedia.org/P22180 and previous config saved to /var/cache/conftool/dbconfig/20220309-075704-root.json
* 07:55 marostegui: dbmaint on s2@eqiad [[phab:T300380|T300380]]
* 07:49 marostegui: dbmaint on s8@eqiad [[phab:T300380|T300380]]
* 07:49 marostegui: dbmaint on s4@eqiad [[phab:T300380|T300380]]
* 07:42 marostegui: dbmaint on s1@eqiad [[phab:T300380|T300380]]
* 07:42 marostegui: dbmaint on s6@eqiad [[phab:T300380|T300380]]
* 07:42 marostegui: dbmaint on s5@eqiad [[phab:T300380|T300380]]
* 07:42 marostegui: dbmaint on s5 [[phab:T300380|T300380]]
* 07:42 marostegui: dbmaint on s6 [[phab:T300380|T300380]]
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22179 and previous config saved to /var/cache/conftool/dbconfig/20220309-074200-root.json
* 07:41 marostegui: dbmaint on s1 [[phab:T300380|T300380]]
* 07:41 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22178 and previous config saved to /var/cache/conftool/dbconfig/20220309-074107-root.json
* 07:34 marostegui: dbmaint on s7@eqiad [[phab:T300775|T300775]]
* 07:33 marostegui: dbmaint on db1123 s3@eqiad [[phab:T300600|T300600]]
* 07:31 elukey: manually sync pcc facts following https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 15%: After schema change', diff saved to https://phabricator.wikimedia.org/P22177 and previous config saved to /var/cache/conftool/dbconfig/20220309-072656-root.json
* 07:25 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22176 and previous config saved to /var/cache/conftool/dbconfig/20220309-072540-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P22175 and previous config saved to /var/cache/conftool/dbconfig/20220309-071153-root.json
* 07:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22174 and previous config saved to /var/cache/conftool/dbconfig/20220309-071014-root.json
* 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1123.eqiad.wmnet with OS bullseye
* 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
* 06:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22173 and previous config saved to /var/cache/conftool/dbconfig/20220309-065447-root.json
* 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
* 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1123.eqiad.wmnet with OS bullseye
* 06:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22172 and previous config saved to /var/cache/conftool/dbconfig/20220309-062010-marostegui.json
* 06:19 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 06:19 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 06:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 01:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22171 and previous config saved to /var/cache/conftool/dbconfig/20220309-014831-marostegui.json
* 01:32 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22170 and previous config saved to /var/cache/conftool/dbconfig/20220309-013256-marostegui.json
* 01:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22169 and previous config saved to /var/cache/conftool/dbconfig/20220309-011721-marostegui.json
* 01:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22168 and previous config saved to /var/cache/conftool/dbconfig/20220309-010146-marostegui.json
* 00:53 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22167 and previous config saved to /var/cache/conftool/dbconfig/20220309-005325-marostegui.json
* 00:52 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 00:52 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 00:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22166 and previous config saved to /var/cache/conftool/dbconfig/20220309-005245-marostegui.json
* 00:37 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22165 and previous config saved to /var/cache/conftool/dbconfig/20220309-003710-marostegui.json
* 00:21 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22164 and previous config saved to /var/cache/conftool/dbconfig/20220309-002135-marostegui.json
* 00:06 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22163 and previous config saved to /var/cache/conftool/dbconfig/20220309-000600-marostegui.json
* 00:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22162 and previous config saved to /var/cache/conftool/dbconfig/20220309-000250-marostegui.json
* 00:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 00:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 00:00 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 00:00 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 00:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22161 and previous config saved to /var/cache/conftool/dbconfig/20220309-000025-marostegui.json


== 2022-03-08 ==
== 2022-08-04 ==
* 23:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22160 and previous config saved to /var/cache/conftool/dbconfig/20220308-234450-marostegui.json
* 23:07 mutante: switching gerrit-replica.wikimedia.org to new machine gerrit2002, dropping gerrit-replica-new.wikimedia.org [[phab:T313250|T313250]]
* 23:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22159 and previous config saved to /var/cache/conftool/dbconfig/20220308-232915-marostegui.json
* 21:07 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:13 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22158 and previous config saved to /var/cache/conftool/dbconfig/20220308-231340-marostegui.json
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:10 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22157 and previous config saved to /var/cache/conftool/dbconfig/20220308-231028-marostegui.json
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:09 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:09 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:56 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:819774]] tkwiki: Update wordmark (duration: 06m 12s)
* 23:09 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22156 and previous config saved to /var/cache/conftool/dbconfig/20220308-230949-marostegui.json
* 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:51 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:50 thcipriani@deploy1002: Started scap: Backport for [[gerrit:819774]] tkwiki: Update wordmark
* 22:54 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22155 and previous config saved to /var/cache/conftool/dbconfig/20220308-225413-marostegui.json
* 20:48 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:812391]] [config]: Add click event logging for mobile and desktop (duration: 39m 16s)
* 22:38 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22153 and previous config saved to /var/cache/conftool/dbconfig/20220308-223838-marostegui.json
* 20:45 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:23 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 ryankemper@deploy1002: helmfile [staging] START helmfile.d/
* 22:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:24 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.25 refs [[phab:T300201|T300201]]
* 22:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22152 and previous config saved to /var/cache/conftool/dbconfig/20220308-222303-marostegui.json
* 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22151 and previous config saved to /var/cache/conftool/dbconfig/20220308-222055-marostegui.json
* 22:20 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 22:20 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet


== 2022-03-04 ==
== 2022-08-03 ==
* 17:59 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:59 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
* 17:57 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32270 and previous config saved to /var/cache/conftool/dbconfig/20220803-235030-marostegui.json
* 17:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32269 and previous config saved to /var/cache/conftool/dbconfig/20220803-225015-marostegui.json
* 17:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 22:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 17:46 mforns@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 07s)
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 17:46 mforns@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
* 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
* 17:39 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@19520c1]: (no justification provided) (duration: 00m 08s)
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
* 17:39 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@19520c1]: (no justification provided)
* 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 17:09 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 08s)
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 17:09 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 16:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
* 22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 16:35 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32268 and previous config saved to /var/cache/conftool/dbconfig/20220803-224827-marostegui.json
* 16:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 10s)
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32267 and previous config saved to /var/cache/conftool/dbconfig/20220803-223321-marostegui.json
* 16:13 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32266 and previous config saved to /var/cache/conftool/dbconfig/20220803-221815-marostegui.json
* 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32265 and previous config saved to /var/cache/conftool/dbconfig/20220803-220309-marostegui.json
* 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32264 and previous config saved to /var/cache/conftool/dbconfig/20220803-220057-marostegui.json
* 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21856 and previous config saved to /var/cache/conftool/dbconfig/20220304-160629-ladsgroup.json
* 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 16:03 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 03s)
* 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 16:03 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32263 and previous config saved to /var/cache/conftool/dbconfig/20220803-220007-marostegui.json
* 15:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1086.eqiad.wmnet with OS buster
* 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32262 and previous config saved to /var/cache/conftool/dbconfig/20220803-214501-marostegui.json
* 15:58 vgutierrez: pool cp1086 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 21:44 damilare: payments-wiki updated from {{Gerrit|e1b6036a}} to {{Gerrit|712df4ce}}
* 15:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS buster
* 21:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - [[phab:T314078|T314078]]
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21854 and previous config saved to /var/cache/conftool/dbconfig/20220304-155124-ladsgroup.json
* 21:35 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
* 15:51 vgutierrez: pool cp2038 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 21:35 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
* 15:49 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
* 21:30 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 15:49 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 21:30 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 15:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
* 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32261 and previous config saved to /var/cache/conftool/dbconfig/20220803-212955-marostegui.json
* 15:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
* 21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32260 and previous config saved to /var/cache/conftool/dbconfig/20220803-211449-marostegui.json
* 15:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32259 and previous config saved to /var/cache/conftool/dbconfig/20220803-211237-marostegui.json
* 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21852 and previous config saved to /var/cache/conftool/dbconfig/20220304-153619-ladsgroup.json
* 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:34 XioNoX: blackhole IPs - [[phab:T303055|T303055]]
* 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32258 and previous config saved to /var/cache/conftool/dbconfig/20220803-211216-marostegui.json
* 15:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1086.eqiad.wmnet with OS buster
* 21:03 ejegg: updated standalone SmashPig deployment from {{Gerrit|8e8f0017}} to {{Gerrit|9b97ea15}}
* 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21851 and previous config saved to /var/cache/conftool/dbconfig/20220304-152114-ladsgroup.json
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21850 and previous config saved to /var/cache/conftool/dbconfig/20220304-152007-ladsgroup.json
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
* 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32257 and previous config saved to /var/cache/conftool/dbconfig/20220803-205710-marostegui.json
* 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 20:55 ebernhardson@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: [[gerrit:820223{{!}}cirrus: Set ElasticaWrite partition count for cloudelastic to 3]] (duration: 03m 29s)
* 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21849 and previous config saved to /var/cache/conftool/dbconfig/20220304-151937-ladsgroup.json
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS buster
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21848 and previous config saved to /var/cache/conftool/dbconfig/20220304-150433-ladsgroup.json
* 14:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad.service on elastic1049 to resolve CirrusSearchJVMGCOldPoolFlatlined alert
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21847 and previous config saved to /var/cache/conftool/dbconfig/20220304-144926-ladsgroup.json
* 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS buster
* 14:43 vgutierrez: pool cp3059 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21846 and previous config saved to /var/cache/conftool/dbconfig/20220304-143421-ladsgroup.json
* 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21845 and previous config saved to /var/cache/conftool/dbconfig/20220304-143214-ladsgroup.json
* 14:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21844 and previous config saved to /var/cache/conftool/dbconfig/20220304-143206-ladsgroup.json
* 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21842 and previous config saved to /var/cache/conftool/dbconfig/20220304-141701-ladsgroup.json
* 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21841 and previous config saved to /var/cache/conftool/dbconfig/20220304-140156-ladsgroup.json
* 13:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1302-1306].eqiad.wmnet
* 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21840 and previous config saved to /var/cache/conftool/dbconfig/20220304-134651-ladsgroup.json
* 13:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21839 and previous config saved to /var/cache/conftool/dbconfig/20220304-134443-ladsgroup.json
* 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21838 and previous config saved to /var/cache/conftool/dbconfig/20220304-134436-ladsgroup.json
* 13:38 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21837 and previous config saved to /var/cache/conftool/dbconfig/20220304-132931-ladsgroup.json
* 13:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1302-1306].eqiad.wmnet
* 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21836 and previous config saved to /var/cache/conftool/dbconfig/20220304-131426-ladsgroup.json
* 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21835 and previous config saved to /var/cache/conftool/dbconfig/20220304-125921-ladsgroup.json
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21834 and previous config saved to /var/cache/conftool/dbconfig/20220304-125714-ladsgroup.json
* 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21833 and previous config saved to /var/cache/conftool/dbconfig/20220304-125706-ladsgroup.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21832 and previous config saved to /var/cache/conftool/dbconfig/20220304-124201-ladsgroup.json
* 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21831 and previous config saved to /var/cache/conftool/dbconfig/20220304-122656-ladsgroup.json
* 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21830 and previous config saved to /var/cache/conftool/dbconfig/20220304-121152-ladsgroup.json
* 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21829 and previous config saved to /var/cache/conftool/dbconfig/20220304-120944-ladsgroup.json
* 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21828 and previous config saved to /var/cache/conftool/dbconfig/20220304-120937-ladsgroup.json
* 12:04 jbond: enable SameSite=Strict on idp
* 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21827 and previous config saved to /var/cache/conftool/dbconfig/20220304-115432-ladsgroup.json
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21826 and previous config saved to /var/cache/conftool/dbconfig/20220304-113927-ladsgroup.json
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21825 and previous config saved to /var/cache/conftool/dbconfig/20220304-112422-ladsgroup.json
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21824 and previous config saved to /var/cache/conftool/dbconfig/20220304-112214-ladsgroup.json
* 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
* 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21823 and previous config saved to /var/cache/conftool/dbconfig/20220304-112207-ladsgroup.json
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
* 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4024.ulsfo.wmnet with OS buster
* 11:09 vgutierrez: pool cp4024 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21822 and previous config saved to /var/cache/conftool/dbconfig/20220304-110702-ladsgroup.json
* 10:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
* 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
* 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21821 and previous config saved to /var/cache/conftool/dbconfig/20220304-105157-ladsgroup.json
* 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS buster
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4024.ulsfo.wmnet with OS buster
* 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21820 and previous config saved to /var/cache/conftool/dbconfig/20220304-103652-ladsgroup.json
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21819 and previous config saved to /var/cache/conftool/dbconfig/20220304-103444-ladsgroup.json
* 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21818 and previous config saved to /var/cache/conftool/dbconfig/20220304-103437-ladsgroup.json
* 10:29 vgutierrez: pool cp5004 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5004.eqsin.wmnet with OS buster
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21817 and previous config saved to /var/cache/conftool/dbconfig/20220304-101932-ladsgroup.json
* 10:08 aqu@deploy1002: Finished deploy [airflow-dags/analytics@1c8384f]: AF //tion default args (duration: 00m 07s)
* 10:08 aqu@deploy1002: Started deploy [airflow-dags/analytics@1c8384f]: AF //tion default args
* 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21816 and previous config saved to /var/cache/conftool/dbconfig/20220304-100427-ladsgroup.json
* 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300992|T300992]])', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20220304-094918-ladsgroup.json
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21815 and previous config saved to /var/cache/conftool/dbconfig/20220304-094710-ladsgroup.json
* 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21814 and previous config saved to /var/cache/conftool/dbconfig/20220304-094702-ladsgroup.json
* 09:43 vgutierrez: restart varnish on cp3056
* 09:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
* 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
* 09:37 vgutierrez: restart varnish on cp3058
* 09:33 vgutierrez: restart varnish on cp3060
* 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21813 and previous config saved to /var/cache/conftool/dbconfig/20220304-093157-ladsgroup.json
* 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21812 and previous config saved to /var/cache/conftool/dbconfig/20220304-091652-ladsgroup.json
* 09:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5004.eqsin.wmnet with OS buster
* 09:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1005-1006].eqiad.wmnet
* 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21811 and previous config saved to /var/cache/conftool/dbconfig/20220304-090147-ladsgroup.json
* 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21810 and previous config saved to /var/cache/conftool/dbconfig/20220304-085939-ladsgroup.json
* 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21809 and previous config saved to /var/cache/conftool/dbconfig/20220304-085932-ladsgroup.json
* 08:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21808 and previous config saved to /var/cache/conftool/dbconfig/20220304-084427-ladsgroup.json
* 08:34 akosiaris: [[phab:T303027|T303027]] depool mw130[2-6]. Old jobrunners/videoscalers, being decommissioned
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=mw130[2-6].eqiad.wmnet
* 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21807 and previous config saved to /var/cache/conftool/dbconfig/20220304-082922-ladsgroup.json
* 08:23 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1005-1006].eqiad.wmnet
* 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21806 and previous config saved to /var/cache/conftool/dbconfig/20220304-081417-ladsgroup.json
* 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21805 and previous config saved to /var/cache/conftool/dbconfig/20220304-081210-ladsgroup.json
* 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:27 XioNoX: push pfw policies - [[phab:T303003|T303003]]
* 01:35 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 01:34 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 01:34 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 01:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 01:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
* 01:32 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
* 01:32 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
* 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
* 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
* 01:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 01:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 01:29 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 01:29 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 01:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 01:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 01:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 01:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 01:23 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 01:23 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
* 01:22 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
 
== 2022-03-03 ==
* 21:35 brennen: end of UTC late backport & config window / training
* 21:30 brennen@deploy1002: Finished scap: Config: [[gerrit:766229{{!}}Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)]] (duration: 01m 33s)
* 21:28 brennen@deploy1002: Started scap: Config: [[gerrit:766229{{!}}Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)]]
* 21:28 brennen@deploy1002: Synchronized multiversion/MWRealm.php: Config: [[gerrit:766229{{!}}Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)]] (duration: 00m 48s)
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/includes/VisualEditorParsoidClient.php: {{Gerrit|a804fe18f1e14795ba7836d3ebf6c361bb1538a7}}: Update call to PageConfigFactory::create to use new signature ([[phab:T314523|T314523]]) (duration: 03m 25s)
* 20:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32256 and previous config saved to /var/cache/conftool/dbconfig/20220803-204204-marostegui.json
* 20:39 urbanecm@deploy1002: sync-file aborted: {{Gerrit|a804fe18f1e14795ba7836d3ebf6c361bb1538a7}}: Update call to PageConfigFactory::create to use new signature ([[phab:T314523|T314523]]) (duration: 00m 00s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/: {{Gerrit|b840eef86837aed3e566885110e93b2ca9ab5f42}}: Fix ReplyLinksController#teardown (duration: 03m 27s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/: {{Gerrit|70a18f5846111a0dfe8ba473daf384cbb8e88804}}:  Add explicit partitioning key to ElasticaWrite ([[phab:T314426|T314426]]) (duration: 03m 13s)
* 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:28 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/CirrusSearch/: {{Gerrit|9961e9bc8f5873f8ddc8a11108de0a7bfcb14ae6}}: Add explicit partitioning key to ElasticaWrite ([[phab:T314426|T314426]]) (duration: 03m 23s)
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 cwhite@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host logstash2032.codfw.wmnet
* 19:35 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:32 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): no current blockers; proceeding to all wikis
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: Backport: [[gerrit:767812{{!}}Unset data-toc in SkinVector (T302461)]] (duration: 00m 49s)
* 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32255 and previous config saved to /var/cache/conftool/dbconfig/20220803-202658-marostegui.json
 
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32254 and previous config saved to /var/cache/conftool/dbconfig/20220803-202146-marostegui.json
* 20:21 marostegui@cumin1001:


== 2022-03-02 ==
== 2022-08-02 ==
* 23:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:25 ryankemper: [[phab:T276198|T276198]] Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from [[phab:T276198|T276198]]"'`
* 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:21 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 22:15 mutante: gerrit - syncing data (/srv/gerrit /var/lib/gerrit2/review_site  /home) again after gerrit2002 was reimaged with buster [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 23:21 ryankemper: [[phab:T276198|T276198]] https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly
* 22:04 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 06s)
* 23:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 22:04 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 23:08 robh
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]]
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START


== 2022-03-01 ==
== 2022-08-01 ==
* 22:51 inflatador: [[phab:T276198|T276198]] reenabled puppet on elastic1052.eqiad.wmnet
* 23:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Id1ce285631f5}}, {{Gerrit|I194d419fbfe}} (duration: 03m 09s)
* 22:37 inflatador: [[phab:T276198|T276198]] rebooting elastic1052.eqiad.wmnet to test failure condition
* 23:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
* 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
* 23:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:32 inflatador: [[phab:T276198|T276198]] disabling puppet on elastic1052.eqiad.wmnet to test failure condition (rebooting shortly)
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:53 dancy@deploy1002: Finished scap: Resync to try to clear alerts (duration: 12m 08s)
* 21:08 moritzm: drain ganeti2028 [[phab:T309957|T309957]]
* 21:41 dancy@deploy1002: Started scap: Resync to try to clear alerts
* 21:03 mutante: gerrit2002 - mkdir /var/lib/gerrit2/review_site {{!}} gerrit1001 - rsyncing /var/lib/gerrit2/review_site/ to gerrit2002 [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 21:36 dancy@deploy1002: Started scap: Resync to try to clear alerts
* 21:01 urbanecm: UTC late backport window done
* 20:36 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|461e0709a8987b110f669b74afc38c706b616e5d}}: itwiki: Change robot policy on NS2 and NS3 ([[phab:T314165|T314165]]) (duration: 03m 18s)
* 20:33 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): no current blockers; proceeding to group0; note this may briefly trigger some version alerts
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/includes: Backport: [[gerrit:767089{{!}}Revert "preferences: Use a faster and simpler form descriptor when validating" (T302643)]] (duration: 00m 55s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:05 mutante: alert1001 - re-enabled puppet
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.24 refs [[phab:T300200|T300200]] (duration: 53m 17s)
* 20:57 mutante: phab1001 - rsyncing repo data /srv/repos/ to phab2002 (in addition to phab1004 previously) [[phab:T313360|T313360]]
* 19:45 mutante: alert1001 - disable puppet, systemctl stop ircecho - to stop bot spam, caused somehow by new scap version breaking "mw versions mismatch" alerting - affects labtestwiki,testwiki,testwikidatawiki
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:38 mutante: mw1449 - scap pull
* 20:55 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mnwwiktionary --fix # [[phab:T314023|T314023]]
* 19:36 mutante: mw1414 - scap pull
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba8c17759b7e737a6757792ad4136ff3af00030c}}: mnwwiktionary: Create Appendix namespace ([[phab:T314023|T314023]]) (duration: 03m 09s)
* 19:11 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2008.codfw.wmnet
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:01 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:57 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): there's currently a single blocker at [[phab:T302643|T302643]]; staging to testwikis and holding there until backport's available
* 20:48 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateArticleCount.php --wiki=viwikibooks --update # [[phab:T314239|T314239]]
* 18:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2008.codfw.wmnet
* 20:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c19c3e36ab}}: DiscussionTools: Make new reply buttons available at mediawiki.org ([[phab:T314076|T314076]]); {{Gerrit|24db016c4}}: viwikibooks: Change wgArticleCountMethod to any ([[phab:T314239|T314239]]) (duration: 03m 10s)
* 18:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
* 20:35 daniel@deploy1002: Synchronized php-1.39.0-wmf.22/includes/Rest/Handler: Fix: [[gerrit:819129{{!}}Parsoid REST handler: allow pagebundle input without original HTML.]] (duration: 03m 15s)
* 18:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
* 20:25 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-ne.svg ([[phab:T311700|T311700]])
* 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21626 and previous config saved to /var/cache/conftool/dbconfig/20220301-180216-ladsgroup.json
* 20:21 daniel@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ne.svg: Config: [[gerrit:818614{{!}}newiki: Update wordmark (T311700)]] (duration: 03m 17s)
* 17:52 cwhite: completed grafana upgrade in eqiad [[phab:T282863|T282863]]
* 20:17 daniel@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:818614{{!}}newiki: Update wordmark (T311700)]] (duration: 03m 32s)
* 17:50 herron: re-enabling puppet and ircecho on alert1001
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:47 cwhite: upgrade grafana in eqiad [[phab:T282863|T282863]]
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21625 and previous config saved to /var/cache/conftool/dbconfig/20220301-174711-ladsgroup.json
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21624 and previous config saved to /var/cache/conftool/dbconfig/20220301-173206-ladsgroup.json
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:24 dancy@deploy1002: Finished scap: testing container image build (duration: 28m 39s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:17 herron: stopped ircecho on alert1001 due to systemd unit alert shower
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21622 and previous config saved to /var/cache/conftool/dbconfig/20220301-171701-ladsgroup.json
* 20:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2054.codfw.wmnet with OS bullseye
* 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21621 and previous config saved to /var/cache/conftool/dbconfig/20220301-171441-ladsgroup.json
* 19:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
* 17:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
* 17:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 19:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2054.codfw.wmnet with OS bullseye
* 16:55 dancy@deploy1002: Started scap: testing container image build
* 18:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2031.codfw.wmnet with OS bullseye
* 16:24 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 03s)
* 18:44 mutante: gitlab - moved data_persistence group to new parent, under /repos/
* 16:23 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
* 18:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
* 16:12 moritzm: restarting apache on logstash nodes to pick up expat update
* 18:32 mutante: gitlab - created group 'data_persistence' - added Ladsgroup and upgraded from member to maintainer
* 16:11 elukey@deploy1002: Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - [[phab:T300195|T300195]] (duration: 36m 13s)
* 18:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
* 16:05 moritzm: restarting nginx on wcqs* nodes to pick up expat update
* 18:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2031.codfw.wmnet with OS bullseye
* 15:35 elukey@deploy1002: Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - [[phab:T300195|T300195]]
* 17:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2025.codfw.wmnet with OS bullseye
* 15:21 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 07s)
* 17:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
* 15:21 ntsako@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
* 17:31 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
* 15:06 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2003.codfw.wmnet
* 17:18 ryankemper: [[phab:T289135|T289135]] [[phab:T314078|T314078]] Manually reimaging remaining codfw stretch hosts (`elastic[2025,2031,2054,2059-2060]`) to bullseye, one host at a time, waiting for green cluster status to return between each run. `ryankemper@cumin1001` tmux session `codfw_reimage`
* 14:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2025.codfw.wmnet with OS bullseye
* 14:52 elukey: elukey@deploy1002:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the node)
* 17:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 17:08 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:51 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 17:06 mutante: alert1001 - systemctl restart nsca - pinged by fundraising tech because fundraising hosts have the "passive check is awol" issue again ([[phab:T196336|T196336]])
* 14:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2002.codfw.wmnet
* 16:25 moritzm: installing tcpdump updates from bullseye point release
* 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:23 cwhite@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
* 14:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1018.eqiad.wmnet with OS bullseye
* 14:38 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:10 cwhite@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
* 14:36 vgutierrez: pool cp1087 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
* 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS buster
* 15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
* 14:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1018.eqiad.wmnet with OS bullseye
* 14:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 15:39 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase [[phab:T309896|T309896]] - mvernon@cumin1001
* 14:32 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
* 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:28 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2001.codfw.wmnet
* 15:29 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase [[phab:T309896|T309896]] - mvernon@cumin1001
* 14:19 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:818127{{!}}Beta: add configuration for redirect badges (T313896)]] (2/2, should be a no-op) (duration: 03m 30s)
* 14:19 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:818127{{!}}Beta: add configuration for redirect badges (T313896)]] (1/2, should be a no-op) (duration: 03m 15s)
* 14:14 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 moritzm: restarting nginx on wdqs* nodes to pick up expat update
* 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:03 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:54 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 13:57 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:53 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:42 moritzm: installing openjdk-11 security updates
* 13:53 mmandere: restart purged on cp60[15-16]
* 14:39 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 13:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
* 14:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 13:48 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:38 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 13:48 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 14:34 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 13:48 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
* 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:48 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:44 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:43 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:43 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:40 kormat: Deploying wmfmariadbpy 0.9 [[phab:T302796|T302796]]
* 14:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 13:40 kormat: uploaded wmfmariadbpy 0.9 to apt.wm.o [[phab:T302796|T302796]]
* 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:28 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 13:39 klausman@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/skins/Vector/: {{Gerrit|b5007c5f1c389deb344c5bb99e950b4190436cab}}: Revert "styles: Unify on standard external link icon" (duration: 03m 16s)
* 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 14:12 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 13:32 moritzm: restarting nginx on registry* nodes to pick up expat update
* 14:12 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 13:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 XioNoX: restart cr1-drmrs for software upgrade
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 13:03 moritzm: restarting FPM/Apache on parsoid hosts to pick up expat update
* 14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2044.codfw.wmnet with OS bullseye
* 12:50 vgutierrez: pool cp3062 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|bcb7b0d4d07b454a169804d7b1011ec3f2530c00}}: Adjust width-height ratio of logo to fix display issue ([[phab:T310961|T310961]]; 2/2) (duration: 03m 17s)
* 12:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster
* 14:04 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/srwikisource<nowiki>{</nowiki>.png;-1.5x.png;-2x.png<nowiki>}</nowiki> ([[phab:T310961|T310961]])
* 12:39 moritzm: installing expat security updates
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:34 mmandere: restart purged on cp60[12-14]
* 14:01 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|bcb7b0d4d07b454a169804d7b1011ec3f2530c00}}: srwikisource: Adjust width-height ratio of logo to fix display issue ([[phab:T310961|T310961]]; 1/2) (duration: 03m 41s)
* 12:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s)
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker
* 14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)
* 13:58 urbanecm: UTC afternoon backport window is going to overflow by a couple of minutes
* 12:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)
* 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
* 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration
* 13:44 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
* 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)
* 13:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2044.codfw.wmnet with OS bullseye
* 12:09 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration
* 13:22 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 11:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:50 moritzm: installing openjdk-8 security updates for stretch
* 11:36 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 11:43 moritzm: uploaded openjdk-8 8u342-b07-1~deb9u1 for stretch-wikimedia
* 11:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32124 and previous config saved to /var/cache/conftool/dbconfig/20220801-102714-ladsgroup.json
* 11:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32123 and previous config saved to /var/cache/conftool/dbconfig/20220801-101208-ladsgroup.json
* 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 10:09 vgutierrez: test ATS 9.1.2 on cp6016 - [[phab:T309651|T309651]]
* 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 10:05 vgutierrez: test ATS 9.1.2 on cp6008 - [[phab:T309651|T309651]]
* 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 10:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@4da9195]: (no justification provided) (duration: 00m 19s)
* 11:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 10:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@4da9195]: (no justification provided)
* 11:27 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32122 and previous config saved to /var/cache/conftool/dbconfig/20220801-095702-ladsgroup.json
* 11:27 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:56 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 05s)
* 11:21 _joe_: restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. [[phab:T244843|T244843]]
* 09:56 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
* 11:18 _joe_: also removed the ipvsadm entry for apaches:80 [[phab:T244843|T244843]]
* 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32121 and previous config saved to /var/cache/conftool/dbconfig/20220801-094156-ladsgroup.json
* 11:17 jayme: rolled back linkrecommendation staging helm release to revision 12 - [[phab:T302744|T302744]]
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32120 and previous config saved to /var/cache/conftool/dbconfig/20220801-093845-ladsgroup.json
* 11:17 _joe_: restarting pybal on lvs1020 [[phab:T244843|T244843]]
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:11 _joe_: restarted pybal on lvs2009, [[phab:T244843|T244843]]
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:07 _joe_: restarted pybal on lvs2010, [[phab:T244843|T244843]]
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
* 11:02 mmandere: restart purged on cp60[09,10,11]
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Maintenance
* 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:47 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster
* 09:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts
* 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
* 10:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts
* 09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts
* 09:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
* 10:39 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts
* 09:00 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
* 10:31 mmandere: restart purged on cp600[6-8]
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:05 vgutierrez: pool cp2039 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:48 elukey: elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)
* 08:53 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/includes/api: Backport: [[gerrit:818562{{!}}api: Support for links migration in ApiQueryBacklinks (T312865 T314112)]] (duration: 03m 01s)
* 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster
* 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:33 _joe_: restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted
* 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:31 _joe_: restart pybal on lvs1020
* 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts
* 08:50 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
* 09:25 elukey: restart varnishkafka-webrequest on cp6009 as an attempt to clear a weird status of librdkafka (delivery errors to kafka)
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
* 09:25 _joe_: manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore
* 08:48 godog: thanos-be2004: copy quarantined and tmp off sdb3 and into sdb4 for analysis and to free space - [[phab:T314275|T314275]]
* 09:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts
* 08:47 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:818998{{!}}Stop writing to the old templatelinks columns in itwikisource (T312865)]] (duration: 03m 12s)
* 09:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts
* 08:43 vgutierrez: rolling upgrade of HAProxy to version 2.4.18
* 09:22 _joe_: restarted pybal on lvs2009, the mw api is now effectively https-only in codfw [[phab:T287820|T287820]]
* 08:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:20 _joe_: restarted pybal on lvs2010
* 08:41 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
* 08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
* 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
* 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 09:06 elukey: restart purged on cp6005
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 08:57 elukey: restart purged on cp6004
* 08:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
* 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster
* 08:14 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
* 08:27 urbanecm: UTC morning B&C window done
* 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 08:25 elukey: restart purged on cp6003
* 06:14 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appservers-ro
* 08:16 moritzm: drain instances off ganeti2008 for eventual decom
* 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appserver-ro
* 08:08 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: {{Gerrit|d149208dfd7e5fbf51f44dd0bf7dae3b2e2f5159}}: Use service-proxy to connect to linkrecommendation ([[phab:T302719|T302719]]) (duration: 00m 49s)
* 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=(appserver{{!}}api)-ro
* 07:59 elukey: restart purged on cp6002
* 05:43 moritzm: installing Linux 5.10.127-2 on Gitlab runners
* 06:58 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): [[phab:T302464|T302464]] test (duration: 00m 17s)
* 01:00 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ic0dbcba9f60f20a}} (duration: 03m 31s)
* 06:57 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): [[phab:T302464|T302464]] test
* 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:56 elukey: restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to)
* 00:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:46 _joe_: uploaded scap 4.4.1 to <nowiki>{</nowiki>stretch,buster,bullseye<nowiki>}</nowiki> [[phab:T302464|T302464]]
* 00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:46 _joe_: uploaded scap 4.4.1 to <nowiki>{</nowiki>stretch,buster,bullseye<nowiki>}</nowiki>
* 00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json
* 00:45 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|I9d363abd7cfef}} (duration: 03m 17s)
* 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json
* 00:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json
* 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
* 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 00:17 mutante: 15.wikipedia.org on k8s (staging) deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' {{!}} grep grandpa  =>  "&ldquo;Wikipedia is like an all-knowing grandpa.&rdquo;" {{!}} [[phab:T300171|T300171]]
 
==Archives==
See [[Server Admin Log/Archives]].
<noinclude>
<noinclude>

Latest revision as of 19:58, 7 August 2022

2022-08-07

  • 19:58 taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" | mwscript purgeList.php --wiki enwiki # T314712
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T312863)', diff saved to https://phabricator.wikimedia.org/P32305 and previous config saved to /var/cache/conftool/dbconfig/20220807-135204-ladsgroup.json
  • 13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32304 and previous config saved to /var/cache/conftool/dbconfig/20220807-135143-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32303 and previous config saved to /var/cache/conftool/dbconfig/20220807-133637-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32302 and previous config saved to /var/cache/conftool/dbconfig/20220807-132131-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32301 and previous config saved to /var/cache/conftool/dbconfig/20220807-130625-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32300 and previous config saved to /var/cache/conftool/dbconfig/20220807-120610-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32299 and previous config saved to /var/cache/conftool/dbconfig/20220807-120549-ladsgroup.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32298 and previous config saved to /var/cache/conftool/dbconfig/20220807-115043-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32297 and previous config saved to /var/cache/conftool/dbconfig/20220807-113537-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32296 and previous config saved to /var/cache/conftool/dbconfig/20220807-112031-ladsgroup.json

2022-08-06

  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32295 and previous config saved to /var/cache/conftool/dbconfig/20220806-175916-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:02 krinkle@deploy1002: Synchronized w/: I9067d4 (duration: 03m 25s)
  • 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-05

  • 22:20 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly (duration: 02m 01s)
  • 22:18 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly
  • 17:08 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bullseye
  • 16:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bullseye
  • 16:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
  • 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
  • 16:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
  • 16:37 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
  • 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
  • 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
  • 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
  • 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
  • 16:26 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
  • 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1193.eqiad.wmnet with OS bullseye
  • 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db1192.eqiad.wmnet with OS bullseye
  • 16:12 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@8489923]: T304954: Automate imagesuggestion imports (duration: 02m 03s)
  • 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
  • 16:11 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :) (duration: 06m 09s)
  • 16:10 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@8489923]: T304954: Automate imagesuggestion imports
  • 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
  • 16:07 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
  • 16:05 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :)
  • 16:04 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine (duration: 34m 38s)
  • 16:03 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
  • 15:55 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
  • 15:52 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1191.eqiad.wmnet with OS bullseye
  • 15:51 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
  • 15:42 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS bullseye
  • 15:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
  • 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
  • 15:30 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine
  • 15:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
  • 15:25 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
  • 15:24 jbond: upload trapperkeeper-metrics-clojure to puppet7 component
  • 15:22 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
  • 15:19 jbond: upload puppetlabs-http-client-clojure to puppet7 component
  • 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:14 dancy@deploy1002: Finished scap: Backport for gerrit:820653 scap gitignore: ignore all files under the `scap` directory (duration: 04m 41s)
  • 15:11 jbond: upload jolokia to puppet7 component
  • 15:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
  • 15:09 dancy@deploy1002: Started scap: Backport for gerrit:820653 scap gitignore: ignore all files under the `scap` directory
  • 15:09 jbond: upload test-chuck-clojure to puppet7 component
  • 15:05 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
  • 15:04 jbond: upload test-check-clojure to puppet7 component
  • 14:57 jbond: upload nippy-clojure to puppet7 component
  • 14:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
  • 14:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
  • 14:43 jbond: upload fressian to puppet7 component
  • 14:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
  • 14:40 jbond: upload test-generative-clojure to puppet7 component
  • 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:34 jbond: upload data-generators-clojure to puppet7 component
  • 14:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:23 jbond: upload encore-clojure to puppet7 component
  • 14:17 jbond: upload truss-clojure to puppet7 component
  • 14:13 jbond: upload structured-logging-clojure to puppet7 component
  • 14:06 jbond: upload murphy-clojure to puppet7 component
  • 13:57 jbond: upload logstash-logback-encoder-7.2 to puppet7 component
  • 13:49 jbond: upload kitchensink-clojure to puppet7 component
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts with fragile power supply (T314559 T314628)', diff saved to https://phabricator.wikimedia.org/P32292 and previous config saved to /var/cache/conftool/dbconfig/20220805-132709-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:09 sukhe: repool codfw
  • 13:02 jbond: upload honeysql-clojure to puppet7 component
  • 12:53 _joe_: progressive repool of services in codfw
  • 12:24 moritzm: installing nano bugfix updates from bullseye point release
  • 11:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on D3 (T310146)', diff saved to https://phabricator.wikimedia.org/P32291 and previous config saved to /var/cache/conftool/dbconfig/20220805-113729-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C6 (T310145)', diff saved to https://phabricator.wikimedia.org/P32290 and previous config saved to /var/cache/conftool/dbconfig/20220805-113555-ladsgroup.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C5 (T310145)', diff saved to https://phabricator.wikimedia.org/P32289 and previous config saved to /var/cache/conftool/dbconfig/20220805-113436-ladsgroup.json
  • 10:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 10:12 Amir1: dbmaint at s4@codfw (T312863)
  • 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
  • 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
  • 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gerrit2002.wikimedia.org
  • 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.remove-downtime for gerrit2002.wikimedia.org
  • 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
  • 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
  • 00:18 mutante: restarting gerrit for config change - removing old replica T313250

2022-08-04

  • 23:07 mutante: switching gerrit-replica.wikimedia.org to new machine gerrit2002, dropping gerrit-replica-new.wikimedia.org T313250
  • 21:07 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:56 thcipriani@deploy1002: Finished scap: Backport for gerrit:819774 tkwiki: Update wordmark (duration: 06m 12s)
  • 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:51 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:50 thcipriani@deploy1002: Started scap: Backport for gerrit:819774 tkwiki: Update wordmark
  • 20:48 thcipriani@deploy1002: Finished scap: Backport for gerrit:812391 [config]: Add click event logging for mobile and desktop (duration: 39m 16s)
  • 20:45 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:24 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:23 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:22 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:13 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:10 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:09 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:08 thcipriani@deploy1002: Started scap: Backport for gerrit:812391 [config]: Add click event logging for mobile and desktop
  • 19:59 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 19:55 dancy@deploy1002: rebuilt and synchronized wikiversions files: resync
  • 19:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-be2001.codfw.wmnet
  • 19:49 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for thanos-be2001.codfw.wmnet
  • 19:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 8 hosts
  • 19:44 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 8 hosts
  • 19:42 Emperor: rebooting thanos-be2001 to fix drive ordering
  • 19:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2071.codfw.wmnet
  • 19:37 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2071.codfw.wmnet
  • 19:31 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2071.codfw.wmnet with reason: T310146
  • 19:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2071.codfw.wmnet with reason: T310146
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:12 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 19:11 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 19:11 dancy: There were many errors during php-fpm restart due to failure to contact http://lvs2009:9090/pools/appservers-https_443/mw2361.codfw.wmnet and the like.
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.23 refs T308076
  • 19:09 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 19:09 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 19:05 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 19:04 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 19:04 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 19:03 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 19:03 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 19:02 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
  • 19:02 ottomata: roll-restarting eventgate-analytics-external to pick up backwards incompatible schema change - T314151
  • 18:47 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 18:46 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 18:41 cwhite: poweroff kafka-logging2003 - T310145
  • 18:39 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw237[0-6].codfw.wmnet
  • 18:39 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts
  • 18:39 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for 7 hosts
  • 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2369.codfw.wmnet
  • 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2369.codfw.wmnet
  • 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2368.codfw.wmnet
  • 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2368.codfw.wmnet
  • 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2367.codfw.wmnet
  • 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2367.codfw.wmnet
  • 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2369.codfw.wmnet
  • 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2368.codfw.wmnet
  • 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2367.codfw.wmnet
  • 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2366.codfw.wmnet
  • 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2366.codfw.wmnet
  • 18:34 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2366.codfw.wmnet
  • 18:30 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2279.codfw.wmnet
  • 18:30 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2278.codfw.wmnet
  • 18:29 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2277.codfw.wmnet
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2276.codfw.wmnet
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2276.codfw.wmnet
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2275.codfw.wmnet
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2275.codfw.wmnet
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2274.codfw.wmnet
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2274.codfw.wmnet
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2273.codfw.wmnet
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2273.codfw.wmnet
  • 18:26 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 02m 39s)
  • 18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2272.codfw.wmnet
  • 18:24 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2272.codfw.wmnet
  • 18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2271.codfw.wmnet
  • 18:24 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2271.codfw.wmnet
  • 18:23 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:23 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 32s)
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2276.codfw.wmnet
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2275.codfw.wmnet
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2274.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2273.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet
  • 18:22 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,68].codfw.wmnet PDU work T310145
  • 18:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
  • 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:21 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
  • 18:21 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s)
  • 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:21 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s)
  • 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:20 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 49s)
  • 18:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
  • 18:20 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
  • 18:19 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:14 mutante: mw2272 and upwards: scap pull, checking monitoring, repooling.. one by one
  • 18:13 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 18:12 btullis@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 51s)
  • 18:11 btullis@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:06 btullis@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 54s)
  • 18:04 btullis@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:43 mutante: maps2008 - downtime and shutdown for D3 maintenance
  • 17:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots
  • 17:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots
  • 17:42 mutante: thumbor2006 - downtime and shutdown for D3 maintenance
  • 17:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots
  • 17:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots
  • 17:39 mutante: mw2386 - systemctl reset-failed
  • 17:31 mutante: phab2001 - systemctl restart ssh-phab, attempting to clear Icinga pybal alerts, related to reboots
  • 17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
  • 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
  • 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
  • 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
  • 17:28 Amir1: dbmaint at s4@eqiad (T312863)
  • 17:26 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:26 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:24 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:23 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:23 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:23 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:20 mutante: [an-launcher1002:~] $ sudo systemctl reset-failed
  • 17:20 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=ms-fe2012.codfw.wmnet
  • 17:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=varnish-fe
  • 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=ats-be
  • 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=ats-tls
  • 17:16 Emperor: shutdown of moss-fe2002.codfw.wmnet,ms-be20[37,38,43,61,65,69].codfw.wmnet,ms-fe2012.codfw.wmnet,thanos-fe2003.codfw.wmnet for power work T310146
  • 17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[2035-2036].codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[2035-2036].codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: PDU work
  • 17:15 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: PDU work
  • 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
  • 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
  • 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
  • 17:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet
  • 17:13 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet
  • 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=varnish-fe
  • 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=ats-be
  • 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=ats-tls
  • 17:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2050.codfw.wmnet with reason: T310146
  • 17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2050.codfw.wmnet with reason: T310146
  • 17:11 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288] (duration: 00m 04s)
  • 17:11 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288]
  • 17:11 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288] (duration: 00m 07s)
  • 17:10 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288]
  • 17:10 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 15s)
  • 17:09 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 17:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: shutdown for PDU upgrade
  • 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
  • 16:51 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288] (duration: 07m 14s)
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2016.codfw.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase202[05].codfw.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase202[05].codfw.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
  • 16:43 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288]
  • 16:43 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288] (duration: 00m 07s)
  • 16:43 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288]
  • 16:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 18 hosts
  • 16:37 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 18 hosts
  • 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2059.codfw.wmnet with reason: T310145
  • 16:35 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2059.codfw.wmnet with reason: T310145
  • 16:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2003.codfw.wmnet with reason: PDU swap
  • 16:34 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 20s)
  • 16:34 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2003.codfw.wmnet with reason: PDU swap
  • 16:34 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 16:32 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 29m 59s)
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D3 for PDU maint', diff saved to https://phabricator.wikimedia.org/P32286 and previous config saved to /var/cache/conftool/dbconfig/20220804-163037-ladsgroup.json
  • 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Start reading from new templatelinks columns in commons (T306673) (duration: 03m 00s)
  • 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:17 brett: deploying authdns - geodns: Map out African countries by DC latency (T311472)
  • 16:12 cwhite: poweroff logstash2028 - T310145
  • 16:06 Emperor: shutdown ms-be20[39,49,54].codfw.wmnet,thanos-be2003 for PDU swap T310145
  • 16:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet with reason: PDU work
  • 16:02 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet with reason: PDU work
  • 16:02 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 15:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2048.codfw.wmnet with reason: T310145
  • 15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2048.codfw.wmnet with reason: T310145
  • 15:43 damilare: payments-wiki upgraded from 0e4a5b3b to 6880236d
  • 15:37 _joe_: uncordoning ml-serve200{1,6}
  • 15:27 sukhe: power off cp2037,cp2038: PDU upgrade
  • 15:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:30:00 on phab2001.codfw.wmnet with reason: PDU swap
  • 15:25 jelto: power off phab2001
  • 15:25 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:30:00 on phab2001.codfw.wmnet with reason: PDU swap
  • 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[2037-2038].codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[2037-2038].codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=varnish-fe
  • 15:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=ats-be
  • 15:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=ats-tls
  • 15:21 XioNoX: un-drain codfw-ulsfo link - T310310
  • 15:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2116,2127,2167-2168].codfw.wmnet,es2022.codfw.wmnet with reason: Maintenance (T310145)
  • 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db[2116,2127,2167-2168].codfw.wmnet,es2022.codfw.wmnet with reason: Maintenance (T310145)
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool C6 for PDU maint (T310145)', diff saved to https://phabricator.wikimedia.org/P32285 and previous config saved to /var/cache/conftool/dbconfig/20220804-151958-ladsgroup.json
  • 15:16 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 15:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on restbase[2016,2020,2025].codfw.wmnet with reason: PDU maintenance
  • 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on restbase[2016,2020,2025].codfw.wmnet with reason: PDU maintenance
  • 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2114,2126,2166].codfw.wmnet with reason: Maintenance (T310145)
  • 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db[2114,2126,2166].codfw.wmnet with reason: Maintenance (T310145)
  • 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=varnish-fe
  • 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=ats-be
  • 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=ats-tls
  • 15:12 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2058,2064].codfw.wmnet
  • 15:12 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2058,2064].codfw.wmnet
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts for PDU maint (T310145)', diff saved to https://phabricator.wikimedia.org/P32284 and previous config saved to /var/cache/conftool/dbconfig/20220804-151121-ladsgroup.json
  • 15:09 godog: poweroff logstash2002 - T310145
  • 15:07 _joe_: powering down mc203{0,1}
  • 15:07 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on logstash2002.codfw.wmnet with reason: pdu
  • 15:06 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on logstash2002.codfw.wmnet with reason: pdu
  • 15:05 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:58 jelto: power off mc20[30-31]
  • 14:56 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc[2030-2031].codfw.wmnet with reason: PDU swap
  • 14:56 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc[2030-2031].codfw.wmnet with reason: PDU swap
  • 14:56 XioNoX: draining codfw-ulsfo link - T310310
  • 14:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2025.codfw.wmnet
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2020.codfw.wmnet
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2016.codfw.wmnet
  • 14:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2011.codfw.wmnet with reason: T310145
  • 14:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2011.codfw.wmnet with reason: T310145
  • 14:25 jelto: power off gitlab-runner2003
  • 14:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on gitlab-runner2003.codfw.wmnet with reason: PDU swap
  • 14:25 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2001.codfw.wmnet with reason: T310145
  • 14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2001.codfw.wmnet with reason: T310145
  • 14:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on gitlab-runner2003.codfw.wmnet with reason: PDU swap
  • 14:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2032.codfw.wmnet with reason: T310145
  • 14:22 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2032.codfw.wmnet with reason: T310145
  • 14:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on logstash2035.codfw.wmnet with reason: pdu
  • 14:22 godog: poweroff logstash2035 - T310145
  • 14:22 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on logstash2035.codfw.wmnet with reason: pdu
  • 14:21 Emperor: shutdown ms-be20[58,64].codfw.wmnet for PDU swap T310145
  • 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:14 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Remove unused $wgMathUseRestBase (T274436) (duration: 03m 01s)
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: CommonSettings-labs: Fix usage of $wgSFSValidateIPListLocationMD5 (duration: 02m 51s)
  • 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2033.codfw.wmnet with reason: T310145
  • 14:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2033.codfw.wmnet with reason: T310145
  • 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: wikitech: Remove old LDAP config vars (duration: 02m 54s)
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2058,2064].codfw.wmnet with reason: PDU work
  • 13:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2058,2064].codfw.wmnet with reason: PDU work
  • 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove unused $wgIncludejQueryMigrate (T280944) (2/2) (duration: 03m 03s)
  • 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:45 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused $wgIncludejQueryMigrate (T280944) (1/2) (duration: 02m 58s)
  • 13:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:40 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2066.codfw.wmnet with reason: T310145
  • 13:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2066.codfw.wmnet with reason: T310145
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove unused $wgLegacyJavaScriptGlobals (T72470) (2/2) (duration: 02m 59s)
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused $wgLegacyJavaScriptGlobals (T72470) (1/2) (duration: 02m 58s)
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForSDC.php: Config: Remove unused $wgWBCSEnableDispatchingQueryBuilder (duration: 03m 01s)
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:17 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove unused CA P3P config (duration: 03m 09s)
  • 13:14 jbond: introduce new puppetmaster backends puppetmaster[12]004
  • 13:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: T310145
  • 13:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: T310145
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: QuickSurveys: Deploy research incentive survey to Bengali wiki (T314333) (duration: 03m 26s)
  • 13:07 moritzm: installing jetty9 security updates
  • 12:48 moritzm: installing Linux 4.19.249 kernels on Buster hosts
  • 12:03 jbond: send sretest100[12] and idp-test2001 to the new puppetmaster[12]004 servers to test
  • 11:46 moritzm: installing Linux 5.10.127-2 kernels on Bullseye hosts
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2017.codfw.wmnet to cluster codfw and group D
  • 11:41 moritzm: installing libpgjava security updates
  • 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2017.codfw.wmnet to cluster codfw and group D
  • 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2017.codfw.wmnet with OS bullseye
  • 10:53 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2015.codfw.wmnet
  • 10:53 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2022.codfw.wmnet
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2017.codfw.wmnet with reason: host reimage
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2017.codfw.wmnet with reason: host reimage
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2017.codfw.wmnet with OS bullseye
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 10:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 10:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 9:00:00 on 32 hosts with reason: PDU swap
  • 10:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 9:00:00 on 32 hosts with reason: PDU swap
  • 10:03 Lucas_WMDE: stashbot temporarily parted and lost several logs between 9:42 UTC and 9:49 UTC; mainly mwdebug helmfile start/done, also ayounsi sre.deploy.python-code cookbook to cumin1001, cumin2002; see IRC logs
  • 10:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update requirements + wmf-netbox - ayounsi@cumin1001
  • 10:00 jynus: stop db2099 T310145
  • 10:00 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update requirements + wmf-netbox - ayounsi@cumin1001
  • 09:39 jelto: power off mw22[71-79].codfw.wmnet
  • 09:38 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/includes/EventLogging/SpecialEditGrowthConfigLogger.php: ba67dd9: SpecialEditGrowthConfigLogger: Update schema version (T314173, T312148) (duration: 03m 18s)
  • 09:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2177 to s3 T311494', diff saved to https://phabricator.wikimedia.org/P32282 and previous config saved to /var/cache/conftool/dbconfig/20220804-093704-marostegui.json
  • 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ddcd333: testwiki: Growth: Assign enrollasmentor to * (T310905) (duration: 03m 41s)
  • 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:32 jelto: set/pooled=inactive mw22[71-79].codfw.wmnet
  • 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 9:30:00 on 9 hosts with reason: PDU swap
  • 09:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 9:30:00 on 9 hosts with reason: PDU swap
  • 09:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: wmf-netbox.py update - ayounsi@cumin1001
  • 09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2089.codfw.wmnet
  • 09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0614a39: testwiki: Growth: Switch to structured mentor list (T310905) (duration: 03m 38s)
  • 09:25 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: wmf-netbox.py update - ayounsi@cumin1001
  • 09:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:23 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 09:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2089.codfw.wmnet
  • 09:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes2022.codfw.wmnet
  • 09:03 oblivian@mwmaint1002: pull aborted: (duration: 00m 06s)
  • 08:58 moritzm: installing gsasl security updates
  • 08:57 oblivian@mwmaint1002: pull aborted: (duration: 00m 18s)
  • 08:48 moritzm: draining ganeti2017 T311686
  • 08:45 jelto: power off kubernetes2022
  • 08:43 oblivian@deploy1002: Synchronized README: testing new scap configuration (duration: 03m 18s)
  • 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:22:00 on kubernetes2022.codfw.wmnet with reason: PDU swap
  • 08:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:22:00 on kubernetes2022.codfw.wmnet with reason: PDU swap
  • 08:37 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2022.codfw.wmnet
  • 08:35 jelto: kubectl drain kubernetes2022.codfw.wmnet
  • 08:32 jelto: kubectl cordon kubernetes2022.codfw.wmnet
  • 08:28 moritzm: imported gsasl 1.8.0-8+wmf1 to stretch-wikimedia
  • 08:26 jelto: power off mc2049 and mc2050
  • 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:36:00 on mc[2049-2050].codfw.wmnet with reason: PDU swap
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:36:00 on mc[2049-2050].codfw.wmnet with reason: PDU swap
  • 08:22 oblivian@mwmaint1002: pull aborted: (duration: 00m 11s)
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132, db111, db1127, db1143', diff saved to https://phabricator.wikimedia.org/P32281 and previous config saved to /var/cache/conftool/dbconfig/20220804-081958-root.json
  • 08:19 jelto: power off mc2047 and mc2048
  • 08:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:45:00 on mc[2047-2048].codfw.wmnet with reason: PDU swap
  • 08:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:45:00 on mc[2047-2048].codfw.wmnet with reason: PDU swap
  • 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:55 marostegui: Remove grants for 208.80.154.160/208.80.155.109 T314528
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2089 from dbctl T313799', diff saved to https://phabricator.wikimedia.org/P32280 and previous config saved to /var/cache/conftool/dbconfig/20220804-074957-marostegui.json
  • 07:47 godog: grow sda/sdb 3 by 100G on thanos-be2002 - T314275
  • 07:46 godog: grow sda/sdb 3 by 100G on thanos-be1003 - T314275
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2030.codfw.wmnet to cluster codfw and group A
  • 07:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to cluster codfw and group A
  • 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2135,2160].codfw.wmnet with reason: codfw pdu maintenance
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2135,2160].codfw.wmnet with reason: codfw pdu maintenance
  • 07:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 07:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2030.codfw.wmnet to cluster codfw and group A
  • 07:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to cluster codfw and group A
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2023-2025].codfw.wmnet with reason: codfw pdu maintenance
  • 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2023-2025].codfw.wmnet with reason: codfw pdu maintenance
  • 07:05 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:58 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 06:58 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 06:58 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 06:06 _joe_: restarted memcached on mc2038 to pick up the actual production configuration
  • 05:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2030.codfw.wmnet with OS bullseye
  • 05:49 kart_: Updated cxserver to 2022-08-04-022612-production (T313296, T308248)
  • 05:44 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:43 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2030.codfw.wmnet with reason: host reimage
  • 05:39 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:38 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2030.codfw.wmnet with reason: host reimage
  • 05:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2030.codfw.wmnet with OS bullseye
  • 05:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2030.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 05:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2030.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 04:38 ejegg: payments-wiki upgraded from 712df4ce to 0e4a5b3b
  • 04:29 TimStarling: on mw2377 fiddling with CPU frequency control and doing benchmarks
  • 04:09 krinkle@mwmaint1002: pull aborted: (duration: 00m 05s)
  • 01:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32278 and previous config saved to /var/cache/conftool/dbconfig/20220804-012341-marostegui.json
  • 01:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32277 and previous config saved to /var/cache/conftool/dbconfig/20220804-010834-marostegui.json
  • 00:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32276 and previous config saved to /var/cache/conftool/dbconfig/20220804-005328-marostegui.json
  • 00:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32275 and previous config saved to /var/cache/conftool/dbconfig/20220804-003822-marostegui.json
  • 00:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32274 and previous config saved to /var/cache/conftool/dbconfig/20220804-003611-marostegui.json
  • 00:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 00:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32273 and previous config saved to /var/cache/conftool/dbconfig/20220804-003549-marostegui.json
  • 00:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32272 and previous config saved to /var/cache/conftool/dbconfig/20220804-002043-marostegui.json
  • 00:06 mutante: gerrit - [2022-08-04 00:05:33,173] Replication to gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/analytics/geowiki.git started.. T313250
  • 00:06 mutante: gerrit - [2022-08-04 00:05:33,173] Replication to gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/analytics/geowiki.git started... [CONTEXT pushOneId="83ad5008" ]
  • 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32271 and previous config saved to /var/cache/conftool/dbconfig/20220804-000536-marostegui.json
  • 00:03 mutante: gerrit - service restart to deploy config change to add second replica T313250
  • 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit.wikimedia.org with reason: service restart
  • 00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit.wikimedia.org with reason: service restart
  • 00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart

2022-08-03

  • 23:59 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
  • 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32270 and previous config saved to /var/cache/conftool/dbconfig/20220803-235030-marostegui.json
  • 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32269 and previous config saved to /var/cache/conftool/dbconfig/20220803-225015-marostegui.json
  • 22:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
  • 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
  • 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32268 and previous config saved to /var/cache/conftool/dbconfig/20220803-224827-marostegui.json
  • 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32267 and previous config saved to /var/cache/conftool/dbconfig/20220803-223321-marostegui.json
  • 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32266 and previous config saved to /var/cache/conftool/dbconfig/20220803-221815-marostegui.json
  • 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32265 and previous config saved to /var/cache/conftool/dbconfig/20220803-220309-marostegui.json
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32264 and previous config saved to /var/cache/conftool/dbconfig/20220803-220057-marostegui.json
  • 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32263 and previous config saved to /var/cache/conftool/dbconfig/20220803-220007-marostegui.json
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32262 and previous config saved to /var/cache/conftool/dbconfig/20220803-214501-marostegui.json
  • 21:44 damilare: payments-wiki updated from e1b6036a to 712df4ce
  • 21:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - T314078
  • 21:35 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 21:35 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 21:30 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 21:30 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32261 and previous config saved to /var/cache/conftool/dbconfig/20220803-212955-marostegui.json
  • 21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32260 and previous config saved to /var/cache/conftool/dbconfig/20220803-211449-marostegui.json
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32259 and previous config saved to /var/cache/conftool/dbconfig/20220803-211237-marostegui.json
  • 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32258 and previous config saved to /var/cache/conftool/dbconfig/20220803-211216-marostegui.json
  • 21:03 ejegg: updated standalone SmashPig deployment from 8e8f0017 to 9b97ea15
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32257 and previous config saved to /var/cache/conftool/dbconfig/20220803-205710-marostegui.json
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 ebernhardson@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: cirrus: Set ElasticaWrite partition count for cloudelastic to 3 (duration: 03m 29s)
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:43 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/includes/VisualEditorParsoidClient.php: a804fe1: Update call to PageConfigFactory::create to use new signature (T314523) (duration: 03m 25s)
  • 20:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32256 and previous config saved to /var/cache/conftool/dbconfig/20220803-204204-marostegui.json
  • 20:39 urbanecm@deploy1002: sync-file aborted: a804fe1: Update call to PageConfigFactory::create to use new signature (T314523) (duration: 00m 00s)
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/: b840eef: Fix ReplyLinksController#teardown (duration: 03m 27s)
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/: 70a18f5: Add explicit partitioning key to ElasticaWrite (T314426) (duration: 03m 13s)
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/CirrusSearch/: 9961e9b: Add explicit partitioning key to ElasticaWrite (T314426) (duration: 03m 23s)
  • 20:28 cwhite@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host logstash2032.codfw.wmnet
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32255 and previous config saved to /var/cache/conftool/dbconfig/20220803-202658-marostegui.json
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32254 and previous config saved to /var/cache/conftool/dbconfig/20220803-202146-marostegui.json
  • 20:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 20:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32253 and previous config saved to /var/cache/conftool/dbconfig/20220803-202125-marostegui.json
  • 20:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 20:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 195f809: Start writing to cuc_actor on test wikis (T233004) (duration: 03m 27s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2032.codfw.wmnet on all recursors
  • 20:08 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2032.codfw.wmnet on all recursors
  • 20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 mutante: gerrit - adding second replica T313250
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32252 and previous config saved to /var/cache/conftool/dbconfig/20220803-200619-marostegui.json
  • 20:04 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 20:03 cwhite@cumin2002: START - Cookbook sre.ganeti.makevm for new host logstash2032.codfw.wmnet
  • 20:00 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2012.codfw.wmnet
  • 20:00 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2012.codfw.wmnet
  • 20:00 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2012.codfw.wmnet
  • 19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32251 and previous config saved to /var/cache/conftool/dbconfig/20220803-195113-marostegui.json
  • 19:40 ryankemper: T314078 Forgot to mention, restart is at `ryankemper@cumin1001` tmux session `codfw_restarts`
  • 19:39 ryankemper: T314078 Rolling upgrade of codfw hosts; after this all of eqiad/codfw will have the new plugin version and we can resume the `search-loader` instances: `sudo -E cookbook sre.elasticsearch.rolling-operation search_codfw "codfw cluster plugin upgrade" --upgrade --nodes-per-run 3 --start-datetime 2022-08-03T19:38:10 --task-id T314078`
  • 19:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - T314078
  • 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32250 and previous config saved to /var/cache/conftool/dbconfig/20220803-193607-marostegui.json
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32249 and previous config saved to /var/cache/conftool/dbconfig/20220803-193354-marostegui.json
  • 19:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312972)', diff saved to https://phabricator.wikimedia.org/P32248 and previous config saved to /var/cache/conftool/dbconfig/20220803-193334-marostegui.json
  • 19:25 mutante: gerrit1001 - rsyncing /var/lib/gerrit/review_site/ over to gerrit2002 815401
  • 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32247 and previous config saved to /var/cache/conftool/dbconfig/20220803-191828-marostegui.json
  • 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32246 and previous config saved to /var/cache/conftool/dbconfig/20220803-190321-marostegui.json
  • 18:56 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2011.codfw.wmnet
  • 18:56 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2011.codfw.wmnet
  • 18:56 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2011.codfw.wmnet
  • 18:33 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2027,2037].codfw.wmnet
  • 18:33 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2027,2037].codfw.wmnet
  • 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:16 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.23 refs T308076 (duration: 03m 37s)
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23 refs T308076
  • 17:58 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestage2002.codfw.wmnet
  • 17:58 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubestage2002.codfw.wmnet
  • 17:57 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2025-2026].codfw.wmnet
  • 17:57 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2025-2026].codfw.wmnet
  • 17:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2044.codfw.wmnet
  • 17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2044.codfw.wmnet
  • 17:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2043.codfw.wmnet
  • 17:56 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2043.codfw.wmnet
  • 17:55 ottomata: increasing partitions from 5 to 6 for *.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite topics in Kafka main-eqiad and main-codfw - T314426
  • 17:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2055.codfw.wmnet
  • 17:55 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2055.codfw.wmnet
  • 17:50 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=kubestage2002.codfw.wmnet
  • 17:38 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2008-2010].codfw.wmnet
  • 17:38 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2008-2010].codfw.wmnet
  • 17:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase20[12]4.codfw.wmnet
  • 17:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 17:14 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 17:08 ryankemper: T310145 `elastic2031` and `wcqs2002` powered off in preparation for C1 maintenance
  • 17:06 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=(kubernetes2020.codfw.wmnet|kubernetes2009.codfw.wmnet|kubernetes2010.codfw.wmnet)
  • 17:00 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:48 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,55,68].codfw.wmnet PDU work T310145
  • 16:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
  • 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
  • 16:47 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
  • 16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
  • 16:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
  • 16:46 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
  • 16:40 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2046.codfw.wmnet
  • 16:40 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2046.codfw.wmnet
  • 16:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 10 hosts
  • 16:39 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 10 hosts
  • 16:38 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2023.codfw.wmnet
  • 16:38 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2023.codfw.wmnet
  • 16:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
  • 16:37 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
  • 16:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
  • 16:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
  • 16:32 jelto: power off mc2025-2026
  • 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for rdb2008.codfw.wmnet
  • 16:30 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for rdb2008.codfw.wmnet
  • 16:28 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2009-2010,2020].codfw.wmnet
  • 16:27 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes[2009-2010,2020].codfw.wmnet
  • 16:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 12 hosts
  • 16:11 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for 12 hosts
  • 16:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
  • 16:08 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 15 hosts
  • 16:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs[2005-2008].codfw.wmnet
  • 16:08 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs[2005-2008].codfw.wmnet
  • 15:59 Emperor: shutdown ms-be20[33,47],thanos-be2002 prior to PDU work T310070
  • 15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
  • 15:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
  • 15:52 jelto: pooling mw2259-2270 again
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32242 and previous config saved to /var/cache/conftool/dbconfig/20220803-154515-marostegui.json
  • 15:38 vgutierrez: clearing ats-be cache on cp6008 - T309651
  • 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:36 elukey: powercycle kafka-logging2003 - not responsive to serial console
  • 15:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: 4438957: ServiceImageRecommendationProvider: Add extra logging when no JSON response received (T313973) (duration: 03m 04s)
  • 15:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
  • 15:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
  • 15:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
  • 15:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
  • 15:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
  • 15:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2024.codfw.wmnet
  • 15:30 vgutierrez: clearing ats-be cache on cp6016 - T309651
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32241 and previous config saved to /var/cache/conftool/dbconfig/20220803-153009-marostegui.json
  • 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.eqsin.wmnet on all recursors
  • 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.eqsin.wmnet on all recursors
  • 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.ulsfo.wmnet on all recursors
  • 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.ulsfo.wmnet on all recursors
  • 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.codfw.wmnet on all recursors
  • 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.codfw.wmnet on all recursors
  • 15:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet
  • 15:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: T310070
  • 15:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: T310070
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32240 and previous config saved to /var/cache/conftool/dbconfig/20220803-151502-marostegui.json
  • 15:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
  • 15:10 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
  • 15:04 jelto: power off mc2023
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32239 and previous config saved to /var/cache/conftool/dbconfig/20220803-145956-marostegui.json
  • 14:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
  • 14:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32238 and previous config saved to /var/cache/conftool/dbconfig/20220803-145849-marostegui.json
  • 14:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32237 and previous config saved to /var/cache/conftool/dbconfig/20220803-145828-marostegui.json
  • 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:53 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.19 (duration: 05m 37s)
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:47 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.21 (duration: 06m 13s)
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32236 and previous config saved to /var/cache/conftool/dbconfig/20220803-144322-marostegui.json
  • 14:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: T310070
  • 14:33 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: T310070
  • 14:32 Emperor: shutdown aqs200[5-8] prior to PDU work T310070
  • 14:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
  • 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
  • 14:31 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
  • 14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
  • 14:28 jelto: power off thumbor2003 and thumbor2004
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32235 and previous config saved to /var/cache/conftool/dbconfig/20220803-142816-marostegui.json
  • 14:27 moritzm: upgrading ganeti/esams to Ganeti 3.0.2 T312637
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32234 and previous config saved to /var/cache/conftool/dbconfig/20220803-141310-marostegui.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32233 and previous config saved to /var/cache/conftool/dbconfig/20220803-141103-marostegui.json
  • 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32232 and previous config saved to /var/cache/conftool/dbconfig/20220803-141042-marostegui.json
  • 14:06 moritzm: installing freetype security updates on bullseye
  • 13:57 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin 'P{R:Class = Confd}' 'systemctl restart confd'
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32231 and previous config saved to /var/cache/conftool/dbconfig/20220803-135536-marostegui.json
  • 13:46 cdanis: ✔️ cdanis@deploy1002.eqiad.wmnet ~ 🕙☕ sudo systemctl restart confd
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32230 and previous config saved to /var/cache/conftool/dbconfig/20220803-134030-marostegui.json
  • 13:30 moritzm: installing Java 8 security updates for Buster
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32229 and previous config saved to /var/cache/conftool/dbconfig/20220803-132524-marostegui.json
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32228 and previous config saved to /var/cache/conftool/dbconfig/20220803-131916-marostegui.json
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32227 and previous config saved to /var/cache/conftool/dbconfig/20220803-131855-marostegui.json
  • 13:18 sukhe: depool codfw for PDU upgrade: CR 819798
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:16 urbanecm@deploy1002: Synchronized wmf-config/MetaContactPages.php: f89f02e: Amend license request contact form per Legal (T303359) (duration: 09m 27s)
  • 13:12 jbond: introduce puppetmaster[12]004 for now as offline
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
  • 13:09 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
  • 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: T310070
  • 13:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: T310070
  • 13:04 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32226 and previous config saved to /var/cache/conftool/dbconfig/20220803-130348-marostegui.json
  • 12:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: T310070
  • 12:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: T310070
  • 12:56 pt1979@cumin1001: START - Cookbook sre.dns.netbox
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32224 and previous config saved to /var/cache/conftool/dbconfig/20220803-124842-marostegui.json
  • 12:40 moritzm: uploaded openjdk-8 8u342-b07-1~deb10u1 to component/jdk8 for buster-wikimedia (rebuild of latest Java 8 security update)
  • 12:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32223 and previous config saved to /var/cache/conftool/dbconfig/20220803-123336-marostegui.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32222 and previous config saved to /var/cache/conftool/dbconfig/20220803-122929-marostegui.json
  • 12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32221 and previous config saved to /var/cache/conftool/dbconfig/20220803-122819-marostegui.json
  • 12:16 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@614f7b2]: (no justification provided) (duration: 00m 11s)
  • 12:16 ebysans@deploy1002: Started deploy [airflow-dags/analytics@614f7b2]: (no justification provided)
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32220 and previous config saved to /var/cache/conftool/dbconfig/20220803-121313-marostegui.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32219 and previous config saved to /var/cache/conftool/dbconfig/20220803-115807-marostegui.json
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2176 to s1 T311494', diff saved to https://phabricator.wikimedia.org/P32218 and previous config saved to /var/cache/conftool/dbconfig/20220803-115706-marostegui.json
  • 11:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, T310145
  • 11:49 root@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, T310145
  • 11:46 jayme@cumin1001: conftool action : set/weight=10; selector: name=(kubernetes2019.codfw.wmnet|kubernetes2021.codfw.wmnet|kubernetes2022.codfw.wmnet|kubernetes2018.codfw.wmnet|kubernetes2020.codfw.wmnet)
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32217 and previous config saved to /var/cache/conftool/dbconfig/20220803-114301-marostegui.json
  • 11:41 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=(kubernetes2020.codfw.wmnet|kubernetes2009.codfw.wmnet|kubernetes2010.codfw.wmnet|kubernetes2011.codfw.wmnet|kubernetes2012.codfw.wmnet|kubestage2002.codfw.wmnet)
  • 11:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase2022.codfw.wmnet
  • 11:37 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
  • 11:35 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:32 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 11:26 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wdqs
  • 11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=kartotherian
  • 11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-backend
  • 11:21 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
  • 11:17 _joe_: depooling codfw services from all traffic
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2011.codfw.wmnet to cluster codfw and group C
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2011.codfw.wmnet to cluster codfw and group C
  • 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 10:47 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
  • 10:46 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32216 and previous config saved to /var/cache/conftool/dbconfig/20220803-104246-marostegui.json
  • 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32215 and previous config saved to /var/cache/conftool/dbconfig/20220803-104224-marostegui.json
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 10:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase201[45].codfw.wmnet
  • 10:38 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2022.codfw.wmnet
  • 10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
  • 10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
  • 10:37 jelto: shutdown kubestage2002 kubernetes2020 kubernetes2009 kubernetes2010 kubernetes2011 kubernetes2012
  • 10:30 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
  • 10:30 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
  • 10:29 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
  • 10:29 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
  • 10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
  • 10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
  • 10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
  • 10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32213 and previous config saved to /var/cache/conftool/dbconfig/20220803-102718-marostegui.json
  • 10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2012.codfw.wmnet
  • 10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2011.codfw.wmnet
  • 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.codfw.wmnet
  • 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2009.codfw.wmnet
  • 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2020.codfw.wmnet
  • 10:20 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubestage2002.codfw.wmnet
  • 10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2011.codfw.wmnet with OS bullseye
  • 10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
  • 10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
  • 10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32212 and previous config saved to /var/cache/conftool/dbconfig/20220803-101212-marostegui.json
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32211 and previous config saved to /var/cache/conftool/dbconfig/20220803-095706-marostegui.json
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
  • 09:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2021.codfw.wmnet
  • 09:56 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2012.codfw.wmnet
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32210 and previous config saved to /var/cache/conftool/dbconfig/20220803-095559-marostegui.json
  • 09:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 09:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32209 and previous config saved to /var/cache/conftool/dbconfig/20220803-095538-marostegui.json
  • 09:55 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase2027.codfw.wmnet
  • 09:54 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2011.codfw.wmnet
  • 09:54 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:54 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
  • 09:52 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2010.codfw.wmnet
  • 09:50 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2009.codfw.wmnet
  • 09:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
  • 09:48 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
  • 09:47 jelto: kubectl drain --ignore-daemonsets kubernetes2020.codfw.wmnet
  • 09:46 jelto: kubectl cordon kubernetes2020.codfw.wmnet kubernetes2009.codfw.wmnet kubernetes2010.codfw.wmnet kubernetes2011.codfw.wmnet kubernetes2012.codfw.wmnet
  • 09:43 jelto: kubectl drain --ignore-daemonsets kubestage2002.codfw.wmnet
  • 09:43 vgutierrez: rolling restart of pybal in codfw lvs instances - T310070
  • 09:42 jelto: kubectl cordon kubestage2002
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32208 and previous config saved to /var/cache/conftool/dbconfig/20220803-094032-marostegui.json
  • 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS bullseye
  • 09:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@674bb8b]: (no justification provided) (duration: 00m 10s)
  • 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2090.codfw.wmnet
  • 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:33 ebysans@deploy1002: Started deploy [airflow-dags/analytics@674bb8b]: (no justification provided)
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 09:29 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2090.codfw.wmnet
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32207 and previous config saved to /var/cache/conftool/dbconfig/20220803-092525-marostegui.json
  • 09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:23 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:23 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:22 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
  • 09:22 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2090 from dbctl T314109', diff saved to https://phabricator.wikimedia.org/P32206 and previous config saved to /var/cache/conftool/dbconfig/20220803-092053-marostegui.json
  • 09:20 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
  • 09:15 jelto: power on mc2024
  • 09:10 XioNoX: configure BGP on the esams-drmrs link - T307221
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32205 and previous config saved to /var/cache/conftool/dbconfig/20220803-091019-marostegui.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32204 and previous config saved to /var/cache/conftool/dbconfig/20220803-090912-marostegui.json
  • 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
  • 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32203 and previous config saved to /var/cache/conftool/dbconfig/20220803-090836-marostegui.json
  • 09:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
  • 09:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
  • 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
  • 09:04 jynus: stop backup2006 backup2009 for T310070
  • 09:00 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
  • 09:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
  • 08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
  • 08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
  • 08:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
  • 08:58 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
  • 08:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
  • 08:57 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
  • 08:57 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
  • 08:54 XioNoX: put the esams-drmrs link in service - T307221
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32202 and previous config saved to /var/cache/conftool/dbconfig/20220803-085330-marostegui.json
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
  • 08:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:47 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32201 and previous config saved to /var/cache/conftool/dbconfig/20220803-083824-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32200 and previous config saved to /var/cache/conftool/dbconfig/20220803-082318-marostegui.json
  • 08:19 jynus: stop db2098 for T310070
  • 08:17 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers|api)-ro,name=codfw
  • 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2072.codfw.wmnet
  • 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2072.codfw.wmnet
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2072 from dbctl T313911', diff saved to https://phabricator.wikimedia.org/P32199 and previous config saved to /var/cache/conftool/dbconfig/20220803-074806-marostegui.json
  • 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32197 and previous config saved to /var/cache/conftool/dbconfig/20220803-072253-marostegui.json
  • 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32196 and previous config saved to /var/cache/conftool/dbconfig/20220803-072214-marostegui.json
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
  • 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
  • 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
  • 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
  • 07:11 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: CX: Set MT threshold for publishing in Armenian WP to 80% (T313208) (duration: 03m 49s)
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32195 and previous config saved to /var/cache/conftool/dbconfig/20220803-070708-marostegui.json
  • 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:00 moritzm: draining ganeti2011 T311686
  • 06:56 godog: grow sda/sdb 3 by 100G on thanos-be2003 - T314275
  • 06:56 godog: grow sda/sdb 3 by 100G on thanos-be1002 - T314275
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32194 and previous config saved to /var/cache/conftool/dbconfig/20220803-065202-marostegui.json
  • 06:46 godog: power up centrallog2002 and prometheus2005 - T310070
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 06:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32193 and previous config saved to /var/cache/conftool/dbconfig/20220803-063656-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32192 and previous config saved to /var/cache/conftool/dbconfig/20220803-063148-marostegui.json
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32191 and previous config saved to /var/cache/conftool/dbconfig/20220803-063045-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32190 and previous config saved to /var/cache/conftool/dbconfig/20220803-061538-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32189 and previous config saved to /var/cache/conftool/dbconfig/20220803-060032-marostegui.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32188 and previous config saved to /var/cache/conftool/dbconfig/20220803-054526-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32187 and previous config saved to /var/cache/conftool/dbconfig/20220803-054106-marostegui.json
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance

2022-08-02

  • 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:15 mutante: gerrit - syncing data (/srv/gerrit /var/lib/gerrit2/review_site /home) again after gerrit2002 was reimaged with buster T313250 T313972
  • 22:04 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 06s)
  • 22:04 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
  • 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23 refs T308076
  • 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:29 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/includes/Sanity/Checker.php: Backport: Fix appending of join conds (T312421 T314439) (duration: 03m 15s)
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:27 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - T314078
  • 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS buster
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.22 refs T308076
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 20:50 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23 refs T308076
  • 20:38 mutante: re-imaging gerrit2002 with buster because it's on bullseye and needs git-fat, which has not been ported to python3 yet; that blocks upgrading gerrit machines otherwise T313250 T243027 T279509
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS buster
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:36 urbanecm: UTC evening B&C window done
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/HTMLTransformInput.php: 69e9152: ParsoidHandler: fix page bundle input with no orig HTML (duration: 03m 22s)
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:29 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/ParsoidHandler.php: 322a960: ParsoidHandler: pass metrics object to HTMLTransformInput (duration: 03m 19s)
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5fac0aa: GrowthExperiments: Remove wgGEHomepageTutorialTitle (duration: 03m 26s)
  • 20:06 dancy@deploy1002: Finished scap: Backport for gerrit:819612 Revert "Bump wikimedia/parsoid to 0.16.0-a18" (duration: 11m 30s)
  • 20:01 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 05s)
  • 20:01 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
  • 19:59 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 01s)
  • 19:59 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
  • 19:55 dancy@deploy1002: Started scap: Backport for gerrit:819612 Revert "Bump wikimedia/parsoid to 0.16.0-a18"
  • 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-tls
  • 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=varnish-fe
  • 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-tls
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=varnish-fe
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
  • 19:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2041,2046].codfw.wmnet
  • 19:35 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2041,2046].codfw.wmnet
  • 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-fe2002.codfw.wmnet
  • 19:28 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for thanos-fe2002.codfw.wmnet
  • 19:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe2010.codfw.wmnet
  • 19:26 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-fe2010.codfw.wmnet
  • 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-tls
  • 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=varnish-fe
  • 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-be
  • 19:17 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
  • 19:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
  • 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-tls
  • 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=varnish-fe
  • 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
  • 19:11 mutante: gerrit1001 - rsyncing /home/ to gerrit2002:/srv/home-gerrit1001.wikimedia.org T313250
  • 19:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
  • 19:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
  • 18:55 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.23 refs T308076 (duration: 50m 39s)
  • 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:52 ejegg: updated payments-wiki from 589bb64e to e1b6036a (just i18n changes in extensions)
  • 18:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - T314078
  • 18:46 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mc2038.codfw.wmnet with reason: install
  • 18:45 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mc2038.codfw.wmnet with reason: install
  • 18:41 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2038.codfw.wmnet
  • 18:41 rzl@cumin2002: START - Cookbook sre.hosts.remove-downtime for mc2038.codfw.wmnet
  • 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:18 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: install
  • 18:18 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: install
  • 18:17 rzl@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2038.codfw.wmnet with reason: install
  • 18:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2038.codfw.wmnet with reason: install
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
  • 18:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:04 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.23 refs T308076
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32185 and previous config saved to /var/cache/conftool/dbconfig/20220802-175233-marostegui.json
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2159', diff saved to https://phabricator.wikimedia.org/P32184 and previous config saved to /var/cache/conftool/dbconfig/20220802-174311-ladsgroup.json
  • 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32183 and previous config saved to /var/cache/conftool/dbconfig/20220802-173723-marostegui.json
  • 17:35 moritzm: installing node-moment security updates
  • 17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: T310070
  • 17:32 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: T310070
  • 17:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
  • 17:25 moritzm: installing fribidi security updates
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32182 and previous config saved to /var/cache/conftool/dbconfig/20220802-172217-marostegui.json
  • 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-tls
  • 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=varnish-fe
  • 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
  • 17:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32181 and previous config saved to /var/cache/conftool/dbconfig/20220802-170711-marostegui.json
  • 17:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:05 Emperor: ms-be20[31,32,41,46].codfw.wmnet,ms-fe2010.codfw.wmnet,thanos-fe2002.codfw.wmnet downtime for PDU work T309957
  • 17:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32180 and previous config saved to /var/cache/conftool/dbconfig/20220802-170503-marostegui.json
  • 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 17:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
  • 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 17:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
  • 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 17:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32179 and previous config saved to /var/cache/conftool/dbconfig/20220802-170333-marostegui.json
  • 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-tls
  • 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=varnish-fe
  • 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
  • 17:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2030,2045,2052].codfw.wmnet
  • 17:00 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2030,2045,2052].codfw.wmnet
  • 16:57 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1004.eqiad.wmnet
  • 16:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32178 and previous config saved to /var/cache/conftool/dbconfig/20220802-164827-marostegui.json
  • 16:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:35 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32177 and previous config saved to /var/cache/conftool/dbconfig/20220802-163321-marostegui.json
  • 16:29 dancy@mwmaint1002: pull aborted: (duration: 00m 07s)
  • 16:25 rzl: rzl@stat1007:~$ sudo systemctl stop wmde-analytics-daily-early # wedged, timer will restart it now with max_runtime_seconds
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32176 and previous config saved to /var/cache/conftool/dbconfig/20220802-161815-marostegui.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32175 and previous config saved to /var/cache/conftool/dbconfig/20220802-161607-marostegui.json
  • 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32174 and previous config saved to /var/cache/conftool/dbconfig/20220802-161545-marostegui.json
  • 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1004.eqiad.wmnet on all recursors
  • 16:10 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1004.eqiad.wmnet on all recursors
  • 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 16:05 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1004.eqiad.wmnet
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32173 and previous config saved to /var/cache/conftool/dbconfig/20220802-160039-marostegui.json
  • 15:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: T309957
  • 15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: T309957
  • 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: T309957
  • 15:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: T309957
  • 15:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: T309957
  • 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: T309957
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32172 and previous config saved to /var/cache/conftool/dbconfig/20220802-154533-marostegui.json
  • 15:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2037.codfw.wmnet
  • 15:36 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32171 and previous config saved to /var/cache/conftool/dbconfig/20220802-153027-marostegui.json
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32170 and previous config saved to /var/cache/conftool/dbconfig/20220802-152818-marostegui.json
  • 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32169 and previous config saved to /var/cache/conftool/dbconfig/20220802-152740-marostegui.json
  • 15:24 moritzm: installing gnupg2 security updates
  • 15:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1004.eqiad.wmnet with OS buster
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32167 and previous config saved to /var/cache/conftool/dbconfig/20220802-151234-marostegui.json
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, T310070
  • 15:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, T310070
  • 15:08 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
  • 15:08 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
  • 15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
  • 15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, T310070
  • 15:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, T310070
  • 15:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: T309957
  • 15:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: T309957
  • 15:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
  • 15:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
  • 14:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: T309957
  • 14:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: T309957
  • 14:58 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=(appservers|api)-ro,name=codfw
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32166 and previous config saved to /var/cache/conftool/dbconfig/20220802-145728-marostegui.json
  • 14:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2060.codfw.wmnet with OS bullseye
  • 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
  • 14:50 moritzm: uploaded gnupg2 2.1.18-8~deb9u4+wmf1 to stretch-wikimedia
  • 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32164 and previous config saved to /var/cache/conftool/dbconfig/20220802-144222-marostegui.json
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32163 and previous config saved to /var/cache/conftool/dbconfig/20220802-144013-marostegui.json
  • 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32162 and previous config saved to /var/cache/conftool/dbconfig/20220802-143952-marostegui.json
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetmaster1004.eqiad.wmnet with OS buster
  • 14:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
  • 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32161 and previous config saved to /var/cache/conftool/dbconfig/20220802-142446-marostegui.json
  • 14:23 Emperor: shutdown ms-be20[30,45,52] for PDU work T309957
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
  • 14:21 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
  • 14:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32160 and previous config saved to /var/cache/conftool/dbconfig/20220802-140940-marostegui.json
  • 14:05 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster2004.codfw.wmnet with OS buster
  • 14:04 godog: grow sda/sdb 3 by 100G on thanos-be1001 - T314275
  • 14:03 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
  • 14:03 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
  • 14:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
  • 14:01 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
  • 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-tls
  • 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2032.codfw.wmnet,service=ats-be
  • 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
  • 13:56 godog: schedule poweroff for centrallog2002 at 16 utc - T310070
  • 13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-be
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32159 and previous config saved to /var/cache/conftool/dbconfig/20220802-135435-marostegui.json
  • 13:53 godog: depool and poweroff prometheus2005 - T310070
  • 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
  • 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
  • 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=varnish-fe
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32158 and previous config saved to /var/cache/conftool/dbconfig/20220802-135226-marostegui.json
  • 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
  • 13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32157 and previous config saved to /var/cache/conftool/dbconfig/20220802-135155-marostegui.json
  • 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
  • 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=varnish-fe
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=varnish-fe
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-tls
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=varnish-fe
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-be
  • 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
  • 13:42 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2013.codfw.wmnet with OS bullseye
  • 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable usage tracking for statement for cebwiki (T296384) – expected to gradually increase number of wbc_entity_usage and probably recentchanges rows on cebwiki, but not too much, see task for details (duration: 03m 06s)
  • 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2028.codfw.wmnet with OS bullseye
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32156 and previous config saved to /var/cache/conftool/dbconfig/20220802-133648-marostegui.json
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Introduce $wmgEntityUsageModifierLimitsStatement (T296384) (2/2) (duration: 03m 21s)
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Introduce $wmgEntityUsageModifierLimitsStatement (T296384) (1/2) (duration: 03m 16s)
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, T309957
  • 13:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, T309957
  • 13:27 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster2004.codfw.wmnet with OS buster
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
  • 13:24 vgutierrez: restarting ATS 9.x instances to apply https://gerrit.wikimedia.org/r/819585 - T309651
  • 13:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32155 and previous config saved to /var/cache/conftool/dbconfig/20220802-132142-marostegui.json
  • 13:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
  • 13:19 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a4499e5: Revert "testwiki: Add mediawiki.web_ui.interactions stream" (T314151, T311268) (duration: 03m 19s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c2fb8a5: Enable RealtimePreview on Group 0 wikis (T314150) (duration: 03m 21s)
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32154 and previous config saved to /var/cache/conftool/dbconfig/20220802-130636-marostegui.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32153 and previous config saved to /var/cache/conftool/dbconfig/20220802-130428-marostegui.json
  • 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32152 and previous config saved to /var/cache/conftool/dbconfig/20220802-130351-marostegui.json
  • 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2013.codfw.wmnet with OS bullseye
  • 13:00 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2028.codfw.wmnet with OS bullseye
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32151 and previous config saved to /var/cache/conftool/dbconfig/20220802-124845-marostegui.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32150 and previous config saved to /var/cache/conftool/dbconfig/20220802-123338-marostegui.json
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32149 and previous config saved to /var/cache/conftool/dbconfig/20220802-121832-marostegui.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32148 and previous config saved to /var/cache/conftool/dbconfig/20220802-121624-marostegui.json
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:01 marostegui: dbmaint x1@eqiad T314087
  • 11:57 marostegui: dbmaint s7@eqiad T314377
  • 11:57 marostegui: dbmaint s3@eqiad T314377
  • 11:57 marostegui: dbmaint s8@eqiad T314377
  • 11:55 marostegui: dbmaint s8@eqiad T314377
  • 11:54 marostegui: dbmaint s3@eqiad T314377
  • 11:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:48 marostegui: dbmaint s7@eqiad T314377
  • 11:46 marostegui: dbmaint s4@eqiad T314377
  • 11:35 elukey: restart rsyslog on ml-serve1006
  • 10:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: T312626 btullis
  • 10:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: T312626 btullis
  • 10:49 godog: grow sda3 by 100G on thanos-be2004 - T314275
  • 10:42 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 10:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P32147 and previous config saved to /var/cache/conftool/dbconfig/20220802-103318-root.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P32146 and previous config saved to /var/cache/conftool/dbconfig/20220802-101813-root.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2175 to s2 T311494', diff saved to https://phabricator.wikimedia.org/P32145 and previous config saved to /var/cache/conftool/dbconfig/20220802-101522-marostegui.json
  • 10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1019.eqiad.wmnet with OS bullseye
  • 10:05 jynus: shutdown dbprov2002 backup2005 backup2008 T310070
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P32144 and previous config saved to /var/cache/conftool/dbconfig/20220802-100308-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32143 and previous config saved to /var/cache/conftool/dbconfig/20220802-100304-root.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2079 from dbctl T313885', diff saved to https://phabricator.wikimedia.org/P32141 and previous config saved to /var/cache/conftool/dbconfig/20220802-095455-marostegui.json
  • 09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
  • 09:49 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
  • 09:49 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P32140 and previous config saved to /var/cache/conftool/dbconfig/20220802-094804-root.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32139 and previous config saved to /var/cache/conftool/dbconfig/20220802-094759-root.json
  • 09:44 godog: grow sdb3 by 100G on thanos-be2004 - T314275
  • 09:43 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 09:42 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 09:37 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1019.eqiad.wmnet with OS bullseye
  • 09:36 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P32138 and previous config saved to /var/cache/conftool/dbconfig/20220802-093259-root.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32137 and previous config saved to /var/cache/conftool/dbconfig/20220802-093254-root.json
  • 09:30 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 09:30 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 09:28 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 09:26 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 09:25 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 09:22 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P32136 and previous config saved to /var/cache/conftool/dbconfig/20220802-091754-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32135 and previous config saved to /var/cache/conftool/dbconfig/20220802-091749-root.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2143', diff saved to https://phabricator.wikimedia.org/P32134 and previous config saved to /var/cache/conftool/dbconfig/20220802-091518-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P32133 and previous config saved to /var/cache/conftool/dbconfig/20220802-090250-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32132 and previous config saved to /var/cache/conftool/dbconfig/20220802-090245-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P32131 and previous config saved to /var/cache/conftool/dbconfig/20220802-084745-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32130 and previous config saved to /var/cache/conftool/dbconfig/20220802-084740-root.json
  • 08:46 marostegui: stop mysql on db2095 db2107 db2109 db2137 db2147 db2159 db2160 pc2012 for pdu maintenance on codfw b5 T310070
  • 07:49 moritzm: upgrading drmrs ganeti clusters to 3.0.2 T312637
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:22 godog: bounce icinga on alert2001 - T314353
  • 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 07:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 06:58 elukey: restart rsyslog on ml-serve2006
  • 06:56 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: pruneRevData: Make cleaning in larger batches (T296380) (duration: 03m 26s)
  • 06:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:46 godog: bounce icinga on alert1001 - T314353
  • 05:48 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db2088.codfw.wmnet
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:44 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2088.codfw.wmnet
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P32127 and previous config saved to /var/cache/conftool/dbconfig/20220802-052923-root.json
  • 05:24 marostegui: dbmaint x1@eqiad T314087
  • 04:17 ryankemper: [Elastic] Small amendment to my earlier statement; based off epoch time `be_x_oldwiki_titlesuggest_1659407912` was not an old index hanging around after a reindex operation, but rather the new one that the reindex operation was trying to create, but had not yet finished (therefore didn't switch over the aliases). It presumably got interrupted by the reimage of `elastic2059`.
  • 04:15 ryankemper: [Elastic] Blew away red index like so: `ryankemper@cumin1001:~$ curl -XDELETE https://search.svc.codfw.wmnet:9243/be_x_oldwiki_titlesuggest_1659407912`. Cluster is back to `green` status.
  • 04:07 ryankemper: [Elastic] Per `curl -s https://search.svc.codfw.wmnet:9243/_cat/aliases | grep -i be_x` I see `be_x_oldwiki_titlesuggest` alias points to `be_x_oldwiki_titlesuggest_1658396688`. I think this means the red index is an old index from an in-progress reindex operation. I likely just need to delete `be_x_oldwiki_titlesuggest_1659407912` but doing some quick digging first
  • 04:04 ryankemper: [Elastic] Red cluster status in main codfw elasticsearch cluster (`https://search.svc.codfw.wmnet:9243`); culprit appears to be index `be_x_oldwiki_titlesuggest_1659407912`. Confusingly it has 2 replicas set so it's not clear to me how we got into this state starting from green (in the past we've gone into red status from indices that erroneously had 0 replicas in production)
  • 03:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:40 krinkle@deploy1002: Synchronized multiversion/: I0802db272695 (duration: 03m 10s)
  • 03:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:34 krinkle@deploy1002: Synchronized wmf-config/: I9b89c0ff5c2 (duration: 03m 32s)
  • 03:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:27 krinkle@deploy1002: Synchronized multiversion/: I6e97d39a3, Ib843ebced31 (duration: 03m 30s)
  • 03:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:22 krinkle@mwmaint1002: pull aborted: (duration: 00m 11s)
  • 03:21 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I39a2b86065 (duration: 03m 19s)
  • 03:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2059.codfw.wmnet with OS bullseye
  • 03:15 krinkle@deploy1002: Synchronized multiversion/: Ieaea60 (duration: 03m 03s)
  • 03:14 krinkle@mwmaint2002: pull aborted: (duration: 01m 36s)
  • 03:14 krinkle@mwmaint1002: pull aborted: (duration: 01m 31s)
  • 03:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
  • 02:54 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service` to clear `Query Service HTTP Port` && `WDQS SPARQL` alerts
  • 02:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
  • 02:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2059.codfw.wmnet with OS bullseye
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:35 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Ieaea60a991e5 (duration: 03m 10s)
  • 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:23 krinkle@deploy1002: Synchronized multiversion/: Ia3406e (duration: 03m 22s)
  • 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
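
Note on the 04:04 to 04:17 [Elastic] entries above: the 04:17 entry reasons from the "epoch time" in the index name. A minimal sketch of that check follows, assuming the trailing digits of each index name are Unix epoch seconds in UTC. The helper name `index_created_at` is hypothetical and is not part of any Elasticsearch or Wikimedia tooling.

```python
from datetime import datetime, timezone


def index_created_at(index_name: str) -> datetime:
    """Interpret the trailing `_<digits>` of an index name as epoch seconds (assumption)."""
    suffix = index_name.rsplit("_", 1)[1]
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


# The two index names quoted in the log entries.
aliased = "be_x_oldwiki_titlesuggest_1658396688"    # index the alias pointed to
red_index = "be_x_oldwiki_titlesuggest_1659407912"  # red index that was deleted

for name in (aliased, red_index):
    print(name, index_created_at(name).isoformat())
# be_x_oldwiki_titlesuggest_1658396688 2022-07-21T09:44:48+00:00
# be_x_oldwiki_titlesuggest_1659407912 2022-08-02T02:38:32+00:00
```

Under that assumption the red index is the newer of the two, and its timestamp falls about two minutes after the 02:36 reimage START for `elastic2059`, which is consistent with the 04:17 entry's reading that it was the unfinished target of a reindex rather than a leftover.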

2022-08-01

  • 23:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Id1ce285631f5, I194d419fbfe (duration: 03m 09s)
  • 23:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:08 moritzm: drain ganeti2028 T309957
  • 21:03 mutante: gerrit2002 - mkdir /var/lib/gerrit2/review_site | gerrit1001 - rsyncing /var/lib/gerrit2/review_site/ to gerrit2002 T313250 T313972
  • 21:01 urbanecm: UTC late backport window done
  • 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 461e070: itwiki: Change robot policy on NS2 and NS3 (T314165) (duration: 03m 18s)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mutante: phab1001 - rsyncing repo data /srv/repos/ to phab2002 (in addition to phab1004 previously) T313360
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:55 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mnwwiktionary --fix # T314023
  • 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ba8c177: mnwwiktionary: Create Appendix namespace (T314023) (duration: 03m 09s)
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateArticleCount.php --wiki=viwikibooks --update # T314239
  • 20:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c19c3e36ab: DiscussionTools: Make new reply buttons available at mediawiki.org (T314076); 24db016c4: viwikibooks: Change wgArticleCountMethod to any (T314239) (duration: 03m 10s)
  • 20:35 daniel@deploy1002: Synchronized php-1.39.0-wmf.22/includes/Rest/Handler: Fix: Parsoid REST handler: allow pagebundle input without original HTML. (duration: 03m 15s)
  • 20:25 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-ne.svg (T311700)
  • 20:21 daniel@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ne.svg: Config: newiki: Update wordmark (T311700) (duration: 03m 17s)
  • 20:17 daniel@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: newiki: Update wordmark (T311700) (duration: 03m 32s)
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2054.codfw.wmnet with OS bullseye
  • 19:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
  • 19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
  • 19:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2054.codfw.wmnet with OS bullseye
  • 18:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2031.codfw.wmnet with OS bullseye
  • 18:44 mutante: gitlab - moved data_persistence group to new parent, under /repos/
  • 18:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
  • 18:32 mutante: gitlab - created group 'data_persistence' - added Ladsgroup and upgraded from member to maintainer
  • 18:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
  • 18:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2031.codfw.wmnet with OS bullseye
  • 17:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2025.codfw.wmnet with OS bullseye
  • 17:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
  • 17:31 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
  • 17:18 ryankemper: T289135 T314078 Manually reimaging remaining codfw stretch hosts (`elastic[2025,2031,2054,2059-2060]`) to bullseye, one host at a time, waiting for green cluster status to return between each run. `ryankemper@cumin1001` tmux session `codfw_reimage`
  • 17:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2025.codfw.wmnet with OS bullseye
  • 17:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 17:08 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 17:06 mutante: alert1001 - systemctl restart nsca - pinged by fundraising tech because fundraising hosts have the "passive check is awol" issue again (T196336)
  • 16:25 moritzm: installing tcpdump updates from bullseye point release
  • 16:23 cwhite@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
  • 16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1018.eqiad.wmnet with OS bullseye
  • 16:10 cwhite@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
  • 15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
  • 15:41 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1018.eqiad.wmnet with OS bullseye
  • 15:39 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase T309896 - mvernon@cumin1001
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:29 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase T309896 - mvernon@cumin1001
  • 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Beta: add configuration for redirect badges (T313896) (2/2, should be a no-op) (duration: 03m 30s)
  • 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Beta: add configuration for redirect badges (T313896) (1/2, should be a no-op) (duration: 03m 15s)
  • 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:54 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 14:53 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 14:42 moritzm: installing openjdk-11 security updates
  • 14:39 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 14:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 14:38 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 14:34 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:28 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/skins/Vector/: b5007c5: Revert "styles: Unify on standard external link icon" (duration: 03m 16s)
  • 14:12 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 14:12 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2044.codfw.wmnet with OS bullseye
  • 14:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: bcb7b0d: Adjust width-height ratio of logo to fix display issue (T310961; 2/2) (duration: 03m 17s)
  • 14:04 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/srwikisource{.png;-1.5x.png;-2x.png} (T310961)
  • 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:01 urbanecm@deploy1002: Synchronized static/images/project-logos/: bcb7b0d: srwikisource: Adjust width-height ratio of logo to fix display issue (T310961; 1/2) (duration: 03m 41s)
  • 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 urbanecm: UTC afternoon backport window is going to overflow by a couple of minutes
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
  • 13:44 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
  • 13:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2044.codfw.wmnet with OS bullseye
  • 13:22 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 11:50 moritzm: installing openjdk-8 security updates for stretch
  • 11:43 moritzm: uploaded openjdk-8 8u342-b07-1~deb9u1 for stretch-wikimedia
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32124 and previous config saved to /var/cache/conftool/dbconfig/20220801-102714-ladsgroup.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32123 and previous config saved to /var/cache/conftool/dbconfig/20220801-101208-ladsgroup.json
  • 10:09 vgutierrez: test ATS 9.1.2 on cp6016 - T309651
  • 10:05 vgutierrez: test ATS 9.1.2 on cp6008 - T309651
  • 10:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@4da9195]: (no justification provided) (duration: 00m 19s)
  • 10:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@4da9195]: (no justification provided)
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32122 and previous config saved to /var/cache/conftool/dbconfig/20220801-095702-ladsgroup.json
  • 09:56 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 05s)
  • 09:56 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32121 and previous config saved to /var/cache/conftool/dbconfig/20220801-094156-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32120 and previous config saved to /var/cache/conftool/dbconfig/20220801-093845-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 09:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
  • 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
  • 09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
  • 09:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
  • 09:00 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
  • 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:53 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/includes/api: Backport: api: Support for links migration in ApiQueryBacklinks (T312865 T314112) (duration: 03m 01s)
  • 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:50 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
  • 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
  • 08:48 godog: thanos-be2004: copy quarantined and tmp off sdb3 and into sdb4 for analysis and to free space - T314275
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:47 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks columns in itwikisource (T312865) (duration: 03m 12s)
  • 08:43 vgutierrez: rolling upgrade of HAProxy to version 2.4.18
  • 08:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:41 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
  • 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
  • 08:28 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 08:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
  • 08:14 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
  • 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers|api)-ro,name=codfw
  • 06:14 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appservers-ro
  • 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appserver-ro
  • 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=(appserver|api)-ro
  • 05:43 moritzm: installing Linux 5.10.127-2 on Gitlab runners
  • 01:00 krinkle@deploy1002: Synchronized multiversion/: Ic0dbcb (duration: 03m 31s)
  • 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:45 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: I9d363abd7cfef (duration: 03m 17s)
  • 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

Archives

See Server Admin Log/Archives.