You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

imported>Stashbot
(andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2003-dev.wikimedia.org with OS bullseye)
imported>Stashbot
(bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1011.eqiad.wmnet)
 
(182 intermediate revisions by 2 users not shown)
== 2022-01-29 ==
* 21:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2003-dev.wikimedia.org with OS bullseye
* 18:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2003-dev.wikimedia.org with OS bullseye
* 17:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.wikimedia.org with OS bullseye
* 16:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.wikimedia.org with OS bullseye
* 13:53 hashar: contint1001 and contint2001 : pruning old reflog from Zuul merger git repositories: `sudo -u zuul find /srv/zuul/git -maxdepth 4 -type d -name .git -print -execdir git reflog expire --all --expire=now \;`
* 05:25 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2003-dev.wikimedia.org with OS bullseye
* 04:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2003-dev.wikimedia.org with OS bullseye
* 00:14 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1049 to address CirrusSearchJVMGCOldPoolFlatlined alert

== 2022-01-28 ==
* 21:52 mutante: purging font packages from mwdebug* and scandium* [[phab:T294378|T294378]]
* 21:47 mutante: purging font packages from remaining appservers in codfw mw23* ranges.. [[phab:T294378|T294378]]
* 20:14 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 20:10 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 20:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 17:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1024.eqiad.wmnet with OS buster
* 17:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1023.eqiad.wmnet
* 17:17 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1024.eqiad.wmnet with OS buster
* 17:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1023.eqiad.wmnet with OS buster
* 16:41 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1023.eqiad.wmnet with OS buster
* 16:41 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1022.eqiad.wmnet
* 16:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1022.eqiad.wmnet with OS buster
* 15:50 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 15:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 15:47 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1022.eqiad.wmnet with OS buster
* 15:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1021.eqiad.wmnet
* 15:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1021.eqiad.wmnet with OS buster
* 15:41 vgutierrez: pool cp4031 using envoy as TLS termination layer - [[phab:T271421|T271421]]
* 15:14 Amir1: start of cleaning lint errors caused by content model changes ([[phab:T298343|T298343]])
* 14:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1021.eqiad.wmnet with OS buster
* 14:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1020.eqiad.wmnet
* 14:47 vgutierrez: update varnish to version 6.0.10-1wm1 on cp4036 - [[phab:T300264|T300264]]
* 14:47 Amir1: optimizing dewiki.flaggedtemplates in db2113
* 14:27 vgutierrez: update varnish to version 6.0.10-1wm1 on cp4034 - [[phab:T300264|T300264]]
* 13:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1020.eqiad.wmnet with OS buster
* 13:01 moritzm: installing uriparser security updates
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19562 and previous config saved to /var/cache/conftool/dbconfig/20220128-123210-root.json
* 12:30 moritzm: installing libseccomp bugfix updates from bullseye point release
* 12:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1020.eqiad.wmnet with OS buster
* 12:20 vgutierrez: upload varnish 6.0.10-1wm1 to apt.wm.o (buster component/varnish6) - [[phab:T300264|T300264]]
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19561 and previous config saved to /var/cache/conftool/dbconfig/20220128-121706-root.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19560 and previous config saved to /var/cache/conftool/dbconfig/20220128-120201-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19559 and previous config saved to /var/cache/conftool/dbconfig/20220128-114658-root.json
* 11:35 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19558 and previous config saved to /var/cache/conftool/dbconfig/20220128-113154-root.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19557 and previous config saved to /var/cache/conftool/dbconfig/20220128-111650-root.json
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19556 and previous config saved to /var/cache/conftool/dbconfig/20220128-110147-root.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19555 and previous config saved to /var/cache/conftool/dbconfig/20220128-104643-root.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19554 and previous config saved to /var/cache/conftool/dbconfig/20220128-103140-root.json
* 10:29 mdipietro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.wikimedia.org with OS bullseye
* 10:25 moritzm: draining ganeti1010 for eventual reimage
* 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1019.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 10:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1019.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1168.eqiad.wmnet with OS bullseye
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19553 and previous config saved to /var/cache/conftool/dbconfig/20220128-101636-root.json
* 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1168.eqiad.wmnet with OS bullseye
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P19552 and previous config saved to /var/cache/conftool/dbconfig/20220128-094636-marostegui.json
* 09:46 moritzm: installing brltty bugfix updates from bullseye point release
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19551 and previous config saved to /var/cache/conftool/dbconfig/20220128-094430-root.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19550 and previous config saved to /var/cache/conftool/dbconfig/20220128-094422-root.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19549 and previous config saved to /var/cache/conftool/dbconfig/20220128-092927-root.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19548 and previous config saved to /var/cache/conftool/dbconfig/20220128-092918-root.json
* 09:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:23 hashar@deploy1002: Synchronized wmf-config/CommonSettings.php: GrowthExperiments: Disable mobile quality gate - [[phab:T298122|T298122]] [[phab:T300336|T300336]] (duration: 00m 50s)
* 09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

== 2022-08-09 ==
* 23:17 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1011.eqiad.wmnet
* 23:07 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:51 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:51 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:49 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:49 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:46 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1015.eqiad.wmnet
* 22:31 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:31 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:02 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 22:02 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 21:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 21:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 21:53 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 21:52 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 21:50 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 21:49 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 21:43 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
* 21:43 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
* 21:43 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
* 21:43 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
* 21:43 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 21:43 bking@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 21:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:00 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
* 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32332 and previous config saved to /var/cache/conftool/dbconfig/20220809-205548-ladsgroup.json
* 20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1014.eqiad.wmnet
* 20:51 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1014.eqiad.wmnet
* 20:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32331 and previous config saved to /var/cache/conftool/dbconfig/20220809-204042-ladsgroup.json
* 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32330 and previous config saved to /var/cache/conftool/dbconfig/20220809-202536-ladsgroup.json
* 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32329 and previous config saved to /var/cache/conftool/dbconfig/20220809-201030-ladsgroup.json
* 19:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 19:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 19:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 19:56 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 19:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 19:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 19:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 19:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1072.eqiad.wmnet with OS bullseye
* 17:29 vgutierrez: test trafficserver 9.1.2-1wm2 in cp6016 - [[phab:T309651|T309651]]
* 17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
* 17:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
* 17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1072.eqiad.wmnet with OS bullseye
* 16:54 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 16:54 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 16:53 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 16:53 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 16:26 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 16:26 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1069.eqiad.wmnet with OS bullseye
* 15:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
* 15:42 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:30 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1069.eqiad.wmnet with OS bullseye
* 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1058.eqiad.wmnet with OS bullseye
* 15:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
* 15:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* m: finished running 'homer "status:active" commit "netmon: Add the netmon1003 host as a syslog destination"' in the cumin1001 host. Homer reported no errors.
* 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1058.eqiad.wmnet with OS bullseye
* 14:28 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=codfw
* 13:57 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 13:57 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* m: Add the new netmon1003 host as a syslog destination in homer templates/common/system.conf https://gerrit.wikimedia.org/r/c/operations/homer/public/+/819124
* m: Successfully ran '# run-puppet-merge' in the netmon1002 and netmon1003 hosts.
* m: Running '# run-puppet-agent' in the netmon1003 host
* m: Running '# run-puppet-agent' in the netmon1002 host
* 13:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 13:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* m: puppet-merge on puppetmaster2004.codfw.wmnet for patch 819179 succeeded
* m: Set netmon1003 as netmon_server and netmon1002 as a netmon_servers_failover in the Puppet repository https://gerrit.wikimedia.org/r/c/operations/puppet/+/819179
* m: authdns updated successfully
* m: Had to revert https://gerrit.wikimedia.org/r/c/operations/dns/+/819177 because I rebased my changes incorrectly, sent the new patch in https://gerrit.wikimedia.org/r/c/operations/dns/+/821746
* m: running '# authdns-update' in  ns0.wikimedia.org
* m: Flip DNS for LibreNMS and Smokeping from netmon1002 to netmon1003 https://gerrit.wikimedia.org/r/c/operations/dns/+/819177
* 13:23 jynus: stop replication on db1117:m1 [[phab:T309074|T309074]]
* m: netmon1002 to netmon1003 failover
* 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:16 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 09:53 vgutierrez: rolling restart of pybal in eqsin - [[phab:T310070|T310070]]
* 09:25 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:12 vgutierrez: rolling restart of pybal in codfw - [[phab:T310070|T310070]]
* 08:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 08:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:24 jynus: starting data check using es1021 and es2021, expect increased read traffic [[phab:T314559|T314559]]
* 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 06:19 Amir1: dbmaint s5@eqiad ([[phab:T312863|T312863]] [[phab:T312984|T312984]] [[phab:T310011|T310011]] [[phab:T310485|T310485]])
* 06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1130.eqiad.wmnet with reason: Maint
* 06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1130.eqiad.wmnet with reason: Maint
* 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 [[phab:T314370|T314370]]', diff saved to https://phabricator.wikimedia.org/P32323 and previous config saved to /var/cache/conftool/dbconfig/20220809-060836-ladsgroup.json
* 06:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write [[phab:T314370|T314370]]', diff saved to https://phabricator.wikimedia.org/P32322 and previous config saved to /var/cache/conftool/dbconfig/20220809-060159-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - [[phab:T314370|T314370]]', diff saved to https://phabricator.wikimedia.org/P32321 and previous config saved to /var/cache/conftool/dbconfig/20220809-060105-ladsgroup.json
* 06:00 Amir1: Starting s5 eqiad failover from db1130 to db1100 - [[phab:T314370|T314370]]
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 [[phab:T314370|T314370]]', diff saved to https://phabricator.wikimedia.org/P32320 and previous config saved to /var/cache/conftool/dbconfig/20220809-051251-ladsgroup.json
* 05:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 [[phab:T314370|T314370]]
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 22 hosts with reason: Primary switchover s5 [[phab:T314370|T314370]]
* 02:42 ejegg: SmashPig upgraded from {{Gerrit|9b97ea15}} to {{Gerrit|13e9e9cc}}
* 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32318 and previous config saved to /var/cache/conftool/dbconfig/20220809-023113-ladsgroup.json
* 02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 02:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32317 and previous config saved to /var/cache/conftool/dbconfig/20220809-023052-ladsgroup.json
* 02:28 ejegg: payments-wiki upgraded from {{Gerrit|6880236d}} to {{Gerrit|cf5e1848}}
* 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32316 and previous config saved to /var/cache/conftool/dbconfig/20220809-021546-ladsgroup.json
* 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32315 and previous config saved to /var/cache/conftool/dbconfig/20220809-020040-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32314 and previous config saved to /var/cache/conftool/dbconfig/20220809-014534-ladsgroup.json

== 2022-08-08 ==
* 23:52 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: clean up testwiki experiments [[phab:T314750|T314750]] (duration: 03m 19s)
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:46 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: clean up testwiki experiments [[phab:T314750|T314750]] (duration: 03m 27s)
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:32 eileen___: config revision changed from {{Gerrit|f5668044}} to 787cd0e0
* 23:32 eileen___: civicrm upgraded from {{Gerrit|497bddf7}} to {{Gerrit|1f91ac2d}}
* 22:16 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 22:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic1065.eqiad.wmnet with OS bullseye
* 21:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
* 21:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
* 21:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1065.eqiad.wmnet with OS bullseye
* 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1062.eqiad.wmnet with OS bullseye
* 20:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
* 20:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
* 20:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1062.eqiad.wmnet with OS bullseye
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 20:28 cjming: end of UTC late backport window
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/skins/Vector/resources/skins.vector.styles/layouts/grid.less: Backport: [[gerrit:821243{{!}}Fix grid blowout bug (T314756)]] (duration: 03m 26s)
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817785{{!}}Disable sticky header edit A/B test for pilot wikis (T312296)]] (duration: 03m 35s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1088.eqiad.wmnet with OS bullseye
* 17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
* 17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
* 17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS bullseye
* 16:54 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1085.eqiad.wmnet with OS bullseye
* 16:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:38 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 16:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
* 16:24 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic1085.eqiad.wmnet with OS bullseye
* 16:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 16:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 16:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 16:14 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 16:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 16:10 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 16:09 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 16:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
* 16:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1084.eqiad.wmnet with OS bullseye
* 15:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 15:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
* 15:46 sukhe: upload reprepro -C main include bullseye-wikimedia python-pynetbox_6.6.0-1+wmf11u1_amd64.changes
* 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
* 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
* 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
* 15:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1084.eqiad.wmnet with OS bullseye
* 09:17 godog: pool prometheus2005 and depool prometheus2003 - [[phab:T296199|T296199]]
* 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19547 and previous config saved to /var/cache/conftool/dbconfig/20220128-091423-root.json
* 14:55 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19546 and previous config saved to /var/cache/conftool/dbconfig/20220128-091415-root.json
* 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: [[phab:T314256|T314256]]
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19545 and previous config saved to /var/cache/conftool/dbconfig/20220128-085919-root.json
* 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: [[phab:T314256|T314256]]
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19544 and previous config saved to /var/cache/conftool/dbconfig/20220128-085911-root.json
* 14:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
* 14:11 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19543 and previous config saved to /var/cache/conftool/dbconfig/20220128-084416-root.json
* 13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19542 and previous config saved to /var/cache/conftool/dbconfig/20220128-084407-root.json
* 13:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
* 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
* 12:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
* 12:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|77fd5abdd7d9462869259e1511bbcf2d7ce62246}}: Growth: Add new rights to wgAvailableRights (duration: 03m 24s)
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19541 and previous config saved to /var/cache/conftool/dbconfig/20220128-082912-root.json
* 12:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1102.eqiad.wmnet
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19540 and previous config saved to /var/cache/conftool/dbconfig/20220128-082904-root.json
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19539 and previous config saved to /var/cache/conftool/dbconfig/20220128-081408-root.json
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19538 and previous config saved to /var/cache/conftool/dbconfig/20220128-081400-root.json
* 12:06 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/: {{Gerrit|3eaf155678b7313c55dcca0cd39ab29f73eead37}}: MentorTools: Do not use MentorWeightManager ([[phab:T314362|T314362]]) (duration: 03m 31s)
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19537 and previous config saved to /var/cache/conftool/dbconfig/20220128-075905-root.json
* 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19536 and previous config saved to /var/cache/conftool/dbconfig/20220128-075856-root.json
* 11:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1102.eqiad.wmnet
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19535 and previous config saved to /var/cache/conftool/dbconfig/20220128-074401-root.json
* 11:21 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2022.codfw.wmnet
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19534 and previous config saved to /var/cache/conftool/dbconfig/20220128-074353-root.json
* 11:21 jelto: kubectl uncordon kubernetes2022.codfw.wmnet
* 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1096.eqiad.wmnet with OS bullseye
* 10:43 Amir1: Removing db2079 from orchestrator ([[phab:T313885|T313885]])
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19533 and previous config saved to /var/cache/conftool/dbconfig/20220128-072858-root.json
* 10:39 Amir1: Removing db2079 from zarcillo ([[phab:T313885|T313885]])
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19532 and previous config saved to /var/cache/conftool/dbconfig/20220128-072849-root.json
* 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2079.codfw.wmnet
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1096.eqiad.wmnet with OS bullseye
* 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 (s5,s6) [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P19531 and previous config saved to /var/cache/conftool/dbconfig/20220128-070112-marostegui.json
* 10:30 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
* 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2133.codfw.wmnet with OS bullseye
* 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2079.codfw.wmnet
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2133.codfw.wmnet with OS bullseye
* 10:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
* 04:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1068.eqiad.wmnet with OS stretch
* 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
* 04:34 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1068.eqiad.wmnet with OS stretch
* 08:41 jbond: deploy libtirpc update
* 04:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host elastic1068.eqiad.wmnet with OS stretch
* 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32310 and previous config saved to /var/cache/conftool/dbconfig/20220808-075723-ladsgroup.json
* 04:33 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1068.eqiad.wmnet with OS stretch
* 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 01:47 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2001-dev.wikimedia.org with OS bullseye
* 07:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32309 and previous config saved to /var/cache/conftool/dbconfig/20220808-075702-ladsgroup.json
* 07:53 godog: grow sda/sdb 3 by 100G on thanos-be2001 - [[phab:T314275|T314275]]
* 07:50 godog: grow sda/sdb 3 by 100G on thanos-be1004 - [[phab:T314275|T314275]]
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32308 and previous config saved to /var/cache/conftool/dbconfig/20220808-074156-ladsgroup.json
* 07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32307 and previous config saved to /var/cache/conftool/dbconfig/20220808-072650-ladsgroup.json
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820815{{!}}trwikivoyage: Create rollbacker user group (T314678)]] (duration: 03m 17s)
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:11 elukey: restart rsyslog on ml-serve2007
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32306 and previous config saved to /var/cache/conftool/dbconfig/20220808-071144-ladsgroup.json
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820261{{!}}Enable SectionTranslation on 10 Wikipedias where ContentTranslation is default (T308829)]] (duration: 03m 15s)
* 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:06 XioNoX: add CSP headers to Netbox - [[phab:T296356|T296356]]
* 07:05 elukey: restart rsyslog on ml-serve-ctrl2001


== 2022-01-27 ==
== 2022-08-07 ==
* 23:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.wikimedia.org with OS bullseye
* 19:58 taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" {{!}} mwscript purgeList.php --wiki enwiki # [[phab:T314712|T314712]]
* 23:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.wikimedia.org with OS buster
* 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32305 and previous config saved to /var/cache/conftool/dbconfig/20220807-135204-ladsgroup.json
* 23:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.wikimedia.org with OS buster
* 13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 23:06 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.wikimedia.org with OS buster
* 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 22:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.wikimedia.org with OS buster
* 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32304 and previous config saved to /var/cache/conftool/dbconfig/20220807-135143-ladsgroup.json
* 21:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32303 and previous config saved to /var/cache/conftool/dbconfig/20220807-133637-ladsgroup.json
* 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19530 and previous config saved to /var/cache/conftool/dbconfig/20220127-205155-marostegui.json
* 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32302 and previous config saved to /var/cache/conftool/dbconfig/20220807-132131-ladsgroup.json
* 20:49 cstone: updated civicrm revision changed from {{Gerrit|6f1eddce}} to {{Gerrit|0513f1b7}}
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32301 and previous config saved to /var/cache/conftool/dbconfig/20220807-130625-ladsgroup.json
* 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P19529 and previous config saved to /var/cache/conftool/dbconfig/20220127-203650-marostegui.json
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32300 and previous config saved to /var/cache/conftool/dbconfig/20220807-120610-ladsgroup.json
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P19528 and previous config saved to /var/cache/conftool/dbconfig/20220127-202145-marostegui.json
* 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32299 and previous config saved to /var/cache/conftool/dbconfig/20220807-120549-ladsgroup.json
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32298 and previous config saved to /var/cache/conftool/dbconfig/20220807-115043-ladsgroup.json
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32297 and previous config saved to /var/cache/conftool/dbconfig/20220807-113537-ladsgroup.json
* 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19527 and previous config saved to /var/cache/conftool/dbconfig/20220127-200641-marostegui.json
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32296 and previous config saved to /var/cache/conftool/dbconfig/20220807-112031-ladsgroup.json
* 20:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.19 refs [[phab:T293960|T293960]]
* 20:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19526 and previous config saved to /var/cache/conftool/dbconfig/20220127-200535-marostegui.json
* 20:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 20:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 20:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19525 and previous config saved to /var/cache/conftool/dbconfig/20220127-200523-marostegui.json
* 20:03 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): no current blockers; logs clean-ish, rolling train forward to group2
* 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:01 urbanecm@deploy1002: Synchronized phpcs.xml: {{Gerrit|11498603a918863c08300b4abfc69491424ebe14}}: Remove trusted-xff.php from wmf-config ([[phab:T298243|T298243]]; 3/3) (duration: 00m 50s)
* 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:59 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|11498603a918863c08300b4abfc69491424ebe14}}: Remove trusted-xff.php from wmf-config ([[phab:T298243|T298243]]; 2/3) (duration: 00m 51s)
* 19:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:58 urbanecm@deploy1002: Synchronized docroot/noc/: {{Gerrit|11498603a918863c08300b4abfc69491424ebe14}}: Remove trusted-xff.php from wmf-config ([[phab:T298243|T298243]]; 1/3) (duration: 00m 50s)
* 19:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:52 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|6fa62c58c04929d7327d8f07dbd32b6139f58ccf}}: Do not set wgTrustedXffFile ([[phab:T298243|T298243]]) (duration: 00m 51s)
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P19524 and previous config saved to /var/cache/conftool/dbconfig/20220127-195019-marostegui.json
* 19:43 mutante: purging font packages from parse* (parsoid codfw)
* 19:42 mutante: purging font packages from wtp* (parsoid eqiad)
* 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2c8561c1c0aa6b4f5f8202972b7b28723337e88e}}: Launch DiscussionTools new topic tool a/b test ([[phab:T291308|T291308]]) (duration: 00m 51s)
* 19:36 mutante: purging font* / xfont* packages from further eqiad appservers (mw14*) for [[phab:T294378|T294378]]
* 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P19521 and previous config saved to /var/cache/conftool/dbconfig/20220127-193514-marostegui.json
* 19:27 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752657{{!}}GrowthExperiments: Start add image experiment for desktop users (T298122)]] (duration: 00m 51s)
* 19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19520 and previous config saved to /var/cache/conftool/dbconfig/20220127-192009-marostegui.json
* 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19519 and previous config saved to /var/cache/conftool/dbconfig/20220127-191902-marostegui.json
* 19:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 19:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19518 and previous config saved to /var/cache/conftool/dbconfig/20220127-191854-marostegui.json
* 19:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19517 and previous config saved to /var/cache/conftool/dbconfig/20220127-191141-marostegui.json
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P19516 and previous config saved to /var/cache/conftool/dbconfig/20220127-190349-marostegui.json
* 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19515 and previous config saved to /var/cache/conftool/dbconfig/20220127-185637-marostegui.json
* 18:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P19514 and previous config saved to /var/cache/conftool/dbconfig/20220127-184845-marostegui.json
* 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:45 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync on production
* 18:43 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync on canary
* 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:43 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply on canary
* 18:43 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply on production
* 18:42 brennen@deploy1002: Synchronized php-1.38.0-wmf.19/extensions/WikibaseMediaInfo: Backport: [[gerrit:757485{{!}}Revert "Escape various messages in WikibaseMediaInfo" (T299289)]] (duration: 00m 52s)
* 18:41 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase102[5-7].eqiad.wmnet
* 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19513 and previous config saved to /var/cache/conftool/dbconfig/20220127-184132-marostegui.json
* 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19512 and previous config saved to /var/cache/conftool/dbconfig/20220127-183340-marostegui.json
* 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19511 and previous config saved to /var/cache/conftool/dbconfig/20220127-183234-marostegui.json
* 18:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 18:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19510 and previous config saved to /var/cache/conftool/dbconfig/20220127-183226-marostegui.json
* 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19509 and previous config saved to /var/cache/conftool/dbconfig/20220127-182627-marostegui.json
* 18:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync on canary
* 18:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply on production
* 18:25 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply on canary
* 18:25 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply on production
* 18:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase102[5-7].eqiad.wmnet
* 18:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase[1025-1027].eqiad.wmnet with reason: Firmware upgrade
* 18:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase[1025-1027].eqiad.wmnet with reason: Firmware upgrade
* 18:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1024.eqiad.wmnet
* 18:20 mdipietro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.wikimedia.org with OS bullseye
* 18:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P19508 and previous config saved to /var/cache/conftool/dbconfig/20220127-181722-marostegui.json
* 18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19507 and previous config saved to /var/cache/conftool/dbconfig/20220127-181656-marostegui.json
* 18:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 18:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 18:07 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1024.eqiad.wmnet
* 18:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase1024.eqiad.wmnet with reason: Firmware upgrade
* 18:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase1024.eqiad.wmnet with reason: Firmware upgrade
* 18:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1023.eqiad.wmnet
* 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P19506 and previous config saved to /var/cache/conftool/dbconfig/20220127-180217-marostegui.json
* 17:53 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1023.eqiad.wmnet
* 17:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase1023.eqiad.wmnet with reason: Firmware upgrade
* 17:52 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase1023.eqiad.wmnet with reason: Firmware upgrade
* 17:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1022.eqiad.wmnet
* 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19505 and previous config saved to /var/cache/conftool/dbconfig/20220127-174712-marostegui.json
* 17:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19504 and previous config saved to /var/cache/conftool/dbconfig/20220127-174606-marostegui.json
* 17:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 17:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19503 and previous config saved to /var/cache/conftool/dbconfig/20220127-174527-marostegui.json
* 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1022.eqiad.wmnet
* 17:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase1022.eqiad.wmnet with reason: Firmware upgrade
* 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase1022.eqiad.wmnet with reason: Firmware upgrade
* 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1021.eqiad.wmnet
* 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P19502 and previous config saved to /var/cache/conftool/dbconfig/20220127-173022-marostegui.json
* 17:22 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:21 cmjohnson1: updating firmware restbase1021 [[phab:T299652|T299652]]
* 17:17 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1021.eqiad.wmnet
* 17:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on restbase1021.eqiad.wmnet with reason: Firmware upgrade
* 17:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on restbase1021.eqiad.wmnet with reason: Firmware upgrade
* 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P19501 and previous config saved to /var/cache/conftool/dbconfig/20220127-171518-marostegui.json
* 17:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1020.eqiad.wmnet
* 17:01 cmjohnson1: updating firmware restbase1020 [[phab:T299652|T299652]]
* 17:00 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1020.eqiad.wmnet
* 17:00 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync on canary
* 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19500 and previous config saved to /var/cache/conftool/dbconfig/20220127-170013-marostegui.json
* 17:00 cmjohnson1: updating firmware ganeti1007 and ganeti1015 [[phab:T299527|T299527]]
* 17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on restbase1020.eqiad.wmnet with reason: Firmware upgrade
* 16:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on restbase1020.eqiad.wmnet with reason: Firmware upgrade
* 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19499 and previous config saved to /var/cache/conftool/dbconfig/20220127-165907-marostegui.json
* 16:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 16:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19498 and previous config saved to /var/cache/conftool/dbconfig/20220127-165859-marostegui.json
* 16:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync on production
* 16:50 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply on canary
* 16:50 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply on production
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P19497 and previous config saved to /var/cache/conftool/dbconfig/20220127-164354-marostegui.json
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P19496 and previous config saved to /var/cache/conftool/dbconfig/20220127-162849-marostegui.json
* 16:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: sync on production
* 16:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply on canary
* 16:27 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply on production
* 16:24 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: sync on main
* 16:24 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 16:23 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply on canary
* 16:23 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply on main
* 16:22 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync on main
* 16:21 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply on canary
* 16:21 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply on main
* 16:20 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 16:20 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync on main
* 16:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 16:19 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply on canary
* 16:19 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply on main
* 16:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync on production
* 16:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync on canary
* 16:14 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply on production
* 16:14 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply on canary
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19495 and previous config saved to /var/cache/conftool/dbconfig/20220127-161344-marostegui.json
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19494 and previous config saved to /var/cache/conftool/dbconfig/20220127-161239-marostegui.json
* 16:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 16:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19493 and previous config saved to /var/cache/conftool/dbconfig/20220127-161231-marostegui.json
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19492 and previous config saved to /var/cache/conftool/dbconfig/20220127-160749-marostegui.json
* 16:03 dcausse: restarting blazegraph on wdqs1005 (JVM stuck for 2 hours)
* 16:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P19491 and previous config saved to /var/cache/conftool/dbconfig/20220127-155726-marostegui.json
* 15:57 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.19  refs [[phab:T293960|T293960]] (duration: 00m 51s)
* 15:56 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.19  refs [[phab:T293960|T293960]]
* 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync on canary
* 15:53 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync on production
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19490 and previous config saved to /var/cache/conftool/dbconfig/20220127-155244-marostegui.json
* 15:52 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply on production
* 15:52 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply on canary
* 15:52 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): no current blockers; rolling train forward to group1 before log triage meeting
* 15:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync on production
* 15:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync on canary
* 15:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply on canary
* 15:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply on production
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P19489 and previous config saved to /var/cache/conftool/dbconfig/20220127-154222-marostegui.json
* 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19488 and previous config saved to /var/cache/conftool/dbconfig/20220127-153739-marostegui.json
* 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19487 and previous config saved to /var/cache/conftool/dbconfig/20220127-152717-marostegui.json
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19486 and previous config saved to /var/cache/conftool/dbconfig/20220127-152235-marostegui.json
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19485 and previous config saved to /var/cache/conftool/dbconfig/20220127-151709-marostegui.json
* 15:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 15:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19484 and previous config saved to /var/cache/conftool/dbconfig/20220127-151701-marostegui.json
* 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 ([[phab:T300006|T300006]])', diff saved to https://phabricator.wikimedia.org/P19483 and previous config saved to /var/cache/conftool/dbconfig/20220127-151032-ladsgroup.json
* 15:09 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on production
* 15:08 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on canary
* 15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply on production
* 15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply on canary
* 15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync on production
* 15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync on canary
* 15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply on canary
* 15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply on production
* 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P19482 and previous config saved to /var/cache/conftool/dbconfig/20220127-150156-marostegui.json
* 14:59 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh6002.wikimedia.org
* 14:58 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync on production
* 14:57 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply on canary
* 14:57 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply on production
* 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P19481 and previous config saved to /var/cache/conftool/dbconfig/20220127-145527-ladsgroup.json
* 14:54 ottomata: continuing deployments of eventgate-main and eventgate-analytics to pick up CA cert changes - [[phab:T296064|T296064]] (also deploying eventgate-main for a schema repo bump for search)
* 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P19480 and previous config saved to /var/cache/conftool/dbconfig/20220127-144652-marostegui.json
* 14:46 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh6002.wikimedia.org
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1028.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P19479 and previous config saved to /var/cache/conftool/dbconfig/20220127-144022-ladsgroup.json
* 14:39 moritzm: added ganeti1028 to Ganeti eqiad cluster [[phab:T293909|T293909]]
* 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19478 and previous config saved to /var/cache/conftool/dbconfig/20220127-143147-marostegui.json
* 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19477 and previous config saved to /var/cache/conftool/dbconfig/20220127-142841-marostegui.json
* 14:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 14:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 14:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 14:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19476 and previous config saved to /var/cache/conftool/dbconfig/20220127-142829-marostegui.json
* 14:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1028.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 ([[phab:T300006|T300006]])', diff saved to https://phabricator.wikimedia.org/P19475 and previous config saved to /var/cache/conftool/dbconfig/20220127-142517-ladsgroup.json
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19474 and previous config saved to /var/cache/conftool/dbconfig/20220127-142214-marostegui.json
* 14:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 14:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19473 and previous config saved to /var/cache/conftool/dbconfig/20220127-142206-marostegui.json
* 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1023.eqiad.wmnet with OS bullseye
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P19471 and previous config saved to /var/cache/conftool/dbconfig/20220127-141324-marostegui.json
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19470 and previous config saved to /var/cache/conftool/dbconfig/20220127-140702-marostegui.json
* 14:05 moritzm: installing apache security updates
* 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P19469 and previous config saved to /var/cache/conftool/dbconfig/20220127-135820-marostegui.json
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
* 13:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
* 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19468 and previous config saved to /var/cache/conftool/dbconfig/20220127-135157-marostegui.json
* 13:46 moritzm: imported elasticsearch-oss/kibana-oss/logstash-oss 6.8.23 to thirdparty/elastic68 for stretch and bullseye
* 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1023.eqiad.wmnet with OS bullseye
* 13:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
* 13:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1023.mgmt.eqiad.wmnet with reboot policy GRACEFUL
* 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19467 and previous config saved to /var/cache/conftool/dbconfig/20220127-134315-marostegui.json
* 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19466 and previous config saved to /var/cache/conftool/dbconfig/20220127-134209-marostegui.json
* 13:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 13:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 13:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 13:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19465 and previous config saved to /var/cache/conftool/dbconfig/20220127-134158-marostegui.json
* 13:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19464 and previous config saved to /var/cache/conftool/dbconfig/20220127-133715-root.json
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19463 and previous config saved to /var/cache/conftool/dbconfig/20220127-133652-marostegui.json
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19462 and previous config saved to /var/cache/conftool/dbconfig/20220127-133631-root.json
* 13:32 marostegui@cumin1001: START - Cookbook sre.hosts.provision for host es1023.mgmt.eqiad.wmnet with reboot policy GRACEFUL
* 13:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P19461 and previous config saved to /var/cache/conftool/dbconfig/20220127-132653-marostegui.json
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19460 and previous config saved to /var/cache/conftool/dbconfig/20220127-132624-marostegui.json
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19459 and previous config saved to /var/cache/conftool/dbconfig/20220127-132212-root.json
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19458 and previous config saved to /var/cache/conftool/dbconfig/20220127-132128-root.json
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P19457 and previous config saved to /var/cache/conftool/dbconfig/20220127-131148-marostegui.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19456 and previous config saved to /var/cache/conftool/dbconfig/20220127-130708-root.json
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19455 and previous config saved to /var/cache/conftool/dbconfig/20220127-130624-root.json
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19454 and previous config saved to /var/cache/conftool/dbconfig/20220127-125644-marostegui.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19453 and previous config saved to /var/cache/conftool/dbconfig/20220127-125538-marostegui.json
* 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19452 and previous config saved to /var/cache/conftool/dbconfig/20220127-125205-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19451 and previous config saved to /var/cache/conftool/dbconfig/20220127-125120-root.json
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19450 and previous config saved to /var/cache/conftool/dbconfig/20220127-123701-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19449 and previous config saved to /var/cache/conftool/dbconfig/20220127-123617-root.json
* 12:26 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts restbase2011.codfw.wmnet
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19448 and previous config saved to /var/cache/conftool/dbconfig/20220127-122558-marostegui.json
* 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2011.codfw.wmnet
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19447 and previous config saved to /var/cache/conftool/dbconfig/20220127-122157-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19446 and previous config saved to /var/cache/conftool/dbconfig/20220127-122113-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19445 and previous config saved to /var/cache/conftool/dbconfig/20220127-121053-marostegui.json
* 12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4031.ulsfo.wmnet with OS buster
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1023 ([[phab:T300006|T300006]])', diff saved to https://phabricator.wikimedia.org/P19444 and previous config saved to /var/cache/conftool/dbconfig/20220127-120648-ladsgroup.json
* 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
* 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19443 and previous config saved to /var/cache/conftool/dbconfig/20220127-120608-root.json
* 12:01 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19442 and previous config saved to /var/cache/conftool/dbconfig/20220127-115548-marostegui.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19441 and previous config saved to /var/cache/conftool/dbconfig/20220127-115105-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19440 and previous config saved to /var/cache/conftool/dbconfig/20220127-114044-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19439 and previous config saved to /var/cache/conftool/dbconfig/20220127-113931-marostegui.json
* 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19438 and previous config saved to /var/cache/conftool/dbconfig/20220127-113924-marostegui.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19437 and previous config saved to /var/cache/conftool/dbconfig/20220127-113600-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19436 and previous config saved to /var/cache/conftool/dbconfig/20220127-113140-marostegui.json
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19435 and previous config saved to /var/cache/conftool/dbconfig/20220127-113132-marostegui.json
* 11:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4031.ulsfo.wmnet with OS buster
* 11:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2023.codfw.wmnet with OS bullseye
* 11:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1165.eqiad.wmnet with OS bullseye
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19434 and previous config saved to /var/cache/conftool/dbconfig/20220127-112418-marostegui.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19433 and previous config saved to /var/cache/conftool/dbconfig/20220127-112057-root.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P19432 and previous config saved to /var/cache/conftool/dbconfig/20220127-111628-marostegui.json
* 11:12 vgutierrez: depool cp4031 to be reimaged as cache::text_envoy - [[phab:T271421|T271421]]
* 11:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1159.eqiad.wmnet with OS bullseye
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19431 and previous config saved to /var/cache/conftool/dbconfig/20220127-110913-marostegui.json
* 11:07 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh6001.wikimedia.org
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P19429 and previous config saved to /var/cache/conftool/dbconfig/20220127-110123-marostegui.json
* 10:56 sukhe@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh6001.wikimedia.org
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1165.eqiad.wmnet with OS bullseye
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19428 and previous config saved to /var/cache/conftool/dbconfig/20220127-105408-marostegui.json
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P19427 and previous config saved to /var/cache/conftool/dbconfig/20220127-105223-marostegui.json
* 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2023.codfw.wmnet with OS bullseye
* 10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
* 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
* 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2024-2025].codfw.wmnet with reason: Reimage of the master [[phab:T300006|T300006]]
* 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2024-2025].codfw.wmnet with reason: Reimage of the master [[phab:T300006|T300006]]
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19426 and previous config saved to /var/cache/conftool/dbconfig/20220127-104654-marostegui.json
* 10:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19425 and previous config saved to /var/cache/conftool/dbconfig/20220127-104641-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19424 and previous config saved to /var/cache/conftool/dbconfig/20220127-104618-marostegui.json
* 10:38 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1159.eqiad.wmnet with OS bullseye
* 10:35 Amir1: creating linktarget table everywhere ([[phab:T299416|T299416]])
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19423 and previous config saved to /var/cache/conftool/dbconfig/20220127-103136-marostegui.json
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19422 and previous config saved to /var/cache/conftool/dbconfig/20220127-102049-marostegui.json
* 10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:17 jynus: Started Bacula Director Daemon service at backup1001 [[phab:T299624|T299624]]
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19421 and previous config saved to /var/cache/conftool/dbconfig/20220127-101631-marostegui.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19420 and previous config saved to /var/cache/conftool/dbconfig/20220127-100802-root.json
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19419 and previous config saved to /var/cache/conftool/dbconfig/20220127-100155-marostegui.json
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19418 and previous config saved to /var/cache/conftool/dbconfig/20220127-100127-marostegui.json
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19417 and previous config saved to /var/cache/conftool/dbconfig/20220127-100014-marostegui.json
* 10:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 10:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19416 and previous config saved to /var/cache/conftool/dbconfig/20220127-100007-marostegui.json
* 10:00 marostegui: Failover m1 from db1159 to db1128 - [[phab:T299624|T299624]]
* 09:57 jynus: Stopped Bacula Director Daemon service at backup1001 [[phab:T299624|T299624]]
* 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1027.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 09:53 moritzm: added ganeti1027 to Ganeti eqiad cluster [[phab:T293909|T293909]]
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19415 and previous config saved to /var/cache/conftool/dbconfig/20220127-095258-root.json
* 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1027.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 09:50 hnowlan@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 14s)
* 09:50 hnowlan@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
* 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19414 and previous config saved to /var/cache/conftool/dbconfig/20220127-094651-marostegui.json
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19413 and previous config saved to /var/cache/conftool/dbconfig/20220127-094502-marostegui.json
* 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19412 and previous config saved to /var/cache/conftool/dbconfig/20220127-093755-root.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19411 and previous config saved to /var/cache/conftool/dbconfig/20220127-093146-marostegui.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19410 and previous config saved to /var/cache/conftool/dbconfig/20220127-092957-marostegui.json
* 09:27 filippo@puppetmaster1001: conftool action : set/weight=10; selector: name=prometheus2005.codfw.wmnet
* 09:27 filippo@puppetmaster1001: conftool action : set/weight=10; selector: name=prometheus2006.codfw.wmnet
* 09:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2078,2132].codfw.wmnet,db[1117,1128,1159].eqiad.wmnet with reason: Primary switchover m1 [[phab:T299624|T299624]]
* 09:23 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2078,2132].codfw.wmnet,db[1117,1128,1159].eqiad.wmnet with reason: Primary switchover m1 [[phab:T299624|T299624]]
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19409 and previous config saved to /var/cache/conftool/dbconfig/20220127-092251-root.json
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1007.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 09:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1007.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19408 and previous config saved to /var/cache/conftool/dbconfig/20220127-091641-marostegui.json
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19407 and previous config saved to /var/cache/conftool/dbconfig/20220127-091453-marostegui.json
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19406 and previous config saved to /var/cache/conftool/dbconfig/20220127-091440-marostegui.json
* 09:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 09:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19405 and previous config saved to /var/cache/conftool/dbconfig/20220127-091401-marostegui.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19404 and previous config saved to /var/cache/conftool/dbconfig/20220127-090747-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19403 and previous config saved to /var/cache/conftool/dbconfig/20220127-085857-marostegui.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19402 and previous config saved to /var/cache/conftool/dbconfig/20220127-085244-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19401 and previous config saved to /var/cache/conftool/dbconfig/20220127-084352-marostegui.json
* 08:41 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15]: scap testing (duration: 00m 05s)
* 08:40 jayme@deploy1002: Started deploy [restbase/deploy@0848b15]: scap testing
* 08:38 jayme: updated scap to 4.2.1 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary, A:restbase-canary - [[phab:T300058|T300058]]
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19400 and previous config saved to /var/cache/conftool/dbconfig/20220127-083740-root.json
* 08:33 jayme: uploaded scap 4.2.1 to apt.wikimedia.org - [[phab:T300058|T300058]]
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19399 and previous config saved to /var/cache/conftool/dbconfig/20220127-082847-marostegui.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19398 and previous config saved to /var/cache/conftool/dbconfig/20220127-082735-marostegui.json
* 08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19397 and previous config saved to /var/cache/conftool/dbconfig/20220127-082728-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19396 and previous config saved to /var/cache/conftool/dbconfig/20220127-082236-root.json
* 08:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19395 and previous config saved to /var/cache/conftool/dbconfig/20220127-081622-marostegui.json
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 08:13 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.19/includes/libs/rdbms/database/Database.php: Backport: [[gerrit:757476{{!}}Don't consider lock waits to be write queries (T300194)]] (duration: 00m 52s)
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19394 and previous config saved to /var/cache/conftool/dbconfig/20220127-081223-marostegui.json
* 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19393 and previous config saved to /var/cache/conftool/dbconfig/20220127-080733-root.json
* 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19392 and previous config saved to /var/cache/conftool/dbconfig/20220127-075909-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19391 and previous config saved to /var/cache/conftool/dbconfig/20220127-075718-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19390 and previous config saved to /var/cache/conftool/dbconfig/20220127-075229-root.json
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1131.eqiad.wmnet with OS bullseye
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P19389 and previous config saved to /var/cache/conftool/dbconfig/20220127-074404-marostegui.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19388 and previous config saved to /var/cache/conftool/dbconfig/20220127-074214-marostegui.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19387 and previous config saved to /var/cache/conftool/dbconfig/20220127-074101-marostegui.json
* 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19386 and previous config saved to /var/cache/conftool/dbconfig/20220127-074033-marostegui.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P19385 and previous config saved to /var/cache/conftool/dbconfig/20220127-072900-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19384 and previous config saved to /var/cache/conftool/dbconfig/20220127-072528-marostegui.json
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1131.eqiad.wmnet with OS bullseye
* 07:17 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1131.eqiad.wmnet with OS bullseye
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19383 and previous config saved to /var/cache/conftool/dbconfig/20220127-071355-marostegui.json
* 07:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1131.eqiad.wmnet with OS bullseye
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19382 and previous config saved to /var/cache/conftool/dbconfig/20220127-071023-marostegui.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P19381 and previous config saved to /var/cache/conftool/dbconfig/20220127-070821-marostegui.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist from s8 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P19380 and previous config saved to /var/cache/conftool/dbconfig/20220127-070557-marostegui.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1021', diff saved to https://phabricator.wikimedia.org/P19379 and previous config saved to /var/cache/conftool/dbconfig/20220127-070532-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19378 and previous config saved to /var/cache/conftool/dbconfig/20220127-070428-marostegui.json
* 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19377 and previous config saved to /var/cache/conftool/dbconfig/20220127-065519-marostegui.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19376 and previous config saved to /var/cache/conftool/dbconfig/20220127-065406-marostegui.json
* 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 06:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 04:01 Krinkle: grafana: Temporarily silence resourceloader alert for INM satisfaction ratio, pending [[phab:T298520|T298520]].
* 00:58 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:757549{{!}}commonswiki: Add leg.journals.isu.ac.ir to the wgCopyUploadsDomains allowlist (T300217)]] (duration: 00m 55s)
* 00:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:24 thcipriani: restarting jenkins


== 2022-08-06 ==
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32295 and previous config saved to /var/cache/conftool/dbconfig/20220806-175916-ladsgroup.json
* 17:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:02 krinkle@deploy1002: Synchronized w/: {{Gerrit|I9067d47fab0324}} (duration: 03m 25s)
* 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2022-01-26 ==
* 23:34 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): parking the train at group0 until US morning; we have a probable fix for [[phab:T300194|T300194]] but CI is having issues
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:21 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): rolling back due to increase in DBTransactionSizeErrors
* 20:20 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.19 refs [[phab:T293960|T293960]]"
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:07 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.19 refs [[phab:T293960|T293960]] (duration: 00m 54s)
* 20:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.19 refs [[phab:T293960|T293960]]
* 20:01 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): all known blockers patched, logs for wmf.19 quiet - proceeding to group1
* 20:00 mutante: mw131* - purging remaining font packages
* 19:53 mutante: labweb1001, labweb1002, cloudweb2001-dev (wikitech hosts) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* {{!}} purging font packages that had been installed as dependencies ([[phab:T294378|T294378]])
* 19:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:50 accraze@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:46 Lucas_WMDE: UTC evening backport window done
* 19:45 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:757436{{!}}fawiki: Add unwatchedpages permission to eliminators (T300126)]] (duration: 00m 51s)
* 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:42 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.19/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:757145{{!}}Don't wrap unknown actions with confirmation (T300095)]] (duration: 00m 51s)
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:33 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.19/includes/skins/Skin.php: Backport: [[gerrit:757467{{!}}Fix empty div when there's no sitenotice. (T300096)]] (duration: 00m 51s)
* 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:756978{{!}}bgwiki: Add 'wgNamespaceRobotPolicies' for Draft (Talk) namespace (T299224)]] (duration: 00m 52s)
* 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 19:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 19:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 19:22 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 19:22 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 19:20 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:20 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 19:19 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 19:19 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 19:19 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 19:19 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.19/skins/Timeless/includes/TimelessTemplate.php: Backport: [[gerrit:757470{{!}}Do not duplicate categories in primary action tabs space (T300100)]] (duration: 00m 51s)
* 19:18 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 19:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:16 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:15 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:756985{{!}}[wmf-config] Undeploy gdi survey on cawiki beta (T299913)]] (no-op sync, beta only) (duration: 00m 52s)
* 19:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19374 and previous config saved to /var/cache/conftool/dbconfig/20220126-191002-marostegui.json
* 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P19373 and previous config saved to /var/cache/conftool/dbconfig/20220126-185457-marostegui.json
* 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P19372 and previous config saved to /var/cache/conftool/dbconfig/20220126-183953-marostegui.json
* 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19371 and previous config saved to /var/cache/conftool/dbconfig/20220126-182448-marostegui.json
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19370 and previous config saved to /var/cache/conftool/dbconfig/20220126-182333-marostegui.json
* 18:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 18:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19369 and previous config saved to /var/cache/conftool/dbconfig/20220126-182325-marostegui.json
* 18:14 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
* 18:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1019.eqiad.wmnet with OS buster
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P19368 and previous config saved to /var/cache/conftool/dbconfig/20220126-180819-marostegui.json
* 18:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 17:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19366 and previous config saved to /var/cache/conftool/dbconfig/20220126-175405-root.json
* 17:53 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P19365 and previous config saved to /var/cache/conftool/dbconfig/20220126-175315-marostegui.json
* 17:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19364 and previous config saved to /var/cache/conftool/dbconfig/20220126-173901-root.json
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19363 and previous config saved to /var/cache/conftool/dbconfig/20220126-173810-marostegui.json
* 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19361 and previous config saved to /var/cache/conftool/dbconfig/20220126-173654-marostegui.json
* 17:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 17:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19360 and previous config saved to /var/cache/conftool/dbconfig/20220126-173647-marostegui.json
* 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19359 and previous config saved to /var/cache/conftool/dbconfig/20220126-172358-root.json
* 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P19357 and previous config saved to /var/cache/conftool/dbconfig/20220126-172141-marostegui.json
* 17:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS buster
* 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19356 and previous config saved to /var/cache/conftool/dbconfig/20220126-170852-root.json
* 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P19355 and previous config saved to /var/cache/conftool/dbconfig/20220126-170635-marostegui.json
* 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19354 and previous config saved to /var/cache/conftool/dbconfig/20220126-165349-root.json
* 16:53 jayme: published image docker-registry.discovery.wmnet/cfssl-issuer:0.2.1-1 - [[phab:T299906|T299906]]
* 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19353 and previous config saved to /var/cache/conftool/dbconfig/20220126-165130-marostegui.json
* 16:51 ryankemper: [WCQS Deploy] Restarted updaters across fleet: `ryankemper@cumin1001:~$ sudo cumin -b 6 'wcqs*' 'sudo systemctl restart wcqs-updater'`
* 16:47 moritzm: draining instances off ganeti1007 for reimage
* 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19352 and previous config saved to /var/cache/conftool/dbconfig/20220126-163845-root.json
* 16:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1019.eqiad.wmnet
* 16:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on restbase1019.eqiad.wmnet with reason: Firmware upgrade
* 16:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on restbase1019.eqiad.wmnet with reason: Firmware upgrade
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19351 and previous config saved to /var/cache/conftool/dbconfig/20220126-162810-marostegui.json
* 16:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 16:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19350 and previous config saved to /var/cache/conftool/dbconfig/20220126-162756-marostegui.json
* 16:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19349 and previous config saved to /var/cache/conftool/dbconfig/20220126-162342-root.json
* 16:23 elukey: restart varnishkafka instances on cp1087
* 16:17 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P19348 and previous config saved to /var/cache/conftool/dbconfig/20220126-161252-marostegui.json
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19347 and previous config saved to /var/cache/conftool/dbconfig/20220126-160838-root.json
* 16:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P19346 and previous config saved to /var/cache/conftool/dbconfig/20220126-155747-marostegui.json
* 15:54 vgutierrez: upgrading varnishkafka to version 1.1.0 on cp[6002,6005,6009-6013].drmrs.wmnet,cp1087.eqiad.wmnet,cp[4021,4033-4034,4036].ulsfo.wmnet
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19345 and previous config saved to /var/cache/conftool/dbconfig/20220126-155334-root.json
* 15:47 vgutierrez: pool cp4035
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19344 and previous config saved to /var/cache/conftool/dbconfig/20220126-154242-marostegui.json
* 15:42 vgutierrez: restarting varnish-frontend on cp4035
* 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1025.eqiad.wmnet with OS bullseye
* 15:40 vgutierrez: depool cp4035
* 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19343 and previous config saved to /var/cache/conftool/dbconfig/20220126-154026-marostegui.json
* 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19342 and previous config saved to /var/cache/conftool/dbconfig/20220126-154019-marostegui.json
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19341 and previous config saved to /var/cache/conftool/dbconfig/20220126-153831-root.json
* 15:29 XioNoX: add pay-lvs1003/4 to pfw3-eqiad BGP
* 15:25 joal@deploy1002: Finished deploy [analytics/refinery@ab7f732] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@ab7f732] (duration: 05m 30s)
* 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P19340 and previous config saved to /var/cache/conftool/dbconfig/20220126-152514-marostegui.json
* 15:24 ottomata: paused (for meetings) in deploying new CA certs for all eventgate services, still TODO: eventgate-analytics-external, eventgate-main - [[phab:T296064|T296064]]
* 15:20 joal@deploy1002: Started deploy [analytics/refinery@ab7f732] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@ab7f732]
* 15:14 joal@deploy1002: Finished deploy [analytics/refinery@ab7f732] (thin): Regular analytics weekly train THIN [analytics/refinery@ab7f732] (duration: 00m 07s)
* 15:14 joal@deploy1002: Started deploy [analytics/refinery@ab7f732] (thin): Regular analytics weekly train THIN [analytics/refinery@ab7f732]
* 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1006.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 15:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1006.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P19338 and previous config saved to /var/cache/conftool/dbconfig/20220126-151009-marostegui.json
* 15:08 joal@deploy1002: Finished deploy [analytics/refinery@ab7f732]: Regular analytics weekly train [analytics/refinery@ab7f732] (duration: 16m 38s)
* 15:08 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1025.eqiad.wmnet with OS bullseye
* 15:06 elukey: elukey@cp4035:~$ sudo systemctl restart varnishkafka-eventlogging.service - metrics showing messages stuck for a poll()
* 15:03 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1025.mgmt.eqiad.wmnet with reboot policy GRACEFUL
* 15:00 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync on production
* 14:58 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync on canary
* 14:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply on canary
* 14:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply on production
* 14:57 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync on production
* 14:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1005.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 14:56 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync on canary
* 14:56 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply on production
* 14:56 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply on canary
* 14:55 elukey: elukey@cp4035:~$ sudo systemctl restart varnishkafka-webrequest.service - metrics showing messages stuck for a poll()
* 14:55 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync on production
* 14:55 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply on canary
* 14:55 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply on production
* 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19337 and previous config saved to /var/cache/conftool/dbconfig/20220126-145505-marostegui.json
* 14:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1005.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 14:54 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync on production
* 14:54 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync on canary
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19336 and previous config saved to /var/cache/conftool/dbconfig/20220126-145349-marostegui.json
* 14:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 14:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19335 and previous config saved to /var/cache/conftool/dbconfig/20220126-145342-marostegui.json
* 14:53 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply on canary
* 14:53 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply on production
* 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:52 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync on production
* 14:52 joal@deploy1002: Started deploy [analytics/refinery@ab7f732]: Regular analytics weekly train [analytics/refinery@ab7f732]
* 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:50 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync on canary
* 14:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:50 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply on canary
* 14:50 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply on production
* 14:50 volans@cumin1001: START - Cookbook sre.hosts.provision for host es1025.mgmt.eqiad.wmnet with reboot policy GRACEFUL
* 14:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:42 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: sync on production
* 14:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply on canary
* 14:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply on production
* 14:41 ottomata: deploying new CA certs for all eventgate services... [[phab:T296064|T296064]]
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P19334 and previous config saved to /var/cache/conftool/dbconfig/20220126-143837-marostegui.json
* 14:38 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync on production
* 14:37 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync on canary
* 14:37 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync on canary
* 14:37 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync on production
* 14:37 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync on production
* 14:36 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync on canary
* 14:36 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync on canary
* 14:36 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync on production
* 14:35 ottomata: roll restarting eventgate-analytics to pick up stream config change https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/757122
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19333 and previous config saved to /var/cache/conftool/dbconfig/20220126-142620-root.json
* 14:25 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync on canary
* 14:25 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync on production
* 14:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync on production
* 14:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync on canary
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P19332 and previous config saved to /var/cache/conftool/dbconfig/20220126-142332-marostegui.json
* 14:23 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply on canary
* 14:23 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply on production
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19331 and previous config saved to /var/cache/conftool/dbconfig/20220126-142255-marostegui.json
* 14:22 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply on canary
* 14:22 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply on production
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19330 and previous config saved to /var/cache/conftool/dbconfig/20220126-141113-root.json
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19329 and previous config saved to /var/cache/conftool/dbconfig/20220126-140827-marostegui.json
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P19328 and previous config saved to /var/cache/conftool/dbconfig/20220126-140751-marostegui.json
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19327 and previous config saved to /var/cache/conftool/dbconfig/20220126-140712-marostegui.json
* 14:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 14:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19326 and previous config saved to /var/cache/conftool/dbconfig/20220126-140629-marostegui.json
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025', diff saved to https://phabricator.wikimedia.org/P19325 and previous config saved to /var/cache/conftool/dbconfig/20220126-135635-marostegui.json
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1014.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19324 and previous config saved to /var/cache/conftool/dbconfig/20220126-135610-root.json
* 13:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1014.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1015.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P19323 and previous config saved to /var/cache/conftool/dbconfig/20220126-135245-marostegui.json
* 13:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1015.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P19322 and previous config saved to /var/cache/conftool/dbconfig/20220126-135124-marostegui.json
* 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
* 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19321 and previous config saved to /var/cache/conftool/dbconfig/20220126-134106-root.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19320 and previous config saved to /var/cache/conftool/dbconfig/20220126-133740-marostegui.json
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19319 and previous config saved to /var/cache/conftool/dbconfig/20220126-133634-marostegui.json
* 13:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 13:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19318 and previous config saved to /var/cache/conftool/dbconfig/20220126-133627-marostegui.json
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P19317 and previous config saved to /var/cache/conftool/dbconfig/20220126-133619-marostegui.json
* 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
* 13:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19316 and previous config saved to /var/cache/conftool/dbconfig/20220126-132603-root.json
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges from s8 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P19315 and previous config saved to /var/cache/conftool/dbconfig/20220126-132600-marostegui.json
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P19314 and previous config saved to /var/cache/conftool/dbconfig/20220126-132122-marostegui.json
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19313 and previous config saved to /var/cache/conftool/dbconfig/20220126-132114-marostegui.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298559|T298559]])', diff saved to https://phabricator.wikimedia.org/P19311 and previous config saved to /var/cache/conftool/dbconfig/20220126-131959-marostegui.json
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:17 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
* 13:16 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
* 13:16 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
* 13:16 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
* 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19310 and previous config saved to /var/cache/conftool/dbconfig/20220126-131047-root.json
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P19309 and previous config saved to /var/cache/conftool/dbconfig/20220126-130611-marostegui.json
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19308 and previous config saved to /var/cache/conftool/dbconfig/20220126-130527-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19307 and previous config saved to /var/cache/conftool/dbconfig/20220126-125543-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19306 and previous config saved to /var/cache/conftool/dbconfig/20220126-125107-marostegui.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19305 and previous config saved to /var/cache/conftool/dbconfig/20220126-125023-root.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19304 and previous config saved to /var/cache/conftool/dbconfig/20220126-125001-marostegui.json
* 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19303 and previous config saved to /var/cache/conftool/dbconfig/20220126-124953-marostegui.json
* 12:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:43 dcausse: UTC morning backport done
* 12:41 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:757122{{!}}Correct wcqs event stream name]] (duration: 00m 51s)
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19302 and previous config saved to /var/cache/conftool/dbconfig/20220126-124040-root.json
* 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 ([[phab:T300006|T300006]])', diff saved to https://phabricator.wikimedia.org/P19301 and previous config saved to /var/cache/conftool/dbconfig/20220126-123839-ladsgroup.json
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19300 and previous config saved to /var/cache/conftool/dbconfig/20220126-123520-root.json
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P19299 and previous config saved to /var/cache/conftool/dbconfig/20220126-123448-marostegui.json
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:32 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:757423{{!}}commonswiki: Add www.kew.org to the wgCopyUploadsDomains allowlist (T300101)]] (duration: 00m 51s)
* 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on staging
* 12:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
* 12:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
* 12:28 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
* 12:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19298 and previous config saved to /var/cache/conftool/dbconfig/20220126-122536-root.json
* 12:25 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:757414{{!}}fawiki: Add unwatchedpages permission to patrollers (T300126)]] (duration: 00m 51s)
* 12:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:22 moritzm: installing apache security updates
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19297 and previous config saved to /var/cache/conftool/dbconfig/20220126-122016-root.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P19296 and previous config saved to /var/cache/conftool/dbconfig/20220126-121944-marostegui.json
* 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2024.codfw.wmnet with OS bullseye
* 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19294 and previous config saved to /var/cache/conftool/dbconfig/20220126-121032-root.json
* 12:09 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
* 12:09 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
* 12:09 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
* 12:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1137.eqiad.wmnet with OS bullseye
* 12:09 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:757413{{!}}Deal with change in MachineVision handler constructor]] (duration: 00m 51s)
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19293 and previous config saved to /var/cache/conftool/dbconfig/20220126-120513-root.json
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19292 and previous config saved to /var/cache/conftool/dbconfig/20220126-120439-marostegui.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19291 and previous config saved to /var/cache/conftool/dbconfig/20220126-120132-marostegui.json
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19290 and previous config saved to /var/cache/conftool/dbconfig/20220126-120125-marostegui.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19288 and previous config saved to /var/cache/conftool/dbconfig/20220126-115009-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P19287 and previous config saved to /var/cache/conftool/dbconfig/20220126-114619-marostegui.json
* 11:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:44 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1137.eqiad.wmnet with OS bullseye
* 11:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 [[phab:T300099|T300099]]', diff saved to https://phabricator.wikimedia.org/P19286 and previous config saved to /var/cache/conftool/dbconfig/20220126-114236-marostegui.json
* 11:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2024.codfw.wmnet with OS bullseye
* 11:41 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.19/includes/libs/rdbms/database/Database.php: Backport: [[gerrit:757133{{!}}rdbms: Pass commented SQL to the GeneralizedSql for logging (T298687)]] (duration: 00m 54s)
* 11:41 moritzm: installing libxfont security updates
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19285 and previous config saved to /var/cache/conftool/dbconfig/20220126-113730-root.json
* 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2024 ([[phab:T300006|T300006]])', diff saved to https://phabricator.wikimedia.org/P19284 and previous config saved to /var/cache/conftool/dbconfig/20220126-113626-ladsgroup.json
* 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance
* 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19283 and previous config saved to /var/cache/conftool/dbconfig/20220126-113505-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P19282 and previous config saved to /var/cache/conftool/dbconfig/20220126-113115-marostegui.json
* 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 ([[phab:T300006|T300006]])', diff saved to https://phabricator.wikimedia.org/P19281 and previous config saved to /var/cache/conftool/dbconfig/20220126-112719-ladsgroup.json
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 ([[phab:T300006|T300006]])', diff saved to https://phabricator.wikimedia.org/P19280 and previous config saved to /var/cache/conftool/dbconfig/20220126-112439-ladsgroup.json
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19279 and previous config saved to /var/cache/conftool/dbconfig/20220126-112227-root.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19278 and previous config saved to /var/cache/conftool/dbconfig/20220126-112002-root.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19277 and previous config saved to /var/cache/conftool/dbconfig/20220126-111610-marostegui.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19276 and previous config saved to /var/cache/conftool/dbconfig/20220126-111504-marostegui.json
* 11:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19275 and previous config saved to /var/cache/conftool/dbconfig/20220126-111425-marostegui.json
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19274 and previous config saved to /var/cache/conftool/dbconfig/20220126-110723-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19273 and previous config saved to /var/cache/conftool/dbconfig/20220126-110458-root.json
* 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2025.codfw.wmnet with OS bullseye
* 11:03 hnowlan@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 22m 16s)
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P19272 and previous config saved to /var/cache/conftool/dbconfig/20220126-105921-marostegui.json
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19271 and previous config saved to /var/cache/conftool/dbconfig/20220126-105220-root.json
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19270 and previous config saved to /var/cache/conftool/dbconfig/20220126-104955-root.json
* 10:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1020.eqiad.wmnet with OS bullseye
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P19269 and previous config saved to /var/cache/conftool/dbconfig/20220126-104416-marostegui.json
* 10:41 hnowlan@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
* 10:41 btullis: re-enabled puppet on all cp-* nodes.
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19268 and previous config saved to /var/cache/conftool/dbconfig/20220126-103716-root.json
* 10:34 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001 (duration: 00m 33s)
* 10:34 oblivian@deploy1002: Started deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001
* 10:33 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001 (duration: 01m 05s)
* 10:32 oblivian@deploy1002: Started deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19267 and previous config saved to /var/cache/conftool/dbconfig/20220126-102911-marostegui.json
* 10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2025.codfw.wmnet with OS bullseye
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19266 and previous config saved to /var/cache/conftool/dbconfig/20220126-102805-marostegui.json
* 10:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19265 and previous config saved to /var/cache/conftool/dbconfig/20220126-102758-marostegui.json
* 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1006.eqiad.wmnet with OS buster
* 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2025 ([[phab:T300006|T300006]])', diff saved to https://phabricator.wikimedia.org/P19264 and previous config saved to /var/cache/conftool/dbconfig/20220126-102445-ladsgroup.json
* 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
* 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19263 and previous config saved to /var/cache/conftool/dbconfig/20220126-102213-root.json
* 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1020.eqiad.wmnet with OS bullseye
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P19261 and previous config saved to /var/cache/conftool/dbconfig/20220126-101253-marostegui.json
* 10:12 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1020.mgmt.eqiad.wmnet with reboot policy GRACEFUL
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19260 and previous config saved to /var/cache/conftool/dbconfig/20220126-100709-root.json
* 10:01 volans@cumin1001: START - Cookbook sre.hosts.provision for host es1020.mgmt.eqiad.wmnet with reboot policy GRACEFUL
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P19259 and previous config saved to /var/cache/conftool/dbconfig/20220126-095749-marostegui.json
* 09:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1006.eqiad.wmnet with OS buster
* 09:53 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
* 09:52 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
* 09:52 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
* 09:52 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19258 and previous config saved to /var/cache/conftool/dbconfig/20220126-095205-root.json
* 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1005.eqiad.wmnet with OS buster
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19257 and previous config saved to /var/cache/conftool/dbconfig/20220126-094244-marostegui.json
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19256 and previous config saved to /var/cache/conftool/dbconfig/20220126-094138-marostegui.json
* 09:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19255 and previous config saved to /var/cache/conftool/dbconfig/20220126-094131-marostegui.json
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19254 and previous config saved to /var/cache/conftool/dbconfig/20220126-093702-root.json
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1005.eqiad.wmnet with OS buster
* 09:32 jayme: updated scap to 4.2.0 on A:restbase-canary - [[phab:T300058|T300058]]
* 09:28 godog: begin rsync prometheus2004 -> 2005 - [[phab:T296199|T296199]]
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P19252 and previous config saved to /var/cache/conftool/dbconfig/20220126-092626-marostegui.json
* 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1005.eqiad.wmnet with OS buster
* 09:25 jayme: updated scap to 4.2.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary - [[phab:T300058|T300058]]
* 09:24 jayme: uploaded scap 4.2.0 to apt.wikimedia.org - [[phab:T300058|T300058]]
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19251 and previous config saved to /var/cache/conftool/dbconfig/20220126-092158-root.json
* 09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1120.eqiad.wmnet with OS bullseye
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P19250 and previous config saved to /var/cache/conftool/dbconfig/20220126-091121-marostegui.json
* 09:06 jayme: uploaded scap 4.2.0 to apt.wikimedia.org
* 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1005.eqiad.wmnet with OS buster
* 09:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1120.eqiad.wmnet with OS bullseye
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 [[phab:T300099|T300099]]', diff saved to https://phabricator.wikimedia.org/P19249 and previous config saved to /var/cache/conftool/dbconfig/20220126-085733-marostegui.json
* 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1014.eqiad.wmnet with OS buster
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19248 and previous config saved to /var/cache/conftool/dbconfig/20220126-085616-marostegui.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19247 and previous config saved to /var/cache/conftool/dbconfig/20220126-085510-marostegui.json
* 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 08:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19246 and previous config saved to /var/cache/conftool/dbconfig/20220126-085503-marostegui.json
* 08:41 moritzm: draining instances off ganeti1015 for reimage
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P19245 and previous config saved to /var/cache/conftool/dbconfig/20220126-083958-marostegui.json
* 08:31 jelto: sign puppet cert for gitlab-runner1001.eqiad.wmnet
* 08:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1014.eqiad.wmnet with OS buster
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P19244 and previous config saved to /var/cache/conftool/dbconfig/20220126-082453-marostegui.json
* 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1013.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 08:18 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1013.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19243 and previous config saved to /var/cache/conftool/dbconfig/20220126-080948-marostegui.json
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19242 and previous config saved to /var/cache/conftool/dbconfig/20220126-080842-marostegui.json
* 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19241 and previous config saved to /var/cache/conftool/dbconfig/20220126-080831-marostegui.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P19240 and previous config saved to /var/cache/conftool/dbconfig/20220126-075326-marostegui.json
* 07:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2131.codfw.wmnet with OS bullseye
* 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:49 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1020.eqiad.wmnet with OS bullseye
* 07:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:45 taavi@deploy1002: Synchronized wmf-config/interwiki.php: Config: [[gerrit:757377{{!}}Update interwiki cache]] (duration: 00m 52s)
* 07:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1020.eqiad.wmnet with OS bullseye
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P19239 and previous config saved to /var/cache/conftool/dbconfig/20220126-073822-marostegui.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19238 and previous config saved to /var/cache/conftool/dbconfig/20220126-072317-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19237 and previous config saved to /var/cache/conftool/dbconfig/20220126-072211-marostegui.json
* 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19236 and previous config saved to /var/cache/conftool/dbconfig/20220126-072200-marostegui.json
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2115.codfw.wmnet with OS bullseye
* 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2131.codfw.wmnet with OS bullseye
* 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2096.codfw.wmnet with OS bullseye
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P19235 and previous config saved to /var/cache/conftool/dbconfig/20220126-070654-marostegui.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P19234 and previous config saved to /var/cache/conftool/dbconfig/20220126-065149-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked from s8 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P19233 and previous config saved to /var/cache/conftool/dbconfig/20220126-064653-marostegui.json
* 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2115.codfw.wmnet with OS bullseye
* 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2096.codfw.wmnet with OS bullseye
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19232 and previous config saved to /var/cache/conftool/dbconfig/20220126-063644-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 [[phab:T300005|T300005]]', diff saved to https://phabricator.wikimedia.org/P19231 and previous config saved to /var/cache/conftool/dbconfig/20220126-063149-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19230 and previous config saved to /var/cache/conftool/dbconfig/20220126-063037-marostegui.json
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086 (s7,s8) [[phab:T299882|T299882]]', diff saved to https://phabricator.wikimedia.org/P19229 and previous config saved to /var/cache/conftool/dbconfig/20220126-062406-marostegui.json
* 05:02 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@dc7c5ac] (wcqs): Deploy 0.3.100 to WCQS (duration: 02m 21s)
* 04:59 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dc7c5ac] (wcqs): Deploy 0.3.100 to WCQS
* 04:56 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 03:42 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 03:42 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 03:42 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 03:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@dc7c5ac]: 0.3.100 (duration: 08m 35s)
* 03:32 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.100` on canary `wdqs1003`; proceeding to rest of fleet
* 03:31 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dc7c5ac]: 0.3.100
* 03:30 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.100`. Pre-deploy tests passing on canary `wdqs1003`
* 02:49 ryankemper: [WDQS] [[phab:T299098|T299098]] `ryankemper@wdqs2003:~$ sudo pool` (forgot to pool after dcops fixed hw issue)
* 01:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:04 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:757087{{!}}Enable migration mode on Italian and MediaWIki.org (T299927)]] (duration: 00m 54s)
* 01:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:00 catrope@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/: Backport: [[gerrit:756997{{!}}Do not load common.js twice (T300070)]] and [[gerrit:756696{{!}}Fix bug in SkinVersionLookup (T299971)]] (duration: 00m 51s)
* 01:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:56 catrope@deploy1002: Synchronized php-1.38.0-wmf.19/skins/Vector/: Backport: [[gerrit:756998{{!}}Do not load common.js twice (T300070)]] (duration: 02m 43s)
* 00:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:11 ryankemper: [[phab:T294805|T294805]] Reverted https://gerrit.wikimedia.org/r/c/operations/puppet/+/757003 (elasticsearch-oss dependency issues, will pick this back up tomorrow); re-enabling puppet across elastic1*
* 00:03 ryankemper: [[phab:T294805|T294805]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/757003; running puppet on `elastic1068` to make it join the fleet


== 2022-01-25 ==
== 2022-08-05 ==
* 23:42 ryankemper: [[phab:T294805|T294805]] [Elastic] Step 2: Disabling puppet in advance of merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/736117
* 22:20 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly (duration: 02m 01s)
* 23:20 ryankemper: [[phab:T294805|T294805]] [Elastic] Merged https://gerrit.wikimedia.org/r/736116, step 1 of bringing new eqiad 10G refresh hosts into service
* 22:18 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly
* 21:20 bblack@cumin1001: conftool action : set/weight=100; selector: dc=drmrs,service=ats-be
* 17:08 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bullseye
* 21:20 bblack@cumin1001: conftool action : set/weight=1; selector: dc=drmrs,service=varnish-fe
* 16:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bullseye
* 21:20 bblack@cumin1001: conftool action : set/weight=1; selector: dc=drmrs,service=ats-tls
* 16:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 21:03 cwhite: end transition to logstash output opensearch plugin [[phab:T299168|T299168]]
* 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:37 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
* 20:17 cwhite: begin transition to logstash output opensearch plugin [[phab:T299168|T299168]]
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:26 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1193.eqiad.wmnet with OS bullseye
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db1192.eqiad.wmnet with OS bullseye
* 20:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.19  refs [[phab:T293960|T293960]]
* 16:12 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@8489923]: [[phab:T304954|T304954]]: Automate imagesuggestion imports (duration: 02m 03s)
* 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1008.eqiad.wmnet with OS buster
* 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
* 20:01 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): testwiki sync finished, still no open blockers, proceeding to group0
* 16:11 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :) (duration: 06m 09s)
* 19:50 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.19  refs [[phab:T293960|T293960]] (duration: 52m 01s)
* 16:10 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@8489923]: [[phab:T304954|T304954]]: Automate imagesuggestion imports
* 19:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
* 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
* 19:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:07 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
* 19:35 cmjohnson1: updating firmware ganeti1006 [[phab:T299527|T299527]]
* 16:05 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :)
* 19:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:04 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine (duration: 34m 38s)
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:03 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:55 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
* 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Make es1028 master of es3 [[phab:T299911|T299911]]', diff saved to https://phabricator.wikimedia.org/P19221 and previous config saved to /var/cache/conftool/dbconfig/20220125-191238-ladsgroup.json
* 15:52 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1191.eqiad.wmnet with OS bullseye
* 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19220 and previous config saved to /var/cache/conftool/dbconfig/20220125-190949-ladsgroup.json
* 15:51 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
* 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:42 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS bullseye
* 19:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 15:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
* 19:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:30 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine
* 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:25 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
* 18:58 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.19  refs [[phab:T293960|T293960]]
* 15:24 jbond: upload trapperkeeper-metrics-clojure to puppet7 component
* 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:22 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
* 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:19 jbond: upload puppetlabs-http-client-clojur to puppet7 component
* 18:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P19219 and previous config saved to /var/cache/conftool/dbconfig/20220125-185444-ladsgroup.json
* 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19218 and previous config saved to /var/cache/conftool/dbconfig/20220125-184714-root.json
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:44 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
* 15:14 dancy@deploy1002: Finished scap: Backport for [[gerrit:820653]] scap gitignore: ignore all files under the `scap` directory (duration: 04m 41s)
* 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P19217 and previous config saved to /var/cache/conftool/dbconfig/20220125-183940-ladsgroup.json
* 15:11 jbond: upload jolokia to puppet7 component
* 18:38 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: sync on production
* 15:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
* 18:34 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply on production
* 15:09 dancy@deploy1002: Started scap: Backport for [[gerrit:820653]] scap gitignore: ignore all files under the `scap` directory
* 18:33 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: sync on production
* 15:09 jbond: upload test-chuck-clojure to puppet7 component
* 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19216 and previous config saved to /var/cache/conftool/dbconfig/20220125-183210-root.json
* 15:05 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
* 18:31 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply on production
* 15:04 jbond: upload test-check-clojure to puppet7 component
* 18:30 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: sync on production
* 14:57 jbond: upload nippy-clojure to puppet7 component
* 18:29 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply on production
* 14:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
* 18:28 moritzm: installing policykit-1 security updates on buster
* 14:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:43 jbond: upload fressian to puppet7 component
* 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19215 and previous config saved to /var/cache/conftool/dbconfig/20220125-182435-ladsgroup.json
* 14:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:40 jbond: upload test-generative-clojure to puppet7 component
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:34 jbond: upload data-generators-clojure to puppet7 component
* 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1028.eqiad.wmnet with OS bullseye
* 14:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19214 and previous config saved to /var/cache/conftool/dbconfig/20220125-181706-root.json
* 14:23 jbond: upload encore-clojure to puppet7 component
* 18:14 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): no open blockers, starting stage-train script shortly
* 14:17 jbond: upload truss-clojure to puppet7 component
* 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19213 and previous config saved to /var/cache/conftool/dbconfig/20220125-180203-root.json
* 14:13 jbond: upload structured-logging-clojure to puppet7 component
* 18:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:06 jbond: upload murphy-clojure to puppet7 component
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:57 jbond: upload logstash-logback-encoder-7.2 to puppet7 component
* 17:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:49 jbond: upload kitchensink-clojure to puppet7 component
* 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts with fragile power supply ([[phab:T314559|T314559]] [[phab:T314628|T314628]])', diff saved to https://phabricator.wikimedia.org/P32292 and previous config saved to /var/cache/conftool/dbconfig/20220805-132709-ladsgroup.json
* 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19212 and previous config saved to /var/cache/conftool/dbconfig/20220125-174659-root.json
* 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 17:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1028.eqiad.wmnet with OS bullseye
* 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19211 and previous config saved to /var/cache/conftool/dbconfig/20220125-173156-root.json
* 13:09 sukhe: repool codfw
* 17:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19210 and previous config saved to /var/cache/conftool/dbconfig/20220125-171652-root.json
* 13:02 jbond: upload honeysql-clojure to puppet7 component
* 17:02 cwhite: upgrade elasticsearch-curator on apifeatureusage1001
* 12:53 _joe_: progressive repool of services in codfw
* 17:01 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19209 and previous config saved to /var/cache/conftool/dbconfig/20220125-170148-root.json
* 12:24 moritzm: installing nano bugfix updates from bullseye point release
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on D3 ([[phab:T310146|T310146]])', diff saved to https://phabricator.wikimedia.org/P32291 and previous config saved to /var/cache/conftool/dbconfig/20220805-113729-ladsgroup.json
* 16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C6 ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32290 and previous config saved to /var/cache/conftool/dbconfig/20220805-113555-ladsgroup.json
* 16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19208 and previous config saved to /var/cache/conftool/dbconfig/20220125-164900-ladsgroup.json
* 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C5 ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32289 and previous config saved to /var/cache/conftool/dbconfig/20220805-113436-ladsgroup.json
* 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
* 10:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
* 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19207 and previous config saved to /var/cache/conftool/dbconfig/20220125-164645-root.json
* 10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:46 taavi: deploy updated patch for [[phab:T285116|T285116]]
* 10:12 Amir1: dbmaint at s4@codfw ([[phab:T312863|T312863]])
* 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Make es1031 master of es3 [[phab:T299911|T299911]]', diff saved to https://phabricator.wikimedia.org/P19206 and previous config saved to /var/cache/conftool/dbconfig/20220125-164324-ladsgroup.json
* 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19204 and previous config saved to /var/cache/conftool/dbconfig/20220125-164118-ladsgroup.json
* 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19203 and previous config saved to /var/cache/conftool/dbconfig/20220125-163721-marostegui.json
* 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19202 and previous config saved to /var/cache/conftool/dbconfig/20220125-163141-root.json
* 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19201 and previous config saved to /var/cache/conftool/dbconfig/20220125-163054-root.json
* 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P19200 and previous config saved to /var/cache/conftool/dbconfig/20220125-162613-ladsgroup.json
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
* 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19199 and previous config saved to /var/cache/conftool/dbconfig/20220125-162217-marostegui.json
* 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
* 16:21 cmjohnson1: updating firmware ganeti1005 [[phab:T299527|T299527]]
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gerrit2002.wikimedia.org
* 16:18 cmjohnson1: updating firmware ganeti1014 [[phab:T299527|T299527]]
* 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.remove-downtime for gerrit2002.wikimedia.org
* 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19198 and previous config saved to /var/cache/conftool/dbconfig/20220125-161550-root.json
* 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
* 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P19197 and previous config saved to /var/cache/conftool/dbconfig/20220125-161108-ladsgroup.json
* 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19196 and previous config saved to /var/cache/conftool/dbconfig/20220125-160712-marostegui.json
* 00:18 mutante: restarting gerrit for config change - removing old replica [[phab:T313250|T313250]]
* 16:06 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
* 16:06 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
* 16:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1022.eqiad.wmnet with OS bullseye
* 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19195 and previous config saved to /var/cache/conftool/dbconfig/20220125-160522-marostegui.json
* 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19194 and previous config saved to /var/cache/conftool/dbconfig/20220125-160047-root.json
* 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19193 and previous config saved to /var/cache/conftool/dbconfig/20220125-155604-ladsgroup.json
* 15:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1034.eqiad.wmnet with OS bullseye
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19192 and previous config saved to /var/cache/conftool/dbconfig/20220125-155207-marostegui.json
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19191 and previous config saved to /var/cache/conftool/dbconfig/20220125-155101-marostegui.json
* 15:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19190 and previous config saved to /var/cache/conftool/dbconfig/20220125-155053-marostegui.json
* 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19189 and previous config saved to /var/cache/conftool/dbconfig/20220125-155017-marostegui.json
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19187 and previous config saved to /var/cache/conftool/dbconfig/20220125-154543-root.json
* 15:38 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir6002.drmrs.wmnet
* 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19186 and previous config saved to /var/cache/conftool/dbconfig/20220125-153548-marostegui.json
* 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19185 and previous config saved to /var/cache/conftool/dbconfig/20220125-153511-marostegui.json
* 15:34 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 15:32 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
* 15:31 godog: centrallog1001:~# lvextend --resizefs --size +23G /dev/centrallog1001-vg/data
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19184 and previous config saved to /var/cache/conftool/dbconfig/20220125-153040-root.json
* 15:24 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host ncredir6002.drmrs.wmnet
* 15:21 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=ncredir6002.*
* 15:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1034.eqiad.wmnet with OS bullseye
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19183 and previous config saved to /var/cache/conftool/dbconfig/20220125-152044-marostegui.json
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19182 and previous config saved to /var/cache/conftool/dbconfig/20220125-152006-marostegui.json
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19181 and previous config saved to /var/cache/conftool/dbconfig/20220125-151900-marostegui.json
* 15:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 15:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19180 and previous config saved to /var/cache/conftool/dbconfig/20220125-151852-marostegui.json
* 15:18 mmandere@cumin1001: conftool action : select; selector: cluster=necredir,dc=drmrs
* 15:17 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 15:17 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19179 and previous config saved to /var/cache/conftool/dbconfig/20220125-151536-root.json
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet
* 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19178 and previous config saved to /var/cache/conftool/dbconfig/20220125-150539-marostegui.json
* 15:04 bblack: lvs6002: restarting pybal
* 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
* 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19177 and previous config saved to /var/cache/conftool/dbconfig/20220125-150348-marostegui.json
* 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
* 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
* 15:03 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
* 15:03 bblack: lvs600[13]: restarting pybal
* 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19176 and previous config saved to /var/cache/conftool/dbconfig/20220125-150256-ladsgroup.json
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
* 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19175 and previous config saved to /var/cache/conftool/dbconfig/20220125-150052-ladsgroup.json
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19174 and previous config saved to /var/cache/conftool/dbconfig/20220125-150031-root.json
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19173 and previous config saved to /var/cache/conftool/dbconfig/20220125-144843-marostegui.json
* 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P19172 and previous config saved to /var/cache/conftool/dbconfig/20220125-144548-ladsgroup.json
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19171 and previous config saved to /var/cache/conftool/dbconfig/20220125-144528-root.json
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19170 and previous config saved to /var/cache/conftool/dbconfig/20220125-143338-marostegui.json
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19169 and previous config saved to /var/cache/conftool/dbconfig/20220125-143232-marostegui.json
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19168 and previous config saved to /var/cache/conftool/dbconfig/20220125-143218-marostegui.json
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P19167 and previous config saved to /var/cache/conftool/dbconfig/20220125-143043-ladsgroup.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19166 and previous config saved to /var/cache/conftool/dbconfig/20220125-143024-root.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager from s8 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P19165 and previous config saved to /var/cache/conftool/dbconfig/20220125-142614-marostegui.json
* 14:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner1001.eqiad.wmnet on all recursors
* 14:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab-runner1001.eqiad.wmnet on all recursors
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19164 and previous config saved to /var/cache/conftool/dbconfig/20220125-141714-marostegui.json
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19163 and previous config saved to /var/cache/conftool/dbconfig/20220125-141538-ladsgroup.json
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19162 and previous config saved to /var/cache/conftool/dbconfig/20220125-141520-root.json
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19161 and previous config saved to /var/cache/conftool/dbconfig/20220125-141520-marostegui.json
* 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19160 and previous config saved to /var/cache/conftool/dbconfig/20220125-141513-marostegui.json
* 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1026.eqiad.wmnet with OS bullseye
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19159 and previous config saved to /var/cache/conftool/dbconfig/20220125-140209-marostegui.json
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19158 and previous config saved to /var/cache/conftool/dbconfig/20220125-140008-marostegui.json
* 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1031.eqiad.wmnet with OS bullseye
* 13:55 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1022.eqiad.wmnet with OS bullseye
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086 (s7,s8) [[phab:T299882|T299882]]', diff saved to https://phabricator.wikimedia.org/P19157 and previous config saved to /var/cache/conftool/dbconfig/20220125-135212-marostegui.json
* 13:50 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 13:48 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1022.eqiad.wmnet with OS bullseye
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19156 and previous config saved to /var/cache/conftool/dbconfig/20220125-134704-marostegui.json
* 13:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19155 and previous config saved to /var/cache/conftool/dbconfig/20220125-134557-marostegui.json
* 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19154 and previous config saved to /var/cache/conftool/dbconfig/20220125-134547-marostegui.json
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19153 and previous config saved to /var/cache/conftool/dbconfig/20220125-134503-marostegui.json
* 13:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1026.eqiad.wmnet with OS bullseye
* 13:38 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 13:33 _joe_: restarted pybal on lvs6003
* 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 13:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 13:31 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=ncredir,name=ncredir6001.drmrs.wmnet
* 13:30 oblivian@puppetmaster1001: conftool action : set/weight=1; selector: dc=drmrs,cluster=ncredir
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19151 and previous config saved to /var/cache/conftool/dbconfig/20220125-133042-marostegui.json
* 13:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on gitlab-runner1001.eqiad.wmnet with reason: move gitlab-runner1001 to new ganeti row
* 13:30 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on gitlab-runner1001.eqiad.wmnet with reason: move gitlab-runner1001 to new ganeti row
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19150 and previous config saved to /var/cache/conftool/dbconfig/20220125-132958-marostegui.json
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19149 and previous config saved to /var/cache/conftool/dbconfig/20220125-132852-marostegui.json
* 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19148 and previous config saved to /var/cache/conftool/dbconfig/20220125-132844-marostegui.json
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:26 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1031.eqiad.wmnet with OS bullseye
* 13:22 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:20 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
* 13:20 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
* 13:20 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
* 13:19 taavi@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:752296{{!}}wikitech: use ldap-rw.$SITE for ldap access (T295150)]] (duration: 00m 49s)
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1026 [[phab:T299889|T299889]]', diff saved to https://phabricator.wikimedia.org/P19147 and previous config saved to /var/cache/conftool/dbconfig/20220125-131727-marostegui.json
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1030 to es2 master [[phab:T299889|T299889]]', diff saved to https://phabricator.wikimedia.org/P19146 and previous config saved to /var/cache/conftool/dbconfig/20220125-131622-marostegui.json
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19145 and previous config saved to /var/cache/conftool/dbconfig/20220125-131537-marostegui.json
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19144 and previous config saved to /var/cache/conftool/dbconfig/20220125-131340-marostegui.json
* 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19143 and previous config saved to /var/cache/conftool/dbconfig/20220125-130032-marostegui.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19142 and previous config saved to /var/cache/conftool/dbconfig/20220125-125923-marostegui.json
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19141 and previous config saved to /var/cache/conftool/dbconfig/20220125-125857-marostegui.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19140 and previous config saved to /var/cache/conftool/dbconfig/20220125-125835-marostegui.json
* 12:56 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on production
* 12:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on staging
* 12:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync on production
* 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on production
* 12:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on staging
* 12:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync on production
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19139 and previous config saved to /var/cache/conftool/dbconfig/20220125-124352-marostegui.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19138 and previous config saved to /var/cache/conftool/dbconfig/20220125-124330-marostegui.json
* 12:38 Lucas_WMDE: UTC morning backport window done
* 12:37 kharlan@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/GrowthExperiments/modules: Backport (2/2): [[gerrit:756941{{!}}Add an image: update onboarding images for desktop (T298109)]] (duration: 00m 49s)
* 12:36 kharlan@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/GrowthExperiments/images: Backport (1/2): [[gerrit:756941{{!}}Add an image: update onboarding images for desktop (T298109)]] (duration: 00m 50s)
* 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool es1031 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19136 and previous config saved to /var/cache/conftool/dbconfig/20220125-123303-ladsgroup.json
* 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19135 and previous config saved to /var/cache/conftool/dbconfig/20220125-122848-marostegui.json
* 12:17 hnowlan: removal of restbase2011 from cassandra cluster complete
* 12:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19134 and previous config saved to /var/cache/conftool/dbconfig/20220125-121343-marostegui.json
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755330{{!}}Enable statement usage tracking for Armenian Wikipedia (hywiki) (T296382)]] (duration: 00m 50s)
* 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19133 and previous config saved to /var/cache/conftool/dbconfig/20220125-120632-marostegui.json
* 12:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19132 and previous config saved to /var/cache/conftool/dbconfig/20220125-120625-marostegui.json
* 11:57 oblivian@puppetmaster1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=appserver,service=canary
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19131 and previous config saved to /var/cache/conftool/dbconfig/20220125-115120-marostegui.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19130 and previous config saved to /var/cache/conftool/dbconfig/20220125-114311-marostegui.json
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19129 and previous config saved to /var/cache/conftool/dbconfig/20220125-114258-marostegui.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19128 and previous config saved to /var/cache/conftool/dbconfig/20220125-113616-marostegui.json
* 11:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2021.codfw.wmnet with OS bullseye
* 11:29 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1011.eqiad.wmnet
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19127 and previous config saved to /var/cache/conftool/dbconfig/20220125-112753-marostegui.json
* 11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2027.codfw.wmnet with OS bullseye
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19126 and previous config saved to /var/cache/conftool/dbconfig/20220125-112111-marostegui.json
* 11:19 moritzm: installing apache security updates
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19125 and previous config saved to /var/cache/conftool/dbconfig/20220125-111249-marostegui.json
* 11:07 godog: temp disable alerting on prometheus200[56] - [[phab:T296199|T296199]]
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19124 and previous config saved to /var/cache/conftool/dbconfig/20220125-105744-marostegui.json
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19123 and previous config saved to /var/cache/conftool/dbconfig/20220125-105636-marostegui.json
* 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19122 and previous config saved to /var/cache/conftool/dbconfig/20220125-105628-marostegui.json
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2021.codfw.wmnet with OS bullseye
* 10:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2027.codfw.wmnet with OS bullseye
* 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 10:50 hnowlan: disabling puppet on all maps hosts to test cassandra removal
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2011.eqiad.wmnet
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2020', diff saved to https://phabricator.wikimedia.org/P19121 and previous config saved to /var/cache/conftool/dbconfig/20220125-104331-marostegui.json
* 10:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2020.codfw.wmnet with OS bullseye
* 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19120 and previous config saved to /var/cache/conftool/dbconfig/20220125-104124-marostegui.json
* 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2029.codfw.wmnet with OS bullseye
* 10:36 hnowlan: nodetool removenode for restbase2011-c
* 10:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 [[phab:T299123|T299123]]', diff saved to https://phabricator.wikimedia.org/P19119 and previous config saved to /var/cache/conftool/dbconfig/20220125-102912-marostegui.json
* 10:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19118 and previous config saved to /var/cache/conftool/dbconfig/20220125-102619-marostegui.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19117 and previous config saved to /var/cache/conftool/dbconfig/20220125-102448-marostegui.json
* 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19116 and previous config saved to /var/cache/conftool/dbconfig/20220125-102426-marostegui.json
* 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 10:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19115 and previous config saved to /var/cache/conftool/dbconfig/20220125-101114-marostegui.json
* 10:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19114 and previous config saved to /var/cache/conftool/dbconfig/20220125-100921-marostegui.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19113 and previous config saved to /var/cache/conftool/dbconfig/20220125-100907-marostegui.json
* 10:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19112 and previous config saved to /var/cache/conftool/dbconfig/20220125-100900-marostegui.json
* 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:04 taavi@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:755534{{!}}Undeploy UserMerge (3) (T216089)]] (duration: 00m 48s)
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2020.codfw.wmnet with OS bullseye
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2029.codfw.wmnet with OS bullseye
* 10:01 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755533{{!}}Undeploy UserMerge (2) (T216089)]] (duration: 00m 49s)
* 10:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 10:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2020', diff saved to https://phabricator.wikimedia.org/P19111 and previous config saved to /var/cache/conftool/dbconfig/20220125-100036-marostegui.json
* 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 09:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:59 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:755532{{!}}Undeploy UserMerge (1) (T216089)]] (duration: 00m 49s)
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19110 and previous config saved to /var/cache/conftool/dbconfig/20220125-095417-marostegui.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19109 and previous config saved to /var/cache/conftool/dbconfig/20220125-095355-marostegui.json
* 09:40 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:40 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 09:40 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir6001.drmrs.wmnet
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19108 and previous config saved to /var/cache/conftool/dbconfig/20220125-093912-marostegui.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19107 and previous config saved to /var/cache/conftool/dbconfig/20220125-093850-marostegui.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19106 and previous config saved to /var/cache/conftool/dbconfig/20220125-093806-marostegui.json
* 09:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:23 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host ncredir6001.drmrs.wmnet
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19105 and previous config saved to /var/cache/conftool/dbconfig/20220125-092346-marostegui.json
* 09:23 dcausse: restarting blazegraph on wdqs1004 (jvm stuck for 1h)
* 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1013.eqiad.wmnet with OS buster
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19104 and previous config saved to /var/cache/conftool/dbconfig/20220125-085228-root.json
* 08:45 moritzm: draining instances off ganeti1005 for reimage
* 08:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1013.eqiad.wmnet with OS buster
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19103 and previous config saved to /var/cache/conftool/dbconfig/20220125-083724-root.json
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:32 jayme: kubernetes staging migrated tainted worker node setup - [[phab:T290967|T290967]]
* 08:32 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
* 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:25 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1013 to master in pc3 [[phab:T299046|T299046]] (duration: 00m 49s)
* 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19102 and previous config saved to /var/cache/conftool/dbconfig/20220125-082326-marostegui.json
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19101 and previous config saved to /var/cache/conftool/dbconfig/20220125-082319-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19100 and previous config saved to /var/cache/conftool/dbconfig/20220125-082220-root.json
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19099 and previous config saved to /var/cache/conftool/dbconfig/20220125-080814-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19098 and previous config saved to /var/cache/conftool/dbconfig/20220125-080717-root.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19097 and previous config saved to /var/cache/conftool/dbconfig/20220125-075309-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19096 and previous config saved to /var/cache/conftool/dbconfig/20220125-075213-root.json
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19095 and previous config saved to /var/cache/conftool/dbconfig/20220125-073805-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19094 and previous config saved to /var/cache/conftool/dbconfig/20220125-073709-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19093 and previous config saved to /var/cache/conftool/dbconfig/20220125-073457-marostegui.json
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19092 and previous config saved to /var/cache/conftool/dbconfig/20220125-073450-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19091 and previous config saved to /var/cache/conftool/dbconfig/20220125-072206-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19090 and previous config saved to /var/cache/conftool/dbconfig/20220125-071945-marostegui.json
* 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1013.eqiad.wmnet with OS bullseye
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19089 and previous config saved to /var/cache/conftool/dbconfig/20220125-070702-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19088 and previous config saved to /var/cache/conftool/dbconfig/20220125-070441-marostegui.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19087 and previous config saved to /var/cache/conftool/dbconfig/20220125-065158-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19086 and previous config saved to /var/cache/conftool/dbconfig/20220125-064936-marostegui.json
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1013.eqiad.wmnet with OS bullseye
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19085 and previous config saved to /var/cache/conftool/dbconfig/20220125-064829-marostegui.json
* 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
* 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
* 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19084 and previous config saved to /var/cache/conftool/dbconfig/20220125-064801-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19083 and previous config saved to /var/cache/conftool/dbconfig/20220125-063655-root.json
* 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1030.eqiad.wmnet with OS bullseye
* 06:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19082 and previous config saved to /var/cache/conftool/dbconfig/20220125-063256-marostegui.json
* 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:26 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc3 [[phab:T299046|T299046]] (duration: 00m 49s)
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19081 and previous config saved to /var/cache/conftool/dbconfig/20220125-061751-marostegui.json
* 06:07 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1030.eqiad.wmnet with OS bullseye
* 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19080 and previous config saved to /var/cache/conftool/dbconfig/20220125-060247-marostegui.json
* 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1030 [[phab:T299889|T299889]]', diff saved to https://phabricator.wikimedia.org/P19079 and previous config saved to /var/cache/conftool/dbconfig/20220125-060241-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19078 and previous config saved to /var/cache/conftool/dbconfig/20220125-060128-marostegui.json
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:29 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755834{{!}}Lower The Wikipedia Library editcount]] (duration: 00m 49s)
* 00:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:23 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:756585{{!}}Enable wgMinervaEnableSiteNotice for bnwiki (T299529)]] (duration: 00m 49s)
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:14 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:756712{{!}}bgwiki: fix setup for Draft namespace (T299224)]] (duration: 00m 49s)
* 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn


== 2022-01-24 ==
== 2022-08-04 ==
* 23:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:07 mutante: switching gerrit-replica.wikimedia.org to new machine gerrit2002, dropping gerrit-replica-new.wikimedia.org [[phab:T313250|T313250]]
* 23:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 21:07 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:29 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: [[gerrit:756720{{!}}Revert "Choose wikiversions.php file relative to MWMultiVersion.php"]] (duration: 00m 49s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:54 ryankemper: [[phab:T280001|T280001]] Removed downtime on `wcqs*`
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS buster
* 20:56 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:819774]] tkwiki: Update wordmark (duration: 06m 12s)
* 22:48 ryankemper: [[phab:T280001|T280001]] Moved `wcqs` service state into `production` by merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/756713; running puppet on authdns/alert hosts
* 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 22:32 inflatador: [[phab:T280001|T280001]] [[phab:T282117|T282117]] Merged https://gerrit.wikimedia.org/r/c
* 20:51 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 20:50 thcipriani@deploy1002: Started scap: Backport for [[gerrit:819774]] tkwiki: Update wordmark
* 20:48 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:812391]] [config]: Add click event logging for mobile and desktop (duration: 39m 16s)
* 20:45 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:24 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 20:23 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:22 ryankemper@deploy1002: helmfile [staging] START helmfile.d/


== 2022-01-23 ==
== 2022-08-03 ==
* 22:02 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@37937f6]: (no justification provided) (duration: 00m 08s)
* 23:59 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
* 22:02 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@37937f6]: (no justification provided)
* 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32270 and previous config saved to /var/cache/conftool/dbconfig/20220803-235030-marostegui.json
* 21:27 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@fa62e75]: (no justification provided) (duration: 00m 09s)
* 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32269 and previous config saved to /var/cache/conftool/dbconfig/20220803-225015-marostegui.json
* 21:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@fa62e75]: (no justification provided)
* 22:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
* 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32268 and previous config saved to /var/cache/conftool/dbconfig/20220803-224827-marostegui.json
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32267 and previous config saved to /var/cache/conftool/dbconfig/20220803-223321-marostegui.json
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32266 and previous config saved to /var/cache/conftool/dbconfig/20220803-221815-marostegui.json
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32265 and previous config saved to /var/cache/conftool/dbconfig/20220803-220309-marostegui.json
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32264 and previous config saved to /var/cache/conftool/dbconfig/20220803-220057-marostegui.json
* 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32263 and previous config saved to /var/cache/conftool/dbconfig/20220803-220007-marostegui.json
* 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32262 and previous config saved to /var/cache/conftool/dbconfig/20220803-214501-marostegui.json
* 21:44 damilare: payments-wiki updated from {{Gerrit|e1b6036a}} to {{Gerrit|712df4ce}}
* 21:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - [[phab:T314078|T314078]]
* 21:35 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
* 21:35 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
* 21:30 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 21:30 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32261 and previous config saved to /var/cache/conftool/dbconfig/20220803-212955-marostegui.json
* 21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32260 and previous config saved to /var/cache/conftool/dbconfig/20220803-211449-marostegui.json
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32259 and previous config saved to /var/cache/conftool/dbconfig/20220803-211237-marostegui.json
* 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32258 and previous config saved to /var/cache/conftool/dbconfig/20220803-211216-marostegui.json
* 21:03 ejegg: updated standalone SmashPig deployment from {{Gerrit|8e8f0017}} to {{Gerrit|9b97ea15}}
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32257 and previous config saved to /var/cache/conftool/dbconfig/20220803-205710-marostegui.json
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:55 ebernhardson@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: [[gerrit:820223{{!}}cirrus: Set ElasticaWrite partition count for cloudelastic to 3]] (duration: 03m 29s)
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/includes/VisualEditorParsoidClient.php: {{Gerrit|a804fe18f1e14795ba7836d3ebf6c361bb1538a7}}: Update call to PageConfigFactory::create to use new signature ([[phab:T314523|T314523]]) (duration: 03m 25s)
* 20:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32256 and previous config saved to /var/cache/conftool/dbconfig/20220803-204204-marostegui.json
* 20:39 urbanecm@deploy1002: sync-file aborted: {{Gerrit|a804fe18f1e14795ba7836d3ebf6c361bb1538a7}}: Update call to PageConfigFactory::create to use new signature ([[phab:T314523|T314523]]) (duration: 00m 00s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/: {{Gerrit|b840eef86837aed3e566885110e93b2ca9ab5f42}}: Fix ReplyLinksController#teardown (duration: 03m 27s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/: {{Gerrit|70a18f5846111a0dfe8ba473daf384cbb8e88804}}: Add explicit partitioning key to ElasticaWrite ([[phab:T314426|T314426]]) (duration: 03m 13s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/CirrusSearch/: {{Gerrit|9961e9bc8f5873f8ddc8a11108de0a7bfcb14ae6}}: Add explicit partitioning key to ElasticaWrite ([[phab:T314426|T314426]]) (duration: 03m 23s)
* 20:28 cwhite@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host logstash2032.codfw.wmnet
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32255 and previous config saved to /var/cache/conftool/dbconfig/20220803-202658-marostegui.json
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32254 and previous config saved to /var/cache/conftool/dbconfig/20220803-202146-marostegui.json
* 20:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 20:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32253 and previous config saved to /var/cache/conftool/dbconfig/20220803-202125-marostegui.json
* 20:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 20:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|195f8090b9694be65c937cea108ff4f6400972ec}}: Start writing to cuc_actor on test wikis ([[phab:T233004|T233004]]) (duration: 03m 27s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2032.codfw.wmnet on all recursors
* 20:08 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2032.codfw.wmnet on all recursors
* 20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:07 mutante: gerrit - adding second replica [[phab:T313250|T313250]]
* 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32252 and previous config saved to /var/cache/conftool/dbconfig/20220803-200619-marostegui.json
* 20:04 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 20:03 cwhite@cumin2002: START - Cookbook sre.ganeti.makevm for new host logstash2032.codfw.wmnet
* 20:00 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2012.codfw.wmnet
* 20:00 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2012.codfw.wmnet
* 20:00 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2012.codfw.wmnet
* 19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32251 and previous config saved to /var/cache/conftool/dbconfig/20220803-195113-marostegui.json
* 19:40 ryankemper: [[phab:T314078|T314078]] Forgot to mention, restart is at `ryankemper@cumin1001` tmux session `codfw_restarts`
* 19:39 ryankemper: [[phab:T314078|T314078]] Rolling upgrade of codfw hosts; after this all of eqiad/codfw will have the new plugin version and we can resume the `search-loader` instances: `sudo -E cookbook sre.elasticsearch.rolling-operation search_codfw "codfw cluster plugin upgrade" --upgrade --nodes-per-run 3 --start-datetime 2022-08-03T19:38:10 --task-id [[phab:T314078|T314078]]`
* 19:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - [[phab:T314078|T314078]]
* 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32250 and previous config saved to /var/cache/conftool/dbconfig/20220803-193607-marostegui.json
* 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32249 and previous config saved to /var/cache/conftool/dbconfig/20220803-193354-marostegui.json
* 19:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 19:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32248 and previous config saved to /var/cache/conftool/dbconfig/20220803-193334-marostegui.json
* 19:25 mutante: gerrit1001 - rsyncing /var/lib/gerrit/review_site/ over to gerrit2002 815401
* 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32247 and previous config saved to /var/cache/conftool/dbconfig/20220803-191828-marostegui.json
* 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32246 and previous config saved to /var/cache/conftool/dbconfig/20220803-190321-marostegui.json
* 18:56 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2011.codfw.wmnet
* 18:56 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2011.codfw.wmnet
* 18:56 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2011.codfw.wmnet
* 18:33 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2027,2037].codfw.wmnet
* 18:33 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2027,2037].codfw.wmnet
* 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:16 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]] (duration: 03m 37s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]]
* 17:58 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestage2002.codfw.wmnet
* 17:58 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubestage2002.codfw.wmnet
* 17:57 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2025-2026].codfw.wmnet
* 17:57 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2025-2026].codfw.wmnet
* 17:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2044.codfw.wmnet
* 17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2044.codfw.wmnet
* 17:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2043.codfw.wmnet
* 17:56 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2043.codfw.wmnet
* 17:55 ottomata: increasing partitions from 5 to 6 for *.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite topics in Kafka main-eqiad and main-codfw - [[phab:T314426|T314426]]
* 17:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2055.codfw.wmnet
* 17:55 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2055.codfw.wmnet
* 17:50 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=kubestage2002.codfw.wmnet
* 17:38 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2008-2010].codfw.wmnet
* 17:38 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2008-2010].codfw.wmnet
* 17:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase20[12]4.codfw.wmnet
* 17:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
* 17:14 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
* 17:08 ryankemper: [[phab:T310145|T310145]] `elastic2031` and `wcqs2002` powered off in preparation for C1 maintenance
* 17:06 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=(kubernetes2020.codfw.wmnet{{!}}kubernetes2009.codfw.wmnet{{!}}kubernetes2010.codfw.wmnet)
* 17:00 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 16:48 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,55,68].codfw.wmnet PDU work [[phab:T310145|T310145]]
* 16:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
* 16:47 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
* 16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
* 16:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
* 16:46 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
* 16:40 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2046.codfw.wmnet
* 16:40 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2046.codfw.wmnet
* 16:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 10 hosts
* 16:39 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 10 hosts
* 16:38 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2023.codfw.wmnet
* 16:38 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2023.codfw.wmnet
* 16:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
* 16:37 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
* 16:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
* 16:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
* 16:32 jelto: power off mc2025-2026
* 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for rdb2008.codfw.wmnet
* 16:30 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for rdb2008.codfw.wmnet
* 16:28 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2009-2010,2020].codfw.wmnet
* 16:27 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes[2009-2010,2020].codfw.wmnet
* 16:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 12 hosts
* 16:11 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for 12 hosts
* 16:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
* 16:08 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 15 hosts
* 16:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs[2005-2008].codfw.wmnet
* 16:08 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs[2005-2008].codfw.wmnet
* 15:59 Emperor: shutdown ms-be20[33,47],thanos-be2002 prior to PDU work [[phab:T310070|T310070]]
* 15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
* 15:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
* 15:52 jelto: pooling mw2259-2270 again
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32242 and previous config saved to /var/cache/conftool/dbconfig/20220803-154515-marostegui.json
* 15:38 vgutierrez: clearing ats-be cache on cp6008 - [[phab:T309651|T309651]]
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:36 elukey: powercycle kafka-logging2003 - not responsive to serial console
* 15:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: {{Gerrit|4438957e78e0012aff646e52dc16a4fb796cfd6b}}: ServiceImageRecommendationProvider: Add extra logging when no JSON response received ([[phab:T313973|T313973]]) (duration: 03m 04s)
* 15:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
* 15:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
* 15:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
* 15:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
* 15:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
* 15:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2024.codfw.wmnet
* 15:30 vgutierrez: clearing ats-be cache on cp6016 - [[phab:T309651|T309651]]
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32241 and previous config saved to /var/cache/conftool/dbconfig/20220803-153009-marostegui.json
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.eqsin.wmnet on all recursors
* 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.eqsin.wmnet on all recursors
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.ulsfo.wmnet on all recursors
* 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.ulsfo.wmnet on all recursors
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.codfw.wmnet on all recursors
* 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.codfw.wmnet on all recursors
* 15:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet
* 15:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 15:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32240 and previous config saved to /var/cache/conftool/dbconfig/20220803-151502-marostegui.json
* 15:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
* 15:10 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
* 15:04 jelto: power off mc2023
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32239 and previous config saved to /var/cache/conftool/dbconfig/20220803-145956-marostegui.json
* 14:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
* 14:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32238 and previous config saved to /var/cache/conftool/dbconfig/20220803-145849-marostegui.json
* 14:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 14:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32237 and previous config saved to /var/cache/conftool/dbconfig/20220803-145828-marostegui.json
* 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:53 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.19 (duration: 05m 37s)
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:47 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.21 (duration: 06m 13s)
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32236 and previous config saved to /var/cache/conftool/dbconfig/20220803-144322-marostegui.json
* 14:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 14:33 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 14:32 Emperor: shutdown aqs200[5-8] prior to PDU work [[phab:T310070|T310070]]
* 14:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
* 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
* 14:31 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
* 14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
* 14:28 jelto: power off thumbor2003 and thumbor2004
* 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32235 and previous config saved to /var/cache/conftool/dbconfig/20220803-142816-marostegui.json
* 14:27 moritzm: upgrading ganeti/esams to Ganeti 3.0.2 [[phab:T312637|T312637]]
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32234 and previous config saved to /var/cache/conftool/dbconfig/20220803-141310-marostegui.json
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32233 and previous config saved to /var/cache/conftool/dbconfig/20220803-141103-marostegui.json
* 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32232 and previous config saved to /var/cache/conftool/dbconfig/20220803-141042-marostegui.json
* 14:06 moritzm: installing freetype security updates on bullseye
* 13:57 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin 'P<nowiki>{</nowiki>R:Class = Confd<nowiki>}</nowiki>' 'systemctl restart confd'
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32231 and previous config saved to /var/cache/conftool/dbconfig/20220803-135536-marostegui.json
* 13:46 cdanis: ✔️ cdanis@deploy1002.eqiad.wmnet ~ 🕙☕ sudo systemctl restart confd
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32230 and previous config saved to /var/cache/conftool/dbconfig/20220803-134030-marostegui.json
* 13:30 moritzm: installing Java 8 security updates for Buster
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32229 and previous config saved to /var/cache/conftool/dbconfig/20220803-132524-marostegui.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32228 and previous config saved to /var/cache/conftool/dbconfig/20220803-131916-marostegui.json
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32227 and previous config saved to /var/cache/conftool/dbconfig/20220803-131855-marostegui.json
* 13:18 sukhe: depool codfw for PDU upgrade: CR 819798
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:16 urbanecm@deploy1002: Synchronized wmf-config/MetaContactPages.php: {{Gerrit|f89f02e306a1fa580fa41ba56de978f4208ea672}}: Amend license request contact form per Legal ([[phab:T303359|T303359]]) (duration: 09m 27s)
* 13:12 jbond: introduce puppetmaster[12]004 for now as offline
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
* 13:09 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 13:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 13:04 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32226 and previous config saved to /var/cache/conftool/dbconfig/20220803-130348-marostegui.json
* 12:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 12:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 12:56 pt1979@cumin1001: START - Cookbook sre.dns.netbox
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32224 and previous config saved to /var/cache/conftool/dbconfig/20220803-124842-marostegui.json
* 12:40 moritzm: uploaded openjdk-8 8u342-b07-1~deb10u1  to component/jdk8 for buster-wikimedia (rebuild of latest Java 8 security update)
* 12:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 12:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32223 and previous config saved to /var/cache/conftool/dbconfig/20220803-123336-marostegui.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32222 and previous config saved to /var/cache/conftool/dbconfig/20220803-122929-marostegui.json
* 12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32221 and previous config saved to /var/cache/conftool/dbconfig/20220803-122819-marostegui.json
* 12:16 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@614f7b2]: (no justification provided) (duration: 00m 11s)
* 12:16 ebysans@deploy1002: Started deploy [airflow-dags/analytics@614f7b2]: (no justification provided)
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32220 and previous config saved to /var/cache/conftool/dbconfig/20220803-121313-marostegui.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32219 and previous config saved to /var/cache/conftool/dbconfig/20220803-115807-marostegui.json
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2176 to s1 [[phab:T311494|T311494]]', diff saved to https://phabricator.wikimedia.org/P32218 and previous config saved to /var/cache/conftool/dbconfig/20220803-115706-marostegui.json
* 11:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, [[phab:T310145|T310145]]
* 11:49 root@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, [[phab:T310145|T310145]]
* 11:46 jayme@cumin1001: conftool action : set/weight=10; selector: name=(kubernetes2019.codfw.wmnet{{!}}kubernetes2021.codfw.wmnet{{!}}kubernetes2022.codfw.wmnet{{!}}kubernetes2018.codfw.wmnet{{!}}kubernetes2020.codfw.wmnet)
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32217 and previous config saved to /var/cache/conftool/dbconfig/20220803-114301-marostegui.json
* 11:41 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=(kubernetes2020.codfw.wmnet{{!}}kubernetes2009.codfw.wmnet{{!}}kubernetes2010.codfw.wmnet{{!}}kubernetes2011.codfw.wmnet{{!}}kubernetes2012.codfw.wmnet{{!}}kubestage2002.codfw.wmnet)
* 11:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase2022.codfw.wmnet
* 11:37 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
* 11:35 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:32 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 11:26 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wdqs
* 11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=kartotherian
* 11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-backend
* 11:21 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
* 11:17 _joe_: depooling codfw services from all traffic
* 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2011.codfw.wmnet to cluster codfw and group C
* 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2011.codfw.wmnet to cluster codfw and group C
* 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
* 10:47 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
* 10:46 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32216 and previous config saved to /var/cache/conftool/dbconfig/20220803-104246-marostegui.json
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32215 and previous config saved to /var/cache/conftool/dbconfig/20220803-104224-marostegui.json
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
* 10:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase201[45].codfw.wmnet
* 10:38 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2022.codfw.wmnet
* 10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
* 10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
* 10:37 jelto: shutdown kubestage2002 kubernetes2020 kubernetes2009 kubernetes2010 kubernetes2011 kubernetes2012
* 10:30 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
* 10:30 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
* 10:29 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 10:29 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
* 10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
* 10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32213 and previous config saved to /var/cache/conftool/dbconfig/20220803-102718-marostegui.json
* 10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2012.codfw.wmnet
* 10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2011.codfw.wmnet
* 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.codfw.wmnet
* 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2009.codfw.wmnet
* 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2020.codfw.wmnet
* 10:20 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubestage2002.codfw.wmnet
* 10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
* 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2011.codfw.wmnet with OS bullseye
* 10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
* 10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32212 and previous config saved to /var/cache/conftool/dbconfig/20220803-101212-marostegui.json
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32211 and previous config saved to /var/cache/conftool/dbconfig/20220803-095706-marostegui.json
* 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
* 09:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2021.codfw.wmnet
* 09:56 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2012.codfw.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32210 and previous config saved to /var/cache/conftool/dbconfig/20220803-095559-marostegui.json
* 09:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 09:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32209 and previous config saved to /var/cache/conftool/dbconfig/20220803-095538-marostegui.json
* 09:55 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase2027.codfw.wmnet
* 09:54 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2011.codfw.wmnet
* 09:54 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:54 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
* 09:52 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2010.codfw.wmnet
* 09:50 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2009.codfw.wmnet
* 09:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
* 09:48 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
* 09:47 jelto: kubectl drain --ignore-daemonsets kubernetes2020.codfw.wmnet
* 09:46 jelto: kubectl cordon kubernetes2020.codfw.wmnet kubernetes2009.codfw.wmnet kubernetes2010.codfw.wmnet kubernetes2011.codfw.wmnet kubernetes2012.codfw.wmnet
* 09:43 jelto: kubectl drain --ignore-daemonsets kubestage2002.codfw.wmnet
* 09:43 vgutierrez: rolling restart of pybal in codfw lvs instances - [[phab:T310070|T310070]]
* 09:42 jelto: kubectl cordon kubestage2002
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32208 and previous config saved to /var/cache/conftool/dbconfig/20220803-094032-marostegui.json
* 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS bullseye
* 09:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@674bb8b]: (no justification provided) (duration: 00m 10s)
* 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2090.codfw.wmnet
* 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:33 ebysans@deploy1002: Started deploy [airflow-dags/analytics@674bb8b]: (no justification provided)
* 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 09:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 09:29 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2090.codfw.wmnet
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32207 and previous config saved to /var/cache/conftool/dbconfig/20220803-092525-marostegui.json
* 09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:23 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:23 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:22 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 09:22 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2090 from dbctl [[phab:T314109|T314109]]', diff saved to https://phabricator.wikimedia.org/P32206 and previous config saved to /var/cache/conftool/dbconfig/20220803-092053-marostegui.json
* 09:20 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
* 09:15 jelto: power on mc2024
* 09:10 XioNoX: configure BGP on the esams-drmrs link - [[phab:T307221|T307221]]
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32205 and previous config saved to /var/cache/conftool/dbconfig/20220803-091019-marostegui.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32204 and previous config saved to /var/cache/conftool/dbconfig/20220803-090912-marostegui.json
* 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 09:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32203 and previous config saved to /var/cache/conftool/dbconfig/20220803-090836-marostegui.json
* 09:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
* 09:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
* 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
* 09:04 jynus: stop backup2006 backup2009 for [[phab:T310070|T310070]]
* 09:00 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
* 09:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
* 08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
* 08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
* 08:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
* 08:58 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
* 08:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
* 08:57 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
* 08:57 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
* 08:54 XioNoX: put the esams-drmrs link in service - [[phab:T307221|T307221]]
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32202 and previous config saved to /var/cache/conftool/dbconfig/20220803-085330-marostegui.json
* 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
* 08:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:47 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 08:41 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32201 and previous config saved to /var/cache/conftool/dbconfig/20220803-083824-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32200 and previous config saved to /var/cache/conftool/dbconfig/20220803-082318-marostegui.json
* 08:19 jynus: stop db2098 for [[phab:T310070|T310070]]
* 08:17 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2072.codfw.wmnet
* 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2072.codfw.wmnet
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2072 from dbctl [[phab:T313911|T313911]]', diff saved to https://phabricator.wikimedia.org/P32199 and previous config saved to /var/cache/conftool/dbconfig/20220803-074806-marostegui.json
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32197 and previous config saved to /var/cache/conftool/dbconfig/20220803-072253-marostegui.json
* 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32196 and previous config saved to /var/cache/conftool/dbconfig/20220803-072214-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
* 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
* 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
* 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
* 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
* 07:11 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:819227{{!}}CX: Set MT threshold for publishing in Armenian WP to 80% (T313208)]] (duration: 03m 49s)
* 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32195 and previous config saved to /var/cache/conftool/dbconfig/20220803-070708-marostegui.json
* 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:00 moritzm: draining ganeti2011 [[phab:T311686|T311686]]
* 06:56 godog: grow sda/sdb 3 by 100G on thanos-be2003 - [[phab:T314275|T314275]]
* 06:56 godog: grow sda/sdb 3 by 100G on thanos-be1002 - [[phab:T314275|T314275]]
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32194 and previous config saved to /var/cache/conftool/dbconfig/20220803-065202-marostegui.json
* 06:46 godog: power up centrallog2002 and prometheus2005 - [[phab:T310070|T310070]]
* 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 06:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32193 and previous config saved to /var/cache/conftool/dbconfig/20220803-063656-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32192 and previous config saved to /var/cache/conftool/dbconfig/20220803-063148-marostegui.json
* 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
* 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
* 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32191 and previous config saved to /var/cache/conftool/dbconfig/20220803-063045-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32190 and previous config saved to /var/cache/conftool/dbconfig/20220803-061538-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32189 and previous config saved to /var/cache/conftool/dbconfig/20220803-060032-marostegui.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32188 and previous config saved to /var/cache/conftool/dbconfig/20220803-054526-marostegui.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32187 and previous config saved to /var/cache/conftool/dbconfig/20220803-054106-marostegui.json
* 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance


== 2022-01-22 ==
== 2022-08-02 ==
* 22:38 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:38 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:51 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:51 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:35 elukey: `apt-get clean` on an-test-coord1001 to free some space
* 22:15 mutante: gerrit - syncing data (/srv/gerrit /var/lib/gerrit2/review_site  /home) again after gerrit2002 was reimaged with buster [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 08:25 elukey: remove the `--debug=true` etcd daemon arg from ml-etcd2002 (only node having it, probably a manual test done in the past) and cleaned up spammy etcd logs to free space
* 22:04 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 06s)
* 01:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 22:04 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 01:30 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:27 dzahn@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=miscweb
* 21:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]]
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:29 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/includes/Sanity/Checker.php: Backport: [[gerrit:819621{{!}}Fix appending of join conds (T312421 T314439)]] (duration: 03m 15s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:27 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - [[phab:T314078|T314078]]
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS buster
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.22  refs [[phab:T308076|T308076]]
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 20:50 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]]
* 20:38 mutante: re-imaging gerrit2002 with buster - because it's on bullseye, needs git-fat and that has not been ported to python3 yet which blocks upgrading gerrit machines otherwise [[phab:T313250|T313250]] [[phab:T243027|T243027]] [[phab:T279509|T279509]]
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS buster
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:36 urbanecm: UTC evening B&C window done
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/HTMLTransformInput.php: {{Gerrit|69e91528a5c6f372af520307dc2f4227b9981442}}: ParsoidHandler: fix page bundle input with no orig HTML (duration: 03m 22s)
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:29 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/ParsoidHandler.php: {{Gerrit|322a960e3777bc01fa8823908340c36e3851a648}}: ParsoidHandler: pass metrics object to HTMLTransformInput (duration: 03m 19s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5fac0aaf8e76a6f8cc3302771eac068e4f866e5f}}: GrowthExperiments: Remove wgGEHomepageTutorialTitle (duration: 03m 26s)
* 20:06 dancy@deploy1002: Finished scap: Backport for [[gerrit:819612]] Revert "Bump wikimedia/parsoid to 0.16.0-a18" (duration: 11m 30s)
* 20:01 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 05s)
* 20:01 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 19:59 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 01s)
* 19:59 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 19:55 dancy@deploy1002: Started scap: Backport for [[gerrit:819612]] Revert "Bump wikimedia/parsoid to 0.16.0-a18"
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-tls
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=varnish-fe
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-tls
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=varnish-fe
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
* 19:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2041,2046].codfw.wmnet
* 19:35 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2041,2046].codfw.wmnet
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-fe2002.codfw.wmnet
* 19:28 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for thanos-fe2002.codfw.wmnet
* 19:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe2010.codfw.wmnet
* 19:26 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-fe2010.codfw.wmnet
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-tls
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=varnish-fe
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-be
* 19:17 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
* 19:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-tls
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=varnish-fe
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
* 19:11 mutante: gerrit1001 - rsyncing /home/ to gerrit2002:/srv/home-gerrit1001.wikimedia.org [[phab:T313250|T313250]]
* 19:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
* 19:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
* 18:55 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]] (duration: 50m 39s)
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:52 ejegg: updated payments-wiki from {{Gerrit|589bb64e}} to {{Gerrit|e1b6036a}} (just i18n changes in extensions)
* 18:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - [[phab:T314078|T314078]]
* 18:46 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mc2038.codfw.wmnet with reason: install
* 18:45 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mc2038.codfw.wmnet with reason: install
* 18:41 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2038.codfw.wmnet
* 18:41 rzl@cumin2002: START - Cookbook sre.hosts.remove-downtime for mc2038.codfw.wmnet
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:18 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: install
* 18:18 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: install
* 18:17 rzl@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2038.codfw.wmnet with reason: install
* 18:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2038.codfw.wmnet with reason: install
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
* 18:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:04 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]]
* 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32185 and previous config saved to /var/cache/conftool/dbconfig/20220802-175233-marostegui.json
* 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2159', diff saved to https://phabricator.wikimedia.org/P32184 and previous config saved to /var/cache/conftool/dbconfig/20220802-174311-ladsgroup.json
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32183 and previous config saved to /var/cache/conftool/dbconfig/20220802-173723-marostegui.json
* 17:35 moritzm: installing node-moment security updates
* 17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: [[phab:T310070|T310070]]
* 17:32 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: [[phab:T310070|T310070]]
* 17:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
* 17:25 moritzm: installing fribidi security updates
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32182 and previous config saved to /var/cache/conftool/dbconfig/20220802-172217-marostegui.json
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-tls
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=varnish-fe
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
* 17:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32181 and previous config saved to /var/cache/conftool/dbconfig/20220802-170711-marostegui.json
* 17:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
* 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
* 17:05 Emperor: ms-be20[31,32,41,46].codfw.wmnet,ms-fe2010.codfw.wmnet,thanos-fe2002.codfw.wmnet downtime for PDU work [[phab:T309957|T309957]]
* 17:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32180 and previous config saved to /var/cache/conftool/dbconfig/20220802-170503-marostegui.json
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 17:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
* 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 17:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 17:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32179 and previous config saved to /var/cache/conftool/dbconfig/20220802-170333-marostegui.json
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-tls
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=varnish-fe
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
* 17:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2030,2045,2052].codfw.wmnet
* 17:00 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2030,2045,2052].codfw.wmnet
* 16:57 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1004.eqiad.wmnet
* 16:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 16:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32178 and previous config saved to /var/cache/conftool/dbconfig/20220802-164827-marostegui.json
* 16:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:35 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32177 and previous config saved to /var/cache/conftool/dbconfig/20220802-163321-marostegui.json
* 16:29 dancy@mwmaint1002: pull aborted:  (duration: 00m 07s)
* 16:25 rzl: rzl@stat1007:~$ sudo systemctl stop wmde-analytics-daily-early  # wedged, timer will restart it now with max_runtime_seconds
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32176 and previous config saved to /var/cache/conftool/dbconfig/20220802-161815-marostegui.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32175 and previous config saved to /var/cache/conftool/dbconfig/20220802-161607-marostegui.json
* 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32174 and previous config saved to /var/cache/conftool/dbconfig/20220802-161545-marostegui.json
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1004.eqiad.wmnet on all recursors
* 16:10 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1004.eqiad.wmnet on all recursors
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:05 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1004.eqiad.wmnet
* 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32173 and previous config saved to /var/cache/conftool/dbconfig/20220802-160039-marostegui.json
* 15:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32172 and previous config saved to /var/cache/conftool/dbconfig/20220802-154533-marostegui.json
* 15:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
* 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
* 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2037.codfw.wmnet
* 15:36 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32171 and previous config saved to /var/cache/conftool/dbconfig/20220802-153027-marostegui.json
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32170 and previous config saved to /var/cache/conftool/dbconfig/20220802-152818-marostegui.json
* 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32169 and previous config saved to /var/cache/conftool/dbconfig/20220802-152740-marostegui.json
* 15:24 moritzm: installing gnupg2 security updates
* 15:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
* 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1004.eqiad.wmnet with OS buster
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32167 and previous config saved to /var/cache/conftool/dbconfig/20220802-151234-marostegui.json
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:08 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
* 15:08 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
* 15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
* 15:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
* 14:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 14:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 14:58 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32166 and previous config saved to /var/cache/conftool/dbconfig/20220802-145728-marostegui.json
* 14:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2060.codfw.wmnet with OS bullseye
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
* 14:50 moritzm: uploaded gnupg2 2.1.18-8~deb9u4+wmf1 to stretch-wikimedia
* 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32164 and previous config saved to /var/cache/conftool/dbconfig/20220802-144222-marostegui.json
* 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32163 and previous config saved to /var/cache/conftool/dbconfig/20220802-144013-marostegui.json
* 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32162 and previous config saved to /var/cache/conftool/dbconfig/20220802-143952-marostegui.json
* 14:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetmaster1004.eqiad.wmnet with OS buster
* 14:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
* 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32161 and previous config saved to /var/cache/conftool/dbconfig/20220802-142446-marostegui.json
* 14:23 Emperor: shutdown ms-be20[30,45,52] for PDU work [[phab:T309957|T309957]]
* 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 14:21 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 14:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32160 and previous config saved to /var/cache/conftool/dbconfig/20220802-140940-marostegui.json
* 14:05 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster2004.codfw.wmnet with OS buster
* 14:04 godog: grow sda/sdb 3 by 100G on thanos-be1001 - [[phab:T314275|T314275]]
* 14:03 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
* 14:03 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
* 14:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
* 14:01 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-tls
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2032.codfw.wmnet,service=ats-be
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
* 13:56 godog: schedule poweroff for centrallog2002 at 16 utc - [[phab:T310070|T310070]]
* 13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-be
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32159 and previous config saved to /var/cache/conftool/dbconfig/20220802-135435-marostegui.json
* 13:53 godog: depool and poweroff prometheus2005 - [[phab:T310070|T310070]]
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=varnish-fe
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32158 and previous config saved to /var/cache/conftool/dbconfig/20220802-135226-marostegui.json
* 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
* 13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32157 and previous config saved to /var/cache/conftool/dbconfig/20220802-135155-marostegui.json
* 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
* 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-tls
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-be
* 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
* 13:42 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2013.codfw.wmnet with OS bullseye
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754933{{!}}Enable usage tracking for statement for cebwiki (T296384)]] – expected to gradually increase number of wbc_entity_usage and probably recentchanges rows on cebwiki, but not too much, see task for details (duration: 03m 06s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2028.codfw.wmnet with OS bullseye
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32156 and previous config saved to /var/cache/conftool/dbconfig/20220802-133648-marostegui.json
* 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:754937{{!}}Introduce $wmgEntityUsageModifierLimitsStatement (T296384)]] (2/2) (duration: 03m 21s)
* 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754937{{!}}Introduce $wmgEntityUsageModifierLimitsStatement (T296384)]] (1/2) (duration: 03m 16s)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T309957|T309957]]
* 13:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T309957|T309957]]
* 13:27 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster2004.codfw.wmnet with OS buster
* 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
* 13:24 vgutierrez: restarting ATS 9.x instances to apply https://gerrit.wikimedia.org/r/819585 - [[phab:T309651|T309651]]
* 13:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32155 and previous config saved to /var/cache/conftool/dbconfig/20220802-132142-marostegui.json
* 13:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
* 13:19 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a4499e5ac23a0558bed276e2b74134590afc5c95}}: Revert "testwiki: Add mediawiki.web_ui.interactions stream" ([[phab:T314151|T314151]], [[phab:T311268|T311268]]) (duration: 03m 19s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c2fb8a58d8f62e29a15ebee26198e79e4597d24c}}: Enable RealtimePreview on Group 0 wikis ([[phab:T314150|T314150]]) (duration: 03m 21s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32154 and previous config saved to /var/cache/conftool/dbconfig/20220802-130636-marostegui.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32153 and previous config saved to /var/cache/conftool/dbconfig/20220802-130428-marostegui.json
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 13:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32152 and previous config saved to /var/cache/conftool/dbconfig/20220802-130351-marostegui.json
* 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2013.codfw.wmnet with OS bullseye
* 13:00 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2028.codfw.wmnet with OS bullseye
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32151 and previous config saved to /var/cache/conftool/dbconfig/20220802-124845-marostegui.json
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32150 and previous config saved to /var/cache/conftool/dbconfig/20220802-123338-marostegui.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32149 and previous config saved to /var/cache/conftool/dbconfig/20220802-121832-marostegui.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32148 and previous config saved to /var/cache/conftool/dbconfig/20220802-121624-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:01 marostegui: dbmaint x1@eqiad [[phab:T314087|T314087]]
* 11:57 marostegui: dbmaint s7@eqiad [[phab:T314377|T314377]]
* 11:57 marostegui: dbmaint s3@eqiad [[phab:T314377|T314377]]
* 11:57 marostegui: dbmaint s8@eqiad [[phab:T314377|T314377]]
* 11:55 marostegui: dbmaint s8@eqiad [[phab:T314377|T314377]]
* 11:54 marostegui: dbmaint s3@eqiad [[phab:T314377|T314377]]
* 11:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 11:48 marostegui: dbmaint s7@eqiad [[phab:T314377|T314377]]
* 11:46 marostegui: dbmaint s4@eqiad [[phab:T314377|T314377]]
* 11:35 elukey: restart rsyslog on ml-serve1006
* 10:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: [[phab:T312626|T312626]] btullis
* 10:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: [[phab:T312626|T312626]] btullis
* 10:49 godog: grow sda3 by 100G on thanos-be2004 - [[phab:T314275|T314275]]
* 10:42 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 10:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P32147 and previous config saved to /var/cache/conftool/dbconfig/20220802-103318-root.json
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P32146 and previous config saved to /var/cache/conftool/dbconfig/20220802-101813-root.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2175 to s2 [[phab:T311494|T311494]]', diff saved to https://phabricator.wikimedia.org/P32145 and previous config saved to /var/cache/conftool/dbconfig/20220802-101522-marostegui.json
* 10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1019.eqiad.wmnet with OS bullseye
* 10:05 jynus: shutdown dbprov2002 backup2005 backup2008 [[phab:T310070|T310070]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P32144 and previous config saved to /var/cache/conftool/dbconfig/20220802-100308-root.json
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32143 and previous config saved to /var/cache/conftool/dbconfig/20220802-100304-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2079 from dbctl [[phab:T313885|T313885]]', diff saved to https://phabricator.wikimedia.org/P32141 and previous config saved to /var/cache/conftool/dbconfig/20220802-095455-marostegui.json
* 09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
* 09:49 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage