Server Admin Log: Difference between revisions

imported>Stashbot
(krinkle@deploy1002: Synchronized multiversion/: Ic0dbcba9f60f20a (duration: 03m 31s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply)
== 2022-08-02 ==
* 00:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:35 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ieaea60a991e5}} (duration: 03m 10s)
* 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:23 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ia3406eba4ab8bb}} (duration: 03m 22s)
* 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
== 2022-08-01 ==
* 23:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Id1ce285631f5}}, {{Gerrit|I194d419fbfe}} (duration: 03m 09s)
* 23:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:08 moritzm: drain ganeti2028 [[phab:T309957|T309957]]
* 21:03 mutante: gerrit2002 - mkdir /var/lib/gerrit2/review_site {{!}} gerrit1001 - rsyncing /var/lib/gerrit2/review_site/ to gerrit2002 [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 21:01 urbanecm: UTC late backport window done
* 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|461e0709a8987b110f669b74afc38c706b616e5d}}: itwiki: Change robot policy on NS2 and NS3 ([[phab:T314165|T314165]]) (duration: 03m 18s)
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:57 mutante: phab1001 - rsyncing repo data /srv/repos/ to phab2002 (in addition to phab1004 previously) [[phab:T313360|T313360]]
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:55 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mnwwiktionary  --fix # [[phab:T314023|T314023]]
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba8c17759b7e737a6757792ad4136ff3af00030c}}: mnwwiktionary: Create Appendix namespace ([[phab:T314023|T314023]]) (duration: 03m 09s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateArticleCount.php --wiki=viwikibooks --update # [[phab:T314239|T314239]]
* 20:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c19c3e36ab}}: DiscussionTools: Make new reply buttons available at mediawiki.org ([[phab:T314076|T314076]]); {{Gerrit|24db016c4}}: viwikibooks: Change wgArticleCountMethod to any ([[phab:T314239|T314239]]) (duration: 03m 10s)
* 20:35 daniel@deploy1002: Synchronized php-1.39.0-wmf.22/includes/Rest/Handler: Fix: [[gerrit:819129{{!}}Parsoid REST handler: allow pagebundle input without original HTML.]] (duration: 03m 15s)
* 20:25 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-ne.svg ([[phab:T311700|T311700]])
* 20:21 daniel@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ne.svg: Config: [[gerrit:818614{{!}}newiki: Update wordmark (T311700)]] (duration: 03m 17s)
* 20:17 daniel@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:818614{{!}}newiki: Update wordmark (T311700)]] (duration: 03m 32s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2054.codfw.wmnet with OS bullseye
* 19:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
* 19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
* 19:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2054.codfw.wmnet with OS bullseye
* 18:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2031.codfw.wmnet with OS bullseye
* 18:44 mutante: gitlab - moved data_persistence group to new parent, under /repos/
* 18:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
* 18:32 mutante: gitlab - created group 'data_persistence' - added Ladsgroup and upgraded from member to maintainer
* 18:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
* 18:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2031.codfw.wmnet with OS bullseye
* 17:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2025.codfw.wmnet with OS bullseye
* 17:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
* 17:31 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
* 17:18 ryankemper: [[phab:T289135|T289135]] [[phab:T314078|T314078]] Manually reimaging remaining codfw stretch hosts (`elastic[2025,2031,2054,2059-2060]`) to bullseye, one host at a time, waiting for green cluster status to return between each run. `ryankemper@cumin1001` tmux session `codfw_reimage`
* 17:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2025.codfw.wmnet with OS bullseye
* 17:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 17:08 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 17:06 mutante: alert1001 - systemctl restart nsca - pinged by fundraising tech because fundraising hosts have the "passive check is awol" issue again ([[phab:T196336|T196336]])
* 16:25 moritzm: installing tcpdump updates from bullseye point release
* 16:23 cwhite@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
* 16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1018.eqiad.wmnet with OS bullseye
* 16:10 cwhite@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
* 15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1018.eqiad.wmnet with OS bullseye
* 15:39 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase [[phab:T309896|T309896]] - mvernon@cumin1001
* 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:29 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase [[phab:T309896|T309896]] - mvernon@cumin1001
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:818127{{!}}Beta: add configuration for redirect badges (T313896)]] (2/2, should be a no-op) (duration: 03m 30s)
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:818127{{!}}Beta: add configuration for redirect badges (T313896)]] (1/2, should be a no-op) (duration: 03m 15s)
* 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:54 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:53 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:42 moritzm: installing openjdk-11 security updates
* 14:39 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:38 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:34 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:28 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/skins/Vector/: {{Gerrit|b5007c5f1c389deb344c5bb99e950b4190436cab}}: Revert "styles: Unify on standard external link icon" (duration: 03m 16s)
* 14:12 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:12 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2044.codfw.wmnet with OS bullseye
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|bcb7b0d4d07b454a169804d7b1011ec3f2530c00}}: Adjust width-height ratio of logo to fix display issue ([[phab:T310961|T310961]]; 2/2) (duration: 03m 17s)
* 14:04 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/srwikisource<nowiki>{</nowiki>.png;-1.5x.png;-2x.png<nowiki>}</nowiki> ([[phab:T310961|T310961]])
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|bcb7b0d4d07b454a169804d7b1011ec3f2530c00}}: srwikisource: Adjust width-height ratio of logo to fix display issue ([[phab:T310961|T310961]]; 1/2) (duration: 03m 41s)
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 urbanecm: UTC afternoon backport window is going to overflow by a couple of minutes
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
* 13:44 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
* 13:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2044.codfw.wmnet with OS bullseye
* 13:22 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 11:50 moritzm: installing openjdk-8 security updates for stretch
* 11:43 moritzm: uploaded openjdk-8 8u342-b07-1~deb9u1 for stretch-wikimedia
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32124 and previous config saved to /var/cache/conftool/dbconfig/20220801-102714-ladsgroup.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32123 and previous config saved to /var/cache/conftool/dbconfig/20220801-101208-ladsgroup.json
* 10:09 vgutierrez: test ATS 9.1.2 on cp6016 - [[phab:T309651|T309651]]
* 10:05 vgutierrez: test ATS 9.1.2 on cp6008 - [[phab:T309651|T309651]]
* 10:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@4da9195]: (no justification provided) (duration: 00m 19s)
* 10:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@4da9195]: (no justification provided)
* 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32122 and previous config saved to /var/cache/conftool/dbconfig/20220801-095702-ladsgroup.json
* 09:56 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 05s)
* 09:56 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
* 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32121 and previous config saved to /var/cache/conftool/dbconfig/20220801-094156-ladsgroup.json
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32120 and previous config saved to /var/cache/conftool/dbconfig/20220801-093845-ladsgroup.json
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Maintenance
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 09:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
* 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
* 09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
* 09:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
* 09:00 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:53 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/includes/api: Backport: [[gerrit:818562{{!}}api: Support for links migration in ApiQueryBacklinks (T312865 T314112)]] (duration: 03m 01s)
* 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:50 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
* 08:48 godog: thanos-be2004: copy quarantined and tmp off sdb3 and into sdb4 for analysis and to free space - [[phab:T314275|T314275]]
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:818998{{!}}Stop writing to the old templatelinks columns in itwikisource (T312865)]] (duration: 03m 12s)
* 08:43 vgutierrez: rolling upgrade of HAProxy to version 2.4.18
* 08:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:41 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
* 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 08:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
* 08:14 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
* 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 06:14 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appservers-ro
* 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appserver-ro
* 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=(appserver{{!}}api)-ro
* 05:43 moritzm: installing Linux 5.10.127-2 on Gitlab runners
* 01:00 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ic0dbcba9f60f20a}} (duration: 03m 31s)
* 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
 
== Archives ==
== 2022-07-31 ==
* 23:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:20 krinkle@deploy1002: Synchronized dblists-index.php: {{Gerrit|I814ee93b5c}} (duration: 03m 20s)
* 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:19 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp5001.eqsin.wmnet
* 18:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: [[phab:T314256|T314256]]
* 18:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: [[phab:T314256|T314256]]
* 18:12 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet,service=ats-tls
* 18:12 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet,service=varnish-fe
* 18:12 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet,service=ats-be
 
== 2022-07-30 ==
* 01:44 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 01:44 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2028.codfw.wmnet with OS bullseye
* 00:55 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2028.codfw.wmnet with OS bullseye
 
== 2022-07-29 ==
* 22:43 Krinkle: krinkle@mwmaint1002$ mwscript findBadBlobs.php nlwiktionary; mark 2371 blobs from May 2004 as "Invalid gzip, [[phab:T265989|T265989]]"
* 22:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2041.codfw.wmnet with OS bullseye
* 22:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2041.codfw.wmnet with reason: host reimage
* 22:17 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2041.codfw.wmnet with reason: host reimage
* 22:09 Krinkle: findBadBlobs.php nlwiktionary --revisions 22 --mark 'Invalid gzip, [[phab:T265989|T265989]]'
* 22:01 mutante: phab1001 - rsync -avp --bwlimit=1000 /srv/repos/ rsync://phab1004.eqiad.wmnet/phabricator-srv-repos (running slowly inside a screen session as root)  ([[phab:T313360|T313360]], [[phab:T280597|T280597]])
* 21:57 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2041.codfw.wmnet with OS bullseye
* 21:06 mutante: phab1004 - mkdir /srv/repos ; mkdir /srv/dumps
* 20:46 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2029.codfw.wmnet with OS bullseye
* 20:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2029.codfw.wmnet with reason: host reimage
* 20:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2029.codfw.wmnet with reason: host reimage
* 20:18 mutante: authdns-update - adding gerrit-replica-new.wikimedia.org
* 20:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2029.codfw.wmnet with OS bullseye
* 18:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2057.codfw.wmnet with OS bullseye
* 18:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2057.codfw.wmnet with reason: host reimage
* 18:02 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2057.codfw.wmnet with reason: host reimage
* 17:47 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2057.codfw.wmnet with OS bullseye
* 17:41 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 09s)
* 17:41 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
* 17:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2042.codfw.wmnet with OS bullseye
* 16:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2042.codfw.wmnet with reason: host reimage
* 16:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2042.codfw.wmnet with reason: host reimage
* 16:30 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2042.codfw.wmnet with OS bullseye
* 16:21 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2058.codfw.wmnet with OS bullseye
* 15:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2058.codfw.wmnet with reason: host reimage
* 15:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2058.codfw.wmnet with reason: host reimage
* 15:40 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2058.codfw.wmnet with OS bullseye
* 15:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2030.codfw.wmnet with OS bullseye
* 15:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2030.codfw.wmnet with reason: host reimage
* 15:17 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2030.codfw.wmnet with reason: host reimage
* 15:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2030.codfw.wmnet with OS bullseye
* 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P32112 and previous config saved to /var/cache/conftool/dbconfig/20220729-150256-root.json
* 15:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2043.codfw.wmnet with OS bullseye
* 14:59 marostegui: dbmaint s7@eqiad [[phab:T314140|T314140]]
* 14:39 marostegui: dbmaint s3@eqiad [[phab:T314140|T314140]]
* 14:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: host reimage
* 14:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: host reimage
* 14:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1189.eqiad.wmnet with OS bullseye
* 14:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet with OS bullseye
* 14:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1186.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1187.eqiad.wmnet with OS bullseye
* 14:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage
* 14:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2043.codfw.wmnet with OS bullseye
* 14:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage
* 14:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
* 14:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage
* 14:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1187.eqiad.wmnet with reason: host reimage
* 14:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage
* 14:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1187.eqiad.wmnet with reason: host reimage
* 14:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS bullseye
* 14:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS bullseye
* 14:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1187.eqiad.wmnet with OS bullseye
* 14:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
* 14:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bullseye
* 14:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1185.eqiad.wmnet with OS bullseye
* 14:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
* 13:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2047.codfw.wmnet with OS bullseye
* 13:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2047.codfw.wmnet with reason: host reimage
* 13:33 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2047.codfw.wmnet with reason: host reimage
* 13:12 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2047.codfw.wmnet with OS bullseye
* 13:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 13:07 marostegui: dbmaint s8@eqiad [[phab:T314140|T314140]]
* 13:07 marostegui: dbmaint s4@eqiad [[phab:T314140|T314140]]
* 13:07 marostegui: dbmaint s4@eqiad [[phab:T314141|T314141]] [[phab:T314140|T314140]]
* 13:06 marostegui: dbmaint s3@eqiad [[phab:T314141|T314141]]
* 12:11 marostegui: dbmaint s3@eqiad [[phab:T314087|T314087]]
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2088 from dbctl [[phab:T313797|T313797]]', diff saved to https://phabricator.wikimedia.org/P32111 and previous config saved to /var/cache/conftool/dbconfig/20220729-114203-marostegui.json
* 11:37 vgutierrez: update ATS to version 9.1.2 in cp4032 - [[phab:T309651|T309651]]
* 11:04 vgutierrez: reenable puppet on cp nodes
* 11:03 vgutierrez: repool ats-be@cp4026 - [[phab:T309651|T309651]]
* 10:33 vgutierrez: disable puppet on cp nodes to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/818436
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2173 into s1 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P32110 and previous config saved to /var/cache/conftool/dbconfig/20220729-101507-marostegui.json
* 08:12 vgutierrez: depool ats-be on cp4026 for debugging purposes
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32109 and previous config saved to /var/cache/conftool/dbconfig/20220729-080528-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32108 and previous config saved to /var/cache/conftool/dbconfig/20220729-075023-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32107 and previous config saved to /var/cache/conftool/dbconfig/20220729-073518-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32106 and previous config saved to /var/cache/conftool/dbconfig/20220729-072013-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32105 and previous config saved to /var/cache/conftool/dbconfig/20220729-070509-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32104 and previous config saved to /var/cache/conftool/dbconfig/20220729-065004-root.json
* 05:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 16 hosts with reason: codfw s8 sanitarium master switch
* 05:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 16 hosts with reason: codfw s8 sanitarium master switch
* 00:48 TimStarling: slowly restarting (with batch 1 sleep 5) trafficserver on text caches to fully deploy g 817086 [[phab:T313578|T313578]]
 
== 2022-07-28 ==
* 22:22 mforns@deploy1002: Finished deploy [airflow-dags/analytics@9ea9cd1]: (no justification provided) (duration: 00m 09s)
* 22:21 mforns@deploy1002: Started deploy [airflow-dags/analytics@9ea9cd1]: (no justification provided)
* 21:51 mforns@deploy1002: Finished deploy [airflow-dags/analytics@e8d4704]: (no justification provided) (duration: 00m 09s)
* 21:51 mforns@deploy1002: Started deploy [airflow-dags/analytics@e8d4704]: (no justification provided)
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32102 and previous config saved to /var/cache/conftool/dbconfig/20220728-212227-marostegui.json
* 21:18 mforns@deploy1002: Finished deploy [airflow-dags/analytics@5ec2435]: (no justification provided) (duration: 00m 09s)
* 21:18 mforns@deploy1002: Started deploy [airflow-dags/analytics@5ec2435]: (no justification provided)
* 21:07 brennen@deploy1002: Finished deploy [phabricator/deployment@a0f0699]: test deploy to phab2001 (take 2) (duration: 00m 27s)
* 21:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P32100 and previous config saved to /var/cache/conftool/dbconfig/20220728-210721-marostegui.json
* 21:06 brennen@deploy1002: Started deploy [phabricator/deployment@a0f0699]: test deploy to phab2001 (take 2)
* 21:04 brennen@deploy1002: Finished deploy [phabricator/deployment@a21dea9]: test deploy to phab2001 (duration: 00m 27s)
* 21:03 brennen@deploy1002: Started deploy [phabricator/deployment@a21dea9]: test deploy to phab2001
* 20:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P32099 and previous config saved to /var/cache/conftool/dbconfig/20220728-205215-marostegui.json
* 20:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32098 and previous config saved to /var/cache/conftool/dbconfig/20220728-203709-marostegui.json
* 20:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32097 and previous config saved to /var/cache/conftool/dbconfig/20220728-203446-marostegui.json
* 20:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 20:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 20:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 16 hosts with reason: Maintenance
* 20:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 16 hosts with reason: Maintenance
* 20:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 20:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 20:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 20:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32096 and previous config saved to /var/cache/conftool/dbconfig/20220728-203212-marostegui.json
* 20:18 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817263{{!}}Register Wikistories streams (T313633)]] (duration: 03m 24s)
* 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P32095 and previous config saved to /var/cache/conftool/dbconfig/20220728-201706-marostegui.json
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P32094 and previous config saved to /var/cache/conftool/dbconfig/20220728-200200-marostegui.json
* 19:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32093 and previous config saved to /var/cache/conftool/dbconfig/20220728-194654-marostegui.json
* 19:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32092 and previous config saved to /var/cache/conftool/dbconfig/20220728-194426-marostegui.json
* 19:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 19:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32091 and previous config saved to /var/cache/conftool/dbconfig/20220728-194405-marostegui.json
* 19:44 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.22 refs [[phab:T308075|T308075]]
* 19:35 brennen: 1.39.0-wmf.22 train ([[phab:T308075|T308075]]): blocker resolved, rolling to all wikis
* 19:34 brennen@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/Flow: Backport: [[gerrit:818154{{!}}Update CheckUser hook for pagination (T314058 T314069)]] (duration: 03m 16s)
* 19:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P32090 and previous config saved to /var/cache/conftool/dbconfig/20220728-192859-marostegui.json
* 19:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P32089 and previous config saved to /var/cache/conftool/dbconfig/20220728-191353-marostegui.json
* 19:08 wfan: civicrm upgraded from {{Gerrit|3143dda9}} to {{Gerrit|497bddf7}}
* 19:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@82e0383]: (no justification provided) (duration: 00m 17s)
* 19:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@82e0383]: (no justification provided)
* 18:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32088 and previous config saved to /var/cache/conftool/dbconfig/20220728-185847-marostegui.json
* 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32087 and previous config saved to /var/cache/conftool/dbconfig/20220728-185624-marostegui.json
* 18:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 18:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32086 and previous config saved to /var/cache/conftool/dbconfig/20220728-185603-marostegui.json
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P32085 and previous config saved to /var/cache/conftool/dbconfig/20220728-184056-marostegui.json
* 18:28 mutante: gerrit: rsyncing /home from prod gerrit1001 to /srv/home-gerrit1001.wikimedia.org on gerrit2002 new replica [[phab:T243027|T243027]] [[phab:T313250|T313250]]
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P32084 and previous config saved to /var/cache/conftool/dbconfig/20220728-182550-marostegui.json
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32083 and previous config saved to /var/cache/conftool/dbconfig/20220728-181044-marostegui.json
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32082 and previous config saved to /var/cache/conftool/dbconfig/20220728-180815-marostegui.json
* 18:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 18:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 18:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32081 and previous config saved to /var/cache/conftool/dbconfig/20220728-180754-marostegui.json
* 18:06 ryankemper: [Elastic] Finished re-running `delete`s and `update`s from `2022-07-28T15:00:00Z` until `2022-07-28T17:30:00Z`
* 18:06 damilare: SmashPig updated from {{Gerrit|ffe5066d}} to {{Gerrit|8e8f0017}}
* 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P32080 and previous config saved to /var/cache/conftool/dbconfig/20220728-175248-marostegui.json
* 17:41 ryankemper: [Elastic] Re-running `delete`s and `update`s from `2022-07-28T15:00:00Z` until `2022-07-28T17:30:00Z` on `ryankemper@mwmaint1002` tmux `mlr_outage`
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P32079 and previous config saved to /var/cache/conftool/dbconfig/20220728-173742-marostegui.json
* 17:23 ryankemper: [Elastic] Restarting `elastic1072` after halting mjolnir bulk daemons: `ryankemper@elastic1072:~$ sudo depool && sleep 30 && sudo systemctl restart elasticsearch_6* && sleep 30 && sudo pool`
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32078 and previous config saved to /var/cache/conftool/dbconfig/20220728-172235-marostegui.json
* 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32077 and previous config saved to /var/cache/conftool/dbconfig/20220728-172008-marostegui.json
* 17:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 17:19 ryankemper: [Elastic] `ryankemper@search-loader2001:~$ sudo disable-puppet "production issue" && sudo systemctl stop mjolnir-kafka-bulk-daemon.service` just to be safe (we prob only needed to halt eqiad)
* 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32076 and previous config saved to /var/cache/conftool/dbconfig/20220728-171930-marostegui.json
* 17:18 ryankemper: [Elastic] `sudo disable-puppet "production issue"` && `sudo systemctl stop mjolnir-kafka-bulk-daemon.service` on `ryankemper@search-loader1001`
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P32075 and previous config saved to /var/cache/conftool/dbconfig/20220728-170424-marostegui.json
* 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P32074 and previous config saved to /var/cache/conftool/dbconfig/20220728-164918-marostegui.json
* 16:45 vgutierrez: pooling ats-be@cp4026 running ATS 9.1.2 - [[phab:T309651|T309651]]
* 16:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:42 mutante: disabling puppet on gerrit servers for a change in gerrit puppet code
* 16:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32073 and previous config saved to /var/cache/conftool/dbconfig/20220728-163412-marostegui.json
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32072 and previous config saved to /var/cache/conftool/dbconfig/20220728-163149-marostegui.json
* 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32071 and previous config saved to /var/cache/conftool/dbconfig/20220728-163127-marostegui.json
* 16:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1056.eqiad.wmnet
* 16:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts conf[1004-1006].eqiad.wmnet
* 16:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 16:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 16:21 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 16:21 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 16:21 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 16:21 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P32070 and previous config saved to /var/cache/conftool/dbconfig/20220728-161621-marostegui.json
* 16:15 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1056.eqiad.wmnet
* 16:12 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync
* 16:11 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: sync
* 16:11 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P32069 and previous config saved to /var/cache/conftool/dbconfig/20220728-160113-marostegui.json
* 15:52 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
* 15:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32068 and previous config saved to /var/cache/conftool/dbconfig/20220728-154607-marostegui.json
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32067 and previous config saved to /var/cache/conftool/dbconfig/20220728-154344-marostegui.json
* 15:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 15:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32066 and previous config saved to /var/cache/conftool/dbconfig/20220728-154323-marostegui.json
* 15:38 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4026.ulsfo.wmnet,service=ats-be
* 15:37 sukhe: depool ats-be on cp4026 for ATS9 testing
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P32063 and previous config saved to /var/cache/conftool/dbconfig/20220728-152817-marostegui.json
* 15:22 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 15:17 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[1004-1006].eqiad.wmnet
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P32062 and previous config saved to /var/cache/conftool/dbconfig/20220728-151311-marostegui.json
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32061 and previous config saved to /var/cache/conftool/dbconfig/20220728-145805-marostegui.json
* 14:46 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32057 and previous config saved to /var/cache/conftool/dbconfig/20220728-141736-marostegui.json
* 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32056 and previous config saved to /var/cache/conftool/dbconfig/20220728-141715-marostegui.json
* 14:02 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@137a4ff]: (no justification provided) (duration: 02m 03s)
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P32055 and previous config saved to /var/cache/conftool/dbconfig/20220728-140209-marostegui.json
* 14:00 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@137a4ff]: (no justification provided)
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32054 and previous config saved to /var/cache/conftool/dbconfig/20220728-134828-root.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P32053 and previous config saved to /var/cache/conftool/dbconfig/20220728-134703-marostegui.json
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32052 and previous config saved to /var/cache/conftool/dbconfig/20220728-133323-root.json
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32051 and previous config saved to /var/cache/conftool/dbconfig/20220728-133157-marostegui.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32050 and previous config saved to /var/cache/conftool/dbconfig/20220728-132929-marostegui.json
* 13:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 13:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32049 and previous config saved to /var/cache/conftool/dbconfig/20220728-132835-marostegui.json
* 13:27 Lucas_WMDE: UTC afternoon backport+config window done
* 13:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:817225{{!}}testwiki: Add mediawiki.web_ui.interactions stream (T311268)]] (2/2) (duration: 03m 19s)
* 13:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817225{{!}}testwiki: Add mediawiki.web_ui.interactions stream (T311268)]] (1/2) (duration: 03m 24s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32048 and previous config saved to /var/cache/conftool/dbconfig/20220728-131818-root.json
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32047 and previous config saved to /var/cache/conftool/dbconfig/20220728-131329-marostegui.json
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806931{{!}}Configure wbsearchentities profile parameter on Wikidata (T307869)]] (duration: 03m 25s)
* 13:09 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32045 and previous config saved to /var/cache/conftool/dbconfig/20220728-130314-root.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32044 and previous config saved to /var/cache/conftool/dbconfig/20220728-125823-marostegui.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2174 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P32043 and previous config saved to /var/cache/conftool/dbconfig/20220728-125253-marostegui.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32042 and previous config saved to /var/cache/conftool/dbconfig/20220728-124809-root.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32041 and previous config saved to /var/cache/conftool/dbconfig/20220728-124317-marostegui.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32040 and previous config saved to /var/cache/conftool/dbconfig/20220728-123854-marostegui.json
* 12:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32039 and previous config saved to /var/cache/conftool/dbconfig/20220728-123304-root.json
* 11:50 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test 818085 - jbond@cumin2002"
* 11:50 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test 818085 - jbond@cumin2002"
* 11:41 akosiaris: slow (10-minute interval) rolling restart of all pybals to pick up new conf hosts config. [[phab:T311407|T311407]]
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32038 and previous config saved to /var/cache/conftool/dbconfig/20220728-113615-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32037 and previous config saved to /var/cache/conftool/dbconfig/20220728-112109-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32036 and previous config saved to /var/cache/conftool/dbconfig/20220728-110604-root.json
* 10:53 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32035 and previous config saved to /var/cache/conftool/dbconfig/20220728-105100-root.json
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32034 and previous config saved to /var/cache/conftool/dbconfig/20220728-103555-root.json
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32032 and previous config saved to /var/cache/conftool/dbconfig/20220728-102051-root.json
* 10:19 jbond@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin2002"
* 10:19 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin2002"
* 10:13 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin2002"
* 10:12 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin2002"
* 10:05 jelto: update gitlab1004 to 15.0.4-ce.0
* 09:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 09:48 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 09:40 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 09:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 09:24 Emperor: rolling restart of swift proxies to apply wmf/rewrite update [[phab:T313102|T313102]]
* 09:17 Emperor: set thanos ring replicas to 3.95 [[phab:T311690|T311690]]
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2142', diff saved to https://phabricator.wikimedia.org/P32030 and previous config saved to /var/cache/conftool/dbconfig/20220728-085737-marostegui.json
* 08:57 kart_: Updated cxserver to 2022-07-27-220330-production ([[phab:T308248|T308248]])
* 08:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 08:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 08:53 vgutierrez: disable puppet on cp hosts to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/816206
* 08:48 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 08:48 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 08:44 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 08:43 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 08:36 vgutierrez: update HAProxy to version 2.4.18 in cp4021 and cp4027
* 08:28 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2172 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P32028 and previous config saved to /var/cache/conftool/dbconfig/20220728-081252-marostegui.json
* 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:02 jnuche: UTC morning backport and config training done
* 08:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:01 jnuche: UTC morning backport and config training
* 08:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:44 vgutierrez: update HAProxy to version 2.4.18 on apt.wm.o thirdparty/haproxy24
* 07:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:21 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817758{{!}}Enable SectionTranslation on 10 more WPs where ContentTranslation is available by default (T313300)]] (duration: 03m 16s)
* 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2142 [[phab:T313811|T313811]]', diff saved to https://phabricator.wikimedia.org/P32026 and previous config saved to /var/cache/conftool/dbconfig/20220728-060757-root.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary [[phab:T313811|T313811]]', diff saved to https://phabricator.wikimedia.org/P32025 and previous config saved to /var/cache/conftool/dbconfig/20220728-060057-marostegui.json
* 06:00 marostegui: Starting x2 codfw failover from db2142 to db2144 - [[phab:T313811|T313811]]
* 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 [[phab:T313811|T313811]]
* 05:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 [[phab:T313811|T313811]]
* 03:28 ejegg: updated fundraising CiviCRM from {{Gerrit|e0962be6}} to {{Gerrit|3143dda9}}
* 01:28 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: move OAuth token storage [[phab:T313578|T313578]] (duration: 03m 04s)
* 01:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:18 tstarling@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/OAuth: New config var for [[phab:T313578|T313578]], not yet used (duration: 03m 23s)
* 01:11 tstarling@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/OAuth: New config var for [[phab:T313578|T313578]], not yet used (duration: 03m 39s)
 
== 2022-07-27 ==
* 23:59 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: sync again now that scap proxy list is fixed [[phab:T313730|T313730]] [[phab:T313496|T313496]] (duration: 03m 25s)
* 23:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:45 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: move CentralAuth sessions to Kask [[phab:T313496|T313496]] (duration: 05m 34s)
* 23:45 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2251-2255,2257-2258].codfw.wmnet
* 23:45 rzl@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:38 rzl@cumin2002: START - Cookbook sre.dns.netbox
* 23:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:29 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: increase wgObjectCacheSessionExpiry to 86400 (duration: 03m 30s)
* 23:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:26 rzl@cumin2002: START - Cookbook sre.hosts.decommission for hosts mw[2251-2255,2257-2258].codfw.wmnet
* 23:18 rzl@cumin2002: conftool action : set/pooled=inactive; selector: name=mw225[1-57-8].codfw.wmnet
* 23:17 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: Decom
* 23:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: Decom
* 23:14 tstarling@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 23:14 tstarling@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 23:14 tstarling@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
* 23:13 tstarling@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
* 23:13 rzl@cumin2002: conftool action : set/pooled=no; selector: name=mw225[1-57-8].codfw.wmnet
* 23:08 tstarling@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 23:08 tstarling@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:08 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.22  refs [[phab:T308075|T308075]] (duration: 03m 08s)
* 22:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.22  refs [[phab:T308075|T308075]]
* 22:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:59 brennen@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/Translate/src/TtmServer: Backport: [[gerrit:817855{{!}}SearchTranslationsApi: Change the way we fetch TTM services (T313836)]] (duration: 03m 19s)
* 21:33 cjming: end of UTC late backport window
* 21:32 cjming@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/TemplateWizard/resources/ext.TemplateWizard.Dialog.js: Backport: [[gerrit:817851{{!}}Delay template insertion until after closing the dialog (T33780)]] (duration: 03m 36s)
* 21:28 cjming@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/TemplateWizard/resources/ext.TemplateWizard.Dialog.js: Backport: [[gerrit:817850{{!}}Delay template insertion until after closing the dialog (T33780)]] (duration: 03m 27s)
* 21:27 urandom: Removing reserved space on sessionstore storage volumes -- [[phab:T313991|T313991]]
* 21:25 cjming@deploy1002: Synchronized php-1.39.0-wmf.22/resources/src/jquery/jquery.textSelection.js: Backport: [[gerrit:817849{{!}}jquery.textSelection: Use non-execCommand when we can't focus the field (T33780)]] (duration: 03m 22s)
* 21:21 cjming@deploy1002: Synchronized php-1.39.0-wmf.21/resources/src/jquery/jquery.textSelection.js: Backport: [[gerrit:817848{{!}}jquery.textSelection: Use non-execCommand when we can't focus the field (T33780)]] (duration: 03m 09s)
* 21:17 cjming@deploy1002: Synchronized php-1.39.0-wmf.22/resources/src/jquery/jquery.textSelection.js: Backport: [[gerrit:817847{{!}}jquery.textSelection: Support more edge cases of document.execCommand (T33780)]] (duration: 03m 10s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:55 sukhe@cumin1001: dbctl commit (dc=all): 'depool db1111', diff saved to https://phabricator.wikimedia.org/P32018 and previous config saved to /var/cache/conftool/dbconfig/20220727-205536-sukhe.json
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 sukhe@cumin1001: dbctl commit (dc=all): 'depool db1132', diff saved to https://phabricator.wikimedia.org/P32017 and previous config saved to /var/cache/conftool/dbconfig/20220727-204806-sukhe.json
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:20 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817893{{!}}VisualEditor: Allow external link paste on mediawikiwiki, metawiki (T129546)]] (duration: 03m 37s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817373{{!}}ptwiki: Restrict "move" permission (T313802)]] (duration: 03m 19s)
* 19:34 denisse@deploy1002: Finished deploy [librenms/librenms@f049593]: Provision LibreNMS on netmon1003 (duration: 00m 05s)
* 19:34 denisse@deploy1002: Started deploy [librenms/librenms@f049593]: Provision LibreNMS on netmon1003
* 19:16 ejegg: updated Fundraising CiviCRM from {{Gerrit|b4a7154a}} to {{Gerrit|e0962be6}}
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32015 and previous config saved to /var/cache/conftool/dbconfig/20220727-175414-marostegui.json
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P32014 and previous config saved to /var/cache/conftool/dbconfig/20220727-173908-marostegui.json
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P32013 and previous config saved to /var/cache/conftool/dbconfig/20220727-172402-marostegui.json
* 17:23 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1a72195]: switch image_suggestions_manual from _delta to _full (duration: 02m 01s)
* 17:21 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1a72195]: switch image_suggestions_manual from _delta to _full
* 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32012 and previous config saved to /var/cache/conftool/dbconfig/20220727-170856-marostegui.json
* 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32011 and previous config saved to /var/cache/conftool/dbconfig/20220727-164425-marostegui.json
* 16:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 16:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 16:42 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 16:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
* 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
* 16:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32010 and previous config saved to /var/cache/conftool/dbconfig/20220727-163935-marostegui.json
* 16:34 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 16:32 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 16:32 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 16:31 urandom: rolling Cassandra restart, aqs1010-1015, to restore on-disk logging -- [[phab:T309896|T309896]]
* 16:31 andrewbogott: this is a sample log, demonstrating to dhinus
* 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32009 and previous config saved to /var/cache/conftool/dbconfig/20220727-162429-marostegui.json
* 16:22 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 16:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32008 and previous config saved to /var/cache/conftool/dbconfig/20220727-160923-marostegui.json
* 16:07 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 16:07 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32007 and previous config saved to /var/cache/conftool/dbconfig/20220727-155417-marostegui.json
* 15:51 urandom: rolling Cassandra restart, aqs2001-2012, to restore on-disk logging -- [[phab:T309896|T309896]]
* 15:48 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 15:46 urandom: restarting Cassandra, sessionstore2001, to restore on-disk logging -- [[phab:T309896|T309896]]
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32006 and previous config saved to /var/cache/conftool/dbconfig/20220727-145646-marostegui.json
* 14:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32005 and previous config saved to /var/cache/conftool/dbconfig/20220727-145626-marostegui.json
* 14:51 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:51 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32003 and previous config saved to /var/cache/conftool/dbconfig/20220727-144120-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32002 and previous config saved to /var/cache/conftool/dbconfig/20220727-142614-marostegui.json
* 14:23 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:22 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:16 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
* 14:16 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32001 and previous config saved to /var/cache/conftool/dbconfig/20220727-141108-marostegui.json
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P32000 and previous config saved to /var/cache/conftool/dbconfig/20220727-140544-marostegui.json
* 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31999 and previous config saved to /var/cache/conftool/dbconfig/20220727-140523-marostegui.json
* 13:51 Lucas_WMDE: UTC afternoon backport+config window done
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikidata.php: Config: [[gerrit:817317{{!}}Tune the wikidata "language" profile for wbsearchentities (T307869)]] (2/2) (duration: 03m 21s)
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P31998 and previous config saved to /var/cache/conftool/dbconfig/20220727-135017-marostegui.json
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817317{{!}}Tune the wikidata "language" profile for wbsearchentities (T307869)]] (1/2) (duration: 03m 29s)
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:36 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P31997 and previous config saved to /var/cache/conftool/dbconfig/20220727-133511-marostegui.json
* 13:34 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:34 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:34 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:34 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:34 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:32 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:32 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31995 and previous config saved to /var/cache/conftool/dbconfig/20220727-132005-marostegui.json
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31994 and previous config saved to /var/cache/conftool/dbconfig/20220727-131500-marostegui.json
* 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31993 and previous config saved to /var/cache/conftool/dbconfig/20220727-131439-marostegui.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P31992 and previous config saved to /var/cache/conftool/dbconfig/20220727-125933-marostegui.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P31991 and previous config saved to /var/cache/conftool/dbconfig/20220727-124426-marostegui.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31990 and previous config saved to /var/cache/conftool/dbconfig/20220727-122920-marostegui.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31989 and previous config saved to /var/cache/conftool/dbconfig/20220727-122147-marostegui.json
* 12:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 12:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31988 and previous config saved to /var/cache/conftool/dbconfig/20220727-122115-marostegui.json
* 12:17 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:17 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P31987 and previous config saved to /var/cache/conftool/dbconfig/20220727-120609-marostegui.json
* 12:00 kart_: Updated cxserver to 2022-07-27-070728-production ([[phab:T313300|T313300]], [[phab:T309577|T309577]], [[phab:T310873|T310873]], [[phab:T310880|T310880]])
* 11:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 11:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:54 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P31986 and previous config saved to /var/cache/conftool/dbconfig/20220727-115103-marostegui.json
* 11:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:47 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31985 and previous config saved to /var/cache/conftool/dbconfig/20220727-113557-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31984 and previous config saved to /var/cache/conftool/dbconfig/20220727-113136-marostegui.json
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 11:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31983 and previous config saved to /var/cache/conftool/dbconfig/20220727-112722-marostegui.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P31982 and previous config saved to /var/cache/conftool/dbconfig/20220727-111216-marostegui.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P31981 and previous config saved to /var/cache/conftool/dbconfig/20220727-105710-marostegui.json
* 10:46 Emperor: update cassandradev packages for stretch to 3.11.13 [[phab:T313742|T313742]]
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31980 and previous config saved to /var/cache/conftool/dbconfig/20220727-104204-marostegui.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31979 and previous config saved to /var/cache/conftool/dbconfig/20220727-103640-marostegui.json
* 10:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 10:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31978 and previous config saved to /var/cache/conftool/dbconfig/20220727-103619-marostegui.json
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P31976 and previous config saved to /var/cache/conftool/dbconfig/20220727-102113-marostegui.json
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P31974 and previous config saved to /var/cache/conftool/dbconfig/20220727-100607-marostegui.json
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31972 and previous config saved to /var/cache/conftool/dbconfig/20220727-095101-marostegui.json
* 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31971 and previous config saved to /var/cache/conftool/dbconfig/20220727-094452-marostegui.json
* 09:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 09:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31970 and previous config saved to /var/cache/conftool/dbconfig/20220727-094430-marostegui.json
* 09:35 ladsgroup@deploy1002: Synchronized portals: Fixing favicon of wikiquote and wikibooks, take III (duration: 03m 36s)
* 09:32 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: memtest86+ run
* 09:32 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: memtest86+ run
* 09:31 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Fixing favicon of wikiquote and wikibooks, take III (duration: 03m 19s)
* 09:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2087.codfw.wmnet
* 09:29 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P31969 and previous config saved to /var/cache/conftool/dbconfig/20220727-092924-marostegui.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2087 from dbctl [[phab:T313483|T313483]]', diff saved to https://phabricator.wikimedia.org/P31968 and previous config saved to /var/cache/conftool/dbconfig/20220727-092917-marostegui.json
* 09:25 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 09:21 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2087.codfw.wmnet
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:09 ladsgroup@deploy1002: Synchronized portals: Fixing favicon of wikiquote and wikibooks, take II (duration: 03m 24s)
* 09:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:05 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Fixing favicon of wikiquote and wikibooks, take II (duration: 03m 49s)
* 09:02 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P31967 and previous config saved to /var/cache/conftool/dbconfig/20220727-090221-marostegui.json
* 09:01 elukey: reboot ml-serve2001 - [[phab:T313822|T313822]]
* 08:57 elukey: restart burrow-* on kafkamon1002 to pick up zookeeper changes
* 08:57 elukey: manually create /var/run/burrow on kafkamon1002 to allow a clean restart of Burrow daemons (after zookeeper config change)
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31966 and previous config saved to /var/cache/conftool/dbconfig/20220727-084715-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31965 and previous config saved to /var/cache/conftool/dbconfig/20220727-084120-marostegui.json
* 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31964 and previous config saved to /var/cache/conftool/dbconfig/20220727-084042-marostegui.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2171 (s5, s6) to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31962 and previous config saved to /var/cache/conftool/dbconfig/20220727-082817-marostegui.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P31961 and previous config saved to /var/cache/conftool/dbconfig/20220727-082535-marostegui.json
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P31960 and previous config saved to /var/cache/conftool/dbconfig/20220727-081029-marostegui.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2170 (s1, s2) to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31959 and previous config saved to /var/cache/conftool/dbconfig/20220727-080029-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31958 and previous config saved to /var/cache/conftool/dbconfig/20220727-075523-marostegui.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31957 and previous config saved to /var/cache/conftool/dbconfig/20220727-074546-marostegui.json
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 07:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2079 [[phab:T313798|T313798]]', diff saved to https://phabricator.wikimedia.org/P31956 and previous config saved to /var/cache/conftool/dbconfig/20220727-073442-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 codfw primary [[phab:T313798|T313798]]', diff saved to https://phabricator.wikimedia.org/P31955 and previous config saved to /var/cache/conftool/dbconfig/20220727-073214-marostegui.json
* 07:30 volans: restarted ferm on ms-be1065 (had failed for a timed out query)
* 07:18 volans: restarted ferm on ms-be2065 (had failed for a timed out query)
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T313798|T313798]]', diff saved to https://phabricator.wikimedia.org/P31954 and previous config saved to /var/cache/conftool/dbconfig/20220727-070901-marostegui.json
* 07:05 marostegui: Restart db2161 to change its binlog format
* 07:03 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: codfw s8 master switch
* 07:03 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: codfw s8 master switch
* 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2086.codfw.wmnet
* 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:15 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2086.codfw.wmnet
* 01:44 AndyRussG: update payments-wiki {{Gerrit|4487bd31}} -> {{Gerrit|589bb64}}
== 2022-07-26 ==
* 23:59 tzatziki: removing one file for legal compliance
* 22:06 brennen@deploy1002: Finished deploy [phabricator/deployment@0950b61]: test deploy to phab2001 (duration: 00m 27s)
* 22:06 brennen@deploy1002: Started deploy [phabricator/deployment@0950b61]: test deploy to phab2001
* 22:03 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
* 22:02 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
* 21:54 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
* 21:54 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
* 21:53 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
* 21:53 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
* 21:51 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
* 21:51 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
* 21:33 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 51s)
* 21:32 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
* 21:30 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 11s)
* 21:30 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
* 21:28 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 19s)
* 21:28 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
* 21:25 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
* 21:25 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:27 inflatador: bking@wdqs1004 restarted blazegraph services that were (are?) alerting for 503
* 20:21 ebernhardson: depool wdqs1004
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 cjming: end of UTC late backport window
* 20:19 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:816705{{!}}etwikiquote: Change logo for 10k articles (T313698)]] (duration: 03m 07s)
* 20:16 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:816705{{!}}etwikiquote: Change logo for 10k articles (T313698)]] (duration: 03m 15s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 cjming@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:816705{{!}}etwikiquote: Change logo for 10k articles (T313698)]] (duration: 03m 28s)
* 19:03 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic2049.codfw.wmnet
* 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:59 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
* 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:53 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic2049.codfw.wmnet
* 18:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic2049.codfw.wmnet
* 18:41 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic2049.codfw.wmnet
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.22  refs [[phab:T308075|T308075]]
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:04 mutante: [doc1002:~] $ sudo systemctl start rsync-doc-doc2001.codfw.wmnet.service
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:40 bking@cumin1001: conftool action : set/pooled=inactive; selector: name=elastic2049
* 17:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:28 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.22  refs [[phab:T308075|T308075]] (duration: 35m 50s)
* 17:12 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 17:11 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 17:10 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 17:09 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 17:09 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 17:09 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 17:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:52 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.22  refs [[phab:T308075|T308075]]
* 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:33 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: no-op demonstration deploy to phab2001 (duration: 00m 26s)
* 16:32 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: no-op demonstration deploy to phab2001
* 15:58 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
* 15:58 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
* 15:56 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
* 15:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
* 15:56 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 15:56 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
* 15:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817285{{!}}Revert "Add WikibaseTerms temporary debug log channel" (T313039)]] (grep confirms wmf.21+ code has no mentions of this channel) (duration: 03m 19s)
* 15:30 _joe_: restarting pybal on lvs1020 to check php 7.4 too
* 15:25 _joe_: restarting pybal on lvs1019 to check php 7.4 too
* 15:23 _joe_: restarting pybal on lvs2009 to check php 7.4 too
* 15:18 _joe_: restarting pybal on lvs2010 to check php 7.4 too
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2144 with weight 0 and db2143 back with 100 [[phab:T313811|T313811]]', diff saved to https://phabricator.wikimedia.org/P31952 and previous config saved to /var/cache/conftool/dbconfig/20220726-145412-root.json
* 14:52 sukhe: upload trafficserver_9.1.2-1wm1_amd64 to apt.wm.o (buster) - [[phab:T309651|T309651]]
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2143 with weight 0 [[phab:T313811|T313811]]', diff saved to https://phabricator.wikimedia.org/P31951 and previous config saved to /var/cache/conftool/dbconfig/20220726-145116-root.json
* 14:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
* 14:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
* 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31950 and previous config saved to /var/cache/conftool/dbconfig/20220726-141540-marostegui.json
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31949 and previous config saved to /var/cache/conftool/dbconfig/20220726-140034-marostegui.json
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31947 and previous config saved to /var/cache/conftool/dbconfig/20220726-134529-marostegui.json
* 13:38 taavi: UTC afternoon deploys done
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 taavi@deploy1002: Synchronized php-1.39.0-wmf.21/resources/src/jquery/jquery.textSelection.js: backporting gerrit r817231 r817232 for wmf.21, [[phab:T33780|T33780]] (duration: 03m 02s)
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31946 and previous config saved to /var/cache/conftool/dbconfig/20220726-133023-marostegui.json
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31945 and previous config saved to /var/cache/conftool/dbconfig/20220726-132650-marostegui.json
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31944 and previous config saved to /var/cache/conftool/dbconfig/20220726-132628-marostegui.json
* 13:25 jbond: uploaded spicerack_3.1.1 to apt.wikimedia.org bullseye-wikimedia
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31943 and previous config saved to /var/cache/conftool/dbconfig/20220726-131122-marostegui.json
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31942 and previous config saved to /var/cache/conftool/dbconfig/20220726-125617-marostegui.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31941 and previous config saved to /var/cache/conftool/dbconfig/20220726-124112-marostegui.json
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31940 and previous config saved to /var/cache/conftool/dbconfig/20220726-123745-marostegui.json
* 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31939 and previous config saved to /var/cache/conftool/dbconfig/20220726-123719-marostegui.json
* 12:32 jnuche@deploy1002: Synchronized README: Verifying fix for [[phab:T313770|T313770]] (duration: 03m 14s)
* 12:24 jnuche@deploy1002: Installation of scap version "4.11.4" completed for 559 hosts
* 12:24 jnuche@deploy1002: Installing scap version "4.11.4" for 559 hosts
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31938 and previous config saved to /var/cache/conftool/dbconfig/20220726-122214-marostegui.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31937 and previous config saved to /var/cache/conftool/dbconfig/20220726-120709-marostegui.json
* 12:02 oblivian@deploy1002: Synchronized README: testing fix for php restarts [[phab:T313770|T313770]] (duration: 03m 15s)
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31936 and previous config saved to /var/cache/conftool/dbconfig/20220726-115204-marostegui.json
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31935 and previous config saved to /var/cache/conftool/dbconfig/20220726-114833-marostegui.json
* 11:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 11:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31934 and previous config saved to /var/cache/conftool/dbconfig/20220726-114813-marostegui.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31933 and previous config saved to /var/cache/conftool/dbconfig/20220726-113308-marostegui.json
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31932 and previous config saved to /var/cache/conftool/dbconfig/20220726-111803-marostegui.json
* 11:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31931 and previous config saved to /var/cache/conftool/dbconfig/20220726-110258-marostegui.json
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31930 and previous config saved to /var/cache/conftool/dbconfig/20220726-110022-marostegui.json
* 11:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31929 and previous config saved to /var/cache/conftool/dbconfig/20220726-110002-marostegui.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31928 and previous config saved to /var/cache/conftool/dbconfig/20220726-104456-marostegui.json
* 10:39 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update hieradata from Netbox - volans@cumin2002"
* 10:38 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update hieradata from Netbox - volans@cumin2002"
* 10:34 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31925 and previous config saved to /var/cache/conftool/dbconfig/20220726-102951-marostegui.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31924 and previous config saved to /var/cache/conftool/dbconfig/20220726-101446-marostegui.json
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31923 and previous config saved to /var/cache/conftool/dbconfig/20220726-101130-marostegui.json
* 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31922 and previous config saved to /var/cache/conftool/dbconfig/20220726-101110-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31921 and previous config saved to /var/cache/conftool/dbconfig/20220726-095605-marostegui.json
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31920 and previous config saved to /var/cache/conftool/dbconfig/20220726-094100-marostegui.json
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2085.codfw.wmnet
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:40 oblivian@deploy1002: Synchronized README: testing fix for php restarts (duration: 02m 54s)
* 09:36 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 09:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2085.codfw.wmnet
* 09:31 _joe_: running puppet on the mw-canary hosts [[phab:T313770|T313770]]
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31918 and previous config saved to /var/cache/conftool/dbconfig/20220726-092555-marostegui.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31917 and previous config saved to /var/cache/conftool/dbconfig/20220726-092217-marostegui.json
* 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:21 jnuche@deploy1002: Installation of scap version "4.11.3" completed for 1 hosts
* 09:21 jnuche@deploy1002: Installing scap version "4.11.3" for 1 hosts
* 09:13 volans: manually restarting php on MW canaries: cumin 'A:mw-canary' 'restart-php-fpm-all'
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31916 and previous config saved to /var/cache/conftool/dbconfig/20220726-090241-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31915 and previous config saved to /var/cache/conftool/dbconfig/20220726-090237-root.json
* 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31914 and previous config saved to /var/cache/conftool/dbconfig/20220726-084737-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31913 and previous config saved to /var/cache/conftool/dbconfig/20220726-084733-root.json
* 08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1020
* 08:40 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1020
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31912 and previous config saved to /var/cache/conftool/dbconfig/20220726-083233-root.json
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31911 and previous config saved to /var/cache/conftool/dbconfig/20220726-083229-root.json
* 08:33 marostegui: Promote pc1014 to pc3 master [[phab:T313401|T313401]]
* 08:33 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc3 master (duration: 03m 13s)
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:33 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:26 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:26 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:19 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31909 and previous config saved to /var/cache/conftool/dbconfig/20220726-081729-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31908 and previous config saved to /var/cache/conftool/dbconfig/20220726-081725-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31907 and previous config saved to /var/cache/conftool/dbconfig/20220726-080225-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31906 and previous config saved to /var/cache/conftool/dbconfig/20220726-080221-root.json
* 07:48 _joe_: deploy python3-poolcounter everywhere [[phab:T310835|T310835]]
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31903 and previous config saved to /var/cache/conftool/dbconfig/20220726-074721-root.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31902 and previous config saved to /var/cache/conftool/dbconfig/20220726-074717-root.json
* 07:41 vgutierrez: rolling restart of ats-be on cp[1080,1083,1085,1087,5006,6001,6006,6009,6011,6015]
* 07:30 _joe_: running a restart-all for php-fpm on appservers in codfw to test python-poolcounter 0.0.3 [[phab:T310835|T310835]]
* 06:58 _joe_: upgrade all of codfw to python3-poolcounter 0.0.3 [[phab:T310835|T310835]]
* 06:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
* 06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 06:36 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 06:24 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
* 06:21 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
* 06:07 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:11 TimStarling: restarted php7.2-fpm on the 9 canary hosts in eqiad [[phab:T313770|T313770]]
 
== 2022-07-25 ==
* 22:54 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31900 and previous config saved to /var/cache/conftool/dbconfig/20220725-224153-ladsgroup.json
* 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31899 and previous config saved to /var/cache/conftool/dbconfig/20220725-222648-ladsgroup.json
* 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31898 and previous config saved to /var/cache/conftool/dbconfig/20220725-221143-ladsgroup.json
* 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31897 and previous config saved to /var/cache/conftool/dbconfig/20220725-215637-ladsgroup.json
* 21:27 brennen@deploy1002: Finished scap: no-op deploy to get wmf.21 on all boxen ([[phab:T313770|T313770]]) (duration: 03m 33s)
* 21:24 brennen@deploy1002: Started scap: no-op deploy to get wmf.21 on all boxen ([[phab:T313770|T313770]])
* 21:20 brennen: running a no-op sync-world for [[phab:T313770|T313770]] to hopefully get 1.39.0-wmf.21 ([[phab:T308074|T308074]]) to all servers.
* 20:28 cjming: end of UTC late backport window
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:816706{{!}}[cirrus] Increase shard count for ruwikinews]] (duration: 03m 15s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 cjming@deploy1002: Synchronized wmf-config: Config: [[gerrit:810405{{!}}Remove Table of Contents config (T310527)]] (duration: 03m 13s)
* 19:24 mutante: after new wikis have been created, they apparently need an "initSiteStats.php" run to make statistics work, but this only runs on a timer on mwmaint, once weekly or so
* 19:23 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.service
* 17:07 jbond: enable puppet fleet wide
* 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31895 and previous config saved to /var/cache/conftool/dbconfig/20220725-165931-ladsgroup.json
* 16:49 jbond: disable puppet fleet wide
* 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31894 and previous config saved to /var/cache/conftool/dbconfig/20220725-164426-ladsgroup.json
* 16:31 ejegg: updated payments-wiki from {{Gerrit|f56e9391}} to {{Gerrit|4487bd31}}
* 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31893 and previous config saved to /var/cache/conftool/dbconfig/20220725-162921-ladsgroup.json
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31892 and previous config saved to /var/cache/conftool/dbconfig/20220725-161416-ladsgroup.json
* 16:14 bblack: cp*: re-enable puppet for normal staggered rollout (cp4027 tested all the esitest stuff without incident)
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31891 and previous config saved to /var/cache/conftool/dbconfig/20220725-160532-ladsgroup.json
* 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31890 and previous config saved to /var/cache/conftool/dbconfig/20220725-160512-ladsgroup.json
* 15:59 bblack: cp*: temporarily disable puppet to test esitest service rollout
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31888 and previous config saved to /var/cache/conftool/dbconfig/20220725-155007-ladsgroup.json
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31887 and previous config saved to /var/cache/conftool/dbconfig/20220725-153502-ladsgroup.json
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31886 and previous config saved to /var/cache/conftool/dbconfig/20220725-151957-ladsgroup.json
* 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31885 and previous config saved to /var/cache/conftool/dbconfig/20220725-150212-ladsgroup.json
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 15:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31884 and previous config saved to /var/cache/conftool/dbconfig/20220725-150039-ladsgroup.json
* 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31883 and previous config saved to /var/cache/conftool/dbconfig/20220725-144827-ladsgroup.json
* 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31882 and previous config saved to /var/cache/conftool/dbconfig/20220725-144534-ladsgroup.json
* 14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2001.codfw.wmnet: restart cassandra on 3.11.13 canary [[phab:T309896|T309896]] - mvernon@cumin2002
* 14:38 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2001.codfw.wmnet: restart cassandra on 3.11.13 canary [[phab:T309896|T309896]] - mvernon@cumin2002
* 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31881 and previous config saved to /var/cache/conftool/dbconfig/20220725-143321-ladsgroup.json
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31880 and previous config saved to /var/cache/conftool/dbconfig/20220725-143029-ladsgroup.json
* 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31879 and previous config saved to /var/cache/conftool/dbconfig/20220725-141816-ladsgroup.json
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31878 and previous config saved to /var/cache/conftool/dbconfig/20220725-141523-ladsgroup.json
* 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31877 and previous config saved to /var/cache/conftool/dbconfig/20220725-141236-ladsgroup.json
* 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31876 and previous config saved to /var/cache/conftool/dbconfig/20220725-141215-ladsgroup.json
* 14:12 andrewbogott: updating wikitech-static to MediaWiki 1.38.2
* 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31875 and previous config saved to /var/cache/conftool/dbconfig/20220725-140311-ladsgroup.json
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 Lucas_WMDE: UTC afternoon backport+config window done
* 14:01 Lucas_WMDE: lucaswerkmeister-wmde@mw1320:~$ sudo -i /usr/local/sbin/restart-php7.2-fpm  # [[phab:T310847|T310847]] just in case
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 Lucas_WMDE: lucaswerkmeister-wmde@mw1320:~$ scap pull # [[phab:T310847|T310847]] (repeat failed host from earlier sync)
* 13:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811765{{!}}Add sampling to android.breadcrumbs event stream. (T310847)]] (duration: 02m 56s)
* 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31874 and previous config saved to /var/cache/conftool/dbconfig/20220725-135710-ladsgroup.json
* 13:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:816316{{!}}ptwikinews: Install WikiLove extension (T313173)]] (duration: 03m 19s)
* 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31873 and previous config saved to /var/cache/conftool/dbconfig/20220725-134205-ladsgroup.json
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:31 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php ptwikinews wikilove # [[phab:T313173|T313173]]
* 13:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:816242{{!}}ruwikivoyage: Add "suppressredirect" right to "filemover" group (T313614)]] (duration: 03m 17s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31872 and previous config saved to /var/cache/conftool/dbconfig/20220725-132700-ladsgroup.json
* 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:21 Emperor: set min_part_hours to 12 for eqiad swift on ms-fe1009 [[phab:T312643|T312643]]
* 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31871 and previous config saved to /var/cache/conftool/dbconfig/20220725-132012-ladsgroup.json
* 13:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31870 and previous config saved to /var/cache/conftool/dbconfig/20220725-131952-ladsgroup.json
* 13:16 Emperor: set min_part_hours to 12 for codfw swift on ms-fe2009 [[phab:T312643|T312643]]
* 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31864 and previous config saved to /var/cache/conftool/dbconfig/20220725-130447-ladsgroup.json
* 13:02 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:02 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31863 and previous config saved to /var/cache/conftool/dbconfig/20220725-124942-ladsgroup.json
* 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31862 and previous config saved to /var/cache/conftool/dbconfig/20220725-123436-ladsgroup.json
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31861 and previous config saved to /var/cache/conftool/dbconfig/20220725-122953-ladsgroup.json
* 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 12:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31860 and previous config saved to /var/cache/conftool/dbconfig/20220725-122839-ladsgroup.json
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31859 and previous config saved to /var/cache/conftool/dbconfig/20220725-121334-ladsgroup.json
* 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31858 and previous config saved to /var/cache/conftool/dbconfig/20220725-115829-ladsgroup.json
* 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31857 and previous config saved to /var/cache/conftool/dbconfig/20220725-114324-ladsgroup.json
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31856 and previous config saved to /var/cache/conftool/dbconfig/20220725-113939-ladsgroup.json
* 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31855 and previous config saved to /var/cache/conftool/dbconfig/20220725-113919-ladsgroup.json
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31854 and previous config saved to /var/cache/conftool/dbconfig/20220725-112528-ladsgroup.json
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31853 and previous config saved to /var/cache/conftool/dbconfig/20220725-112413-ladsgroup.json
* 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31852 and previous config saved to /var/cache/conftool/dbconfig/20220725-111023-ladsgroup.json
* 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31851 and previous config saved to /var/cache/conftool/dbconfig/20220725-110908-ladsgroup.json
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31850 and previous config saved to /var/cache/conftool/dbconfig/20220725-105518-ladsgroup.json
* 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31848 and previous config saved to /var/cache/conftool/dbconfig/20220725-105403-ladsgroup.json
* 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31846 and previous config saved to /var/cache/conftool/dbconfig/20220725-105114-ladsgroup.json
* 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31845 and previous config saved to /var/cache/conftool/dbconfig/20220725-105054-ladsgroup.json
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31841 and previous config saved to /var/cache/conftool/dbconfig/20220725-104013-ladsgroup.json
* 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31837 and previous config saved to /var/cache/conftool/dbconfig/20220725-103549-ladsgroup.json
* 10:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:26 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:816721{{!}}Fixing favicon of wikiquote and wikibooks]] (duration: 02m 55s)
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:23 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:816721{{!}}Fixing favicon of wikiquote and wikibooks]] (duration: 03m 03s)
* 10:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:21 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31834 and previous config saved to /var/cache/conftool/dbconfig/20220725-102043-ladsgroup.json
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31833 and previous config saved to /var/cache/conftool/dbconfig/20220725-100538-ladsgroup.json
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1109 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31832 and previous config saved to /var/cache/conftool/dbconfig/20220725-100254-ladsgroup.json
* 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31831 and previous config saved to /var/cache/conftool/dbconfig/20220725-100234-ladsgroup.json
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31826 and previous config saved to /var/cache/conftool/dbconfig/20220725-094729-ladsgroup.json
* 09:34 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31825 and previous config saved to /var/cache/conftool/dbconfig/20220725-093222-ladsgroup.json
* 09:30 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 09:26 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31824 and previous config saved to /var/cache/conftool/dbconfig/20220725-091740-ladsgroup.json
* 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31823 and previous config saved to /var/cache/conftool/dbconfig/20220725-091717-ladsgroup.json
* 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31822 and previous config saved to /var/cache/conftool/dbconfig/20220725-091435-ladsgroup.json
* 09:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 09:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
* 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
* 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 09:10 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 09:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31821 and previous config saved to /var/cache/conftool/dbconfig/20220725-090906-ladsgroup.json
* 09:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 09:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31820 and previous config saved to /var/cache/conftool/dbconfig/20220725-090604-ladsgroup.json
* 09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P31819 and previous config saved to /var/cache/conftool/dbconfig/20220725-090113-ladsgroup.json
* 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P31818 and previous config saved to /var/cache/conftool/dbconfig/20220725-084609-ladsgroup.json
* 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P31817 and previous config saved to /var/cache/conftool/dbconfig/20220725-083105-ladsgroup.json
* 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: Maint done', diff saved to https://phabricator.wikimedia.org/P31816 and previous config saved to /var/cache/conftool/dbconfig/20220725-081601-ladsgroup.json
* 08:15 kartik@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Translate: Backport: [[gerrit:816272{{!}}ReviewTranslationActionApi: Move to namespace and add strict types (T312008 T313608)]] (duration: 03m 09s)
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:37 kartik@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:802443{{!}}Explicitly set math rendering modes (T309686)]] (duration: 03m 11s)
* 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:31 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:23 volans@cumin2002: START - Cookbook sre.dns.netbox
* 07:16 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815829{{!}}Enable Section Translation in Uzbek Wikipedia (T310116)]] (duration: 03m 04s)
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:30 XioNoX: power off asw2-d5-eqiad for decommissioning - [[phab:T313115|T313115]]
 
== 2022-07-24 ==
* 20:54 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org
* 20:37 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org
* 14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31815 and previous config saved to /var/cache/conftool/dbconfig/20220724-100221-ladsgroup.json
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31814 and previous config saved to /var/cache/conftool/dbconfig/20220724-094716-ladsgroup.json
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31813 and previous config saved to /var/cache/conftool/dbconfig/20220724-093211-ladsgroup.json
* 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31812 and previous config saved to /var/cache/conftool/dbconfig/20220724-091706-ladsgroup.json
* 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31811 and previous config saved to /var/cache/conftool/dbconfig/20220724-041542-ladsgroup.json
* 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31810 and previous config saved to /var/cache/conftool/dbconfig/20220724-040037-ladsgroup.json
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31809 and previous config saved to /var/cache/conftool/dbconfig/20220724-034532-ladsgroup.json
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31808 and previous config saved to /var/cache/conftool/dbconfig/20220724-034356-ladsgroup.json
* 03:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 03:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31807 and previous config saved to /var/cache/conftool/dbconfig/20220724-034336-ladsgroup.json
* 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31806 and previous config saved to /var/cache/conftool/dbconfig/20220724-033027-ladsgroup.json
* 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31805 and previous config saved to /var/cache/conftool/dbconfig/20220724-032831-ladsgroup.json
* 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31804 and previous config saved to /var/cache/conftool/dbconfig/20220724-031326-ladsgroup.json
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31803 and previous config saved to /var/cache/conftool/dbconfig/20220724-025820-ladsgroup.json
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31802 and previous config saved to /var/cache/conftool/dbconfig/20220724-003718-ladsgroup.json
* 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31801 and previous config saved to /var/cache/conftool/dbconfig/20220724-003652-ladsgroup.json
* 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31800 and previous config saved to /var/cache/conftool/dbconfig/20220724-002147-ladsgroup.json
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31799 and previous config saved to /var/cache/conftool/dbconfig/20220724-000641-ladsgroup.json
 
== 2022-07-23 ==
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31798 and previous config saved to /var/cache/conftool/dbconfig/20220723-235136-ladsgroup.json
* 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31797 and previous config saved to /var/cache/conftool/dbconfig/20220723-232948-ladsgroup.json
* 23:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 23:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31796 and previous config saved to /var/cache/conftool/dbconfig/20220723-232927-ladsgroup.json
* 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31795 and previous config saved to /var/cache/conftool/dbconfig/20220723-231422-ladsgroup.json
* 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31794 and previous config saved to /var/cache/conftool/dbconfig/20220723-225917-ladsgroup.json
* 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31793 and previous config saved to /var/cache/conftool/dbconfig/20220723-224412-ladsgroup.json
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31792 and previous config saved to /var/cache/conftool/dbconfig/20220723-220740-ladsgroup.json
* 22:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31791 and previous config saved to /var/cache/conftool/dbconfig/20220723-220720-ladsgroup.json
* 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31790 and previous config saved to /var/cache/conftool/dbconfig/20220723-215215-ladsgroup.json
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31789 and previous config saved to /var/cache/conftool/dbconfig/20220723-213710-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31788 and previous config saved to /var/cache/conftool/dbconfig/20220723-213610-ladsgroup.json
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31787 and previous config saved to /var/cache/conftool/dbconfig/20220723-212204-ladsgroup.json
* 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31786 and previous config saved to /var/cache/conftool/dbconfig/20220723-212105-ladsgroup.json
* 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31785 and previous config saved to /var/cache/conftool/dbconfig/20220723-210559-ladsgroup.json
* 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31784 and previous config saved to /var/cache/conftool/dbconfig/20220723-205054-ladsgroup.json
* 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31783 and previous config saved to /var/cache/conftool/dbconfig/20220723-204049-ladsgroup.json
* 20:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 20:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31782 and previous config saved to /var/cache/conftool/dbconfig/20220723-164105-ladsgroup.json
* 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31781 and previous config saved to /var/cache/conftool/dbconfig/20220723-164045-ladsgroup.json
* 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31780 and previous config saved to /var/cache/conftool/dbconfig/20220723-162540-ladsgroup.json
* 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31779 and previous config saved to /var/cache/conftool/dbconfig/20220723-161035-ladsgroup.json
* 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31778 and previous config saved to /var/cache/conftool/dbconfig/20220723-155530-ladsgroup.json
* 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 15:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31777 and previous config saved to /var/cache/conftool/dbconfig/20220723-155311-ladsgroup.json
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31776 and previous config saved to /var/cache/conftool/dbconfig/20220723-153805-ladsgroup.json
* 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31775 and previous config saved to /var/cache/conftool/dbconfig/20220723-152300-ladsgroup.json
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31774 and previous config saved to /var/cache/conftool/dbconfig/20220723-151951-ladsgroup.json
* 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31773 and previous config saved to /var/cache/conftool/dbconfig/20220723-151930-ladsgroup.json
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31772 and previous config saved to /var/cache/conftool/dbconfig/20220723-150754-ladsgroup.json
* 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31771 and previous config saved to /var/cache/conftool/dbconfig/20220723-150425-ladsgroup.json
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31770 and previous config saved to /var/cache/conftool/dbconfig/20220723-144920-ladsgroup.json
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31769 and previous config saved to /var/cache/conftool/dbconfig/20220723-143414-ladsgroup.json
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31768 and previous config saved to /var/cache/conftool/dbconfig/20220723-105825-ladsgroup.json
* 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31767 and previous config saved to /var/cache/conftool/dbconfig/20220723-105805-ladsgroup.json
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31766 and previous config saved to /var/cache/conftool/dbconfig/20220723-105257-ladsgroup.json
* 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31765 and previous config saved to /var/cache/conftool/dbconfig/20220723-105238-ladsgroup.json
* 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31764 and previous config saved to /var/cache/conftool/dbconfig/20220723-105228-ladsgroup.json
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31763 and previous config saved to /var/cache/conftool/dbconfig/20220723-104300-ladsgroup.json
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31762 and previous config saved to /var/cache/conftool/dbconfig/20220723-103733-ladsgroup.json
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31761 and previous config saved to /var/cache/conftool/dbconfig/20220723-103723-ladsgroup.json
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31760 and previous config saved to /var/cache/conftool/dbconfig/20220723-102755-ladsgroup.json
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31759 and previous config saved to /var/cache/conftool/dbconfig/20220723-102227-ladsgroup.json
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31758 and previous config saved to /var/cache/conftool/dbconfig/20220723-102218-ladsgroup.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31757 and previous config saved to /var/cache/conftool/dbconfig/20220723-101250-ladsgroup.json
* 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31756 and previous config saved to /var/cache/conftool/dbconfig/20220723-100722-ladsgroup.json
* 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31755 and previous config saved to /var/cache/conftool/dbconfig/20220723-100713-ladsgroup.json
* 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31754 and previous config saved to /var/cache/conftool/dbconfig/20220723-095241-ladsgroup.json
* 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31753 and previous config saved to /var/cache/conftool/dbconfig/20220723-053604-ladsgroup.json
* 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31752 and previous config saved to /var/cache/conftool/dbconfig/20220723-052925-ladsgroup.json
* 05:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31751 and previous config saved to /var/cache/conftool/dbconfig/20220723-015300-ladsgroup.json
* 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31750 and previous config saved to /var/cache/conftool/dbconfig/20220723-013755-ladsgroup.json
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31749 and previous config saved to /var/cache/conftool/dbconfig/20220723-012250-ladsgroup.json
* 01:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31748 and previous config saved to /var/cache/conftool/dbconfig/20220723-010745-ladsgroup.json
* 00:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
* 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
* 00:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31747 and previous config saved to /var/cache/conftool/dbconfig/20220723-001125-ladsgroup.json
 
== 2022-07-22 ==
* 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31746 and previous config saved to /var/cache/conftool/dbconfig/20220722-235619-ladsgroup.json
* 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31745 and previous config saved to /var/cache/conftool/dbconfig/20220722-234114-ladsgroup.json
* 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31744 and previous config saved to /var/cache/conftool/dbconfig/20220722-232609-ladsgroup.json
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31743 and previous config saved to /var/cache/conftool/dbconfig/20220722-215349-ladsgroup.json
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31742 and previous config saved to /var/cache/conftool/dbconfig/20220722-215329-ladsgroup.json
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31741 and previous config saved to /var/cache/conftool/dbconfig/20220722-213824-ladsgroup.json
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31740 and previous config saved to /var/cache/conftool/dbconfig/20220722-212319-ladsgroup.json
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31739 and previous config saved to /var/cache/conftool/dbconfig/20220722-211308-ladsgroup.json
* 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31738 and previous config saved to /var/cache/conftool/dbconfig/20220722-211259-ladsgroup.json
* 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31737 and previous config saved to /var/cache/conftool/dbconfig/20220722-210813-ladsgroup.json
* 21:05 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 29s)
* 21:04 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31736 and previous config saved to /var/cache/conftool/dbconfig/20220722-205754-ladsgroup.json
* 20:44 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 07s)
* 20:44 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31735 and previous config saved to /var/cache/conftool/dbconfig/20220722-204248-ladsgroup.json
* 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 20:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 20:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31734 and previous config saved to /var/cache/conftool/dbconfig/20220722-203708-ladsgroup.json
* 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31733 and previous config saved to /var/cache/conftool/dbconfig/20220722-202743-ladsgroup.json
* 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31732 and previous config saved to /var/cache/conftool/dbconfig/20220722-202203-ladsgroup.json
* 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31731 and previous config saved to /var/cache/conftool/dbconfig/20220722-200658-ladsgroup.json
* 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31730 and previous config saved to /var/cache/conftool/dbconfig/20220722-195153-ladsgroup.json
* 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31729 and previous config saved to /var/cache/conftool/dbconfig/20220722-194428-ladsgroup.json
* 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 19:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31727 and previous config saved to /var/cache/conftool/dbconfig/20220722-173218-ladsgroup.json
* 17:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:54 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: no-op deploy to sync up new cloudweb hosts (duration: 08m 47s)
* 16:45 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: no-op deploy to sync up new cloudweb hosts
* 16:19 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 16:02 jbond: puppet-agent to puppet7 component
* 15:57 jbond: ruby-semantic-puppet to puppet7 component
* 15:49 jbond: ruby-sorted-set to puppet7 component
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2046.codfw.wmnet with OS bullseye
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31725 and previous config saved to /var/cache/conftool/dbconfig/20220722-150727-ladsgroup.json
* 15:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 15:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31724 and previous config saved to /var/cache/conftool/dbconfig/20220722-150707-ladsgroup.json
* 15:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2046.codfw.wmnet with reason: host reimage
* 15:03 jbond: ruby-rbtree to puppet7 component
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 15:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2046.codfw.wmnet with reason: host reimage
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31722 and previous config saved to /var/cache/conftool/dbconfig/20220722-145201-ladsgroup.json
* 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31721 and previous config saved to /var/cache/conftool/dbconfig/20220722-144734-ladsgroup.json
* 14:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2046.codfw.wmnet with OS bullseye
* 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31720 and previous config saved to /var/cache/conftool/dbconfig/20220722-143655-ladsgroup.json
* 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31719 and previous config saved to /var/cache/conftool/dbconfig/20220722-143229-ladsgroup.json
* 14:29 moritzm: restarting tomcat on idp-test.w.o
* 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31718 and previous config saved to /var/cache/conftool/dbconfig/20220722-142150-ladsgroup.json
* 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31717 and previous config saved to /var/cache/conftool/dbconfig/20220722-141724-ladsgroup.json
* 13:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2033.codfw.wmnet with OS bullseye
* 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
* 13:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
* 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31713 and previous config saved to /var/cache/conftool/dbconfig/20220722-131710-ladsgroup.json
* 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31712 and previous config saved to /var/cache/conftool/dbconfig/20220722-131650-ladsgroup.json
* 13:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31711 and previous config saved to /var/cache/conftool/dbconfig/20220722-130145-ladsgroup.json
* 12:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31710 and previous config saved to /var/cache/conftool/dbconfig/20220722-124640-ladsgroup.json
* 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
* 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
* 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31708 and previous config saved to /var/cache/conftool/dbconfig/20220722-102452-ladsgroup.json
* 10:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 10:21 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
* 10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31707 and previous config saved to /var/cache/conftool/dbconfig/20220722-100948-ladsgroup.json
* 10:06 XioNoX: push pfw policies - [[phab:T313522|T313522]]
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31706 and previous config saved to /var/cache/conftool/dbconfig/20220722-095444-ladsgroup.json
* 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31705 and previous config saved to /var/cache/conftool/dbconfig/20220722-093940-ladsgroup.json
* 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31704 and previous config saved to /var/cache/conftool/dbconfig/20220722-093754-ladsgroup.json
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31702 and previous config saved to /var/cache/conftool/dbconfig/20220722-093453-ladsgroup.json
* 09:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31701 and previous config saved to /var/cache/conftool/dbconfig/20220722-084647-ladsgroup.json
* 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31700 and previous config saved to /var/cache/conftool/dbconfig/20220722-084627-ladsgroup.json
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 12 hosts with reason: Maintenance
* 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 12 hosts with reason: Maintenance
* 08:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31697 and previous config saved to /var/cache/conftool/dbconfig/20220722-080112-ladsgroup.json
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31696 and previous config saved to /var/cache/conftool/dbconfig/20220722-074844-ladsgroup.json
* 07:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 07:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 06:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2014.codfw.wmnet with OS bullseye
* 06:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
* 05:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2014.codfw.wmnet with OS bullseye
* 05:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2021.codfw.wmnet with reason: host reimage
* 05:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2021.codfw.wmnet with reason: host reimage
* 05:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2021.codfw.wmnet with OS bullseye
* 05:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 05:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 04:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 04:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31694 and previous config saved to /var/cache/conftool/dbconfig/20220722-045543-ladsgroup.json
* 04:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31693 and previous config saved to /var/cache/conftool/dbconfig/20220722-044038-ladsgroup.json
* 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31692 and previous config saved to /var/cache/conftool/dbconfig/20220722-042533-ladsgroup.json
* 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31691 and previous config saved to /var/cache/conftool/dbconfig/20220722-041028-ladsgroup.json
* 04:05 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: disable debug log on test2wiki (cleanup) (duration: 03m 05s)
* 04:01 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I9051d20cd1}} (duration: 03m 02s)
* 03:58 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I9051d20cd1}} (duration: 03m 10s)
* 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31690 and previous config saved to /var/cache/conftool/dbconfig/20220722-031014-ladsgroup.json
* 03:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31689 and previous config saved to /var/cache/conftool/dbconfig/20220722-030954-ladsgroup.json
* 03:09 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: disable debug log on test2wiki (duration: 02m 47s)
* 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31688 and previous config saved to /var/cache/conftool/dbconfig/20220722-025449-ladsgroup.json
* 02:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31687 and previous config saved to /var/cache/conftool/dbconfig/20220722-023943-ladsgroup.json
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31685 and previous config saved to /var/cache/conftool/dbconfig/20220722-002622-ladsgroup.json
* 00:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31684 and previous config saved to /var/cache/conftool/dbconfig/20220722-002601-ladsgroup.json
* 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31683 and previous config saved to /var/cache/conftool/dbconfig/20220722-001056-ladsgroup.json
 
== 2022-07-21 ==
* 23:53 mutante: https://policy.wikimedia.org moved from WordPress DNS back to WMF DNS; it now redirects to https://wikimediafoundation.org/advocacy/ as requested on [[phab:T310738|T310738]] {{!}} this might also resolve [[phab:T132104|T132104]], or not, because wikimediafoundation.org is also on WordPress VIP
* 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31680 and previous config saved to /var/cache/conftool/dbconfig/20220721-234045-ladsgroup.json
* 23:22 mutante: [cumin2002:~] $ sudo cumin 'C:profile::httpbb' "rm /srv/deployment/httpbb-tests/appserver/test_search.yaml"
* 23:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2045.codfw.wmnet with OS bullseye
* 22:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2045.codfw.wmnet with reason: host reimage
* 22:52 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2045.codfw.wmnet with reason: host reimage
* 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31678 and previous config saved to /var/cache/conftool/dbconfig/20220721-223048-ladsgroup.json
* 22:30 mutante: re-enabling puppet on all remaining 'C:profile::mediawiki::httpd'
* 22:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
* 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31677 and previous config saved to /var/cache/conftool/dbconfig/20220721-221543-ladsgroup.json
* 22:09 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2045.codfw.wmnet with OS bullseye
* 22:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
* 22:02 dancy@deploy1002: Installation of scap version "4.11.3" completed for 559 hosts
* 22:02 dancy@deploy1002: Installing scap version "4.11.3" for 559 hosts
* 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31676 and previous config saved to /var/cache/conftool/dbconfig/20220721-220038-ladsgroup.json
* 21:56 mutante: re-enabling puppet on mw2 in groups (codfw)
* 21:48 mutante: re-enabling puppet on parsoid (wtp*)
* 21:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31675 and previous config saved to /var/cache/conftool/dbconfig/20220721-214532-ladsgroup.json
* 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31674 and previous config saved to /var/cache/conftool/dbconfig/20220721-213246-ladsgroup.json
* 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31673 and previous config saved to /var/cache/conftool/dbconfig/20220721-213237-ladsgroup.json
* 21:17 mutante: puppet re-enabled on mw-api-canary and parsoid-canary
* 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31672 and previous config saved to /var/cache/conftool/dbconfig/20220721-211732-ladsgroup.json
* 20:52 mutante: deploying apache config change on cluster, slowly: puppet disabled on C:profile::mediawiki::httpd, then re-enabling starting with mwdebug, using httpbb to test it, then re-enabling puppet on more hosts. https://gerrit.wikimedia.org/r/c/operations/puppet/+/809324 Bug: [[phab:T310738|T310738]]
* 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31669 and previous config saved to /var/cache/conftool/dbconfig/20220721-204518-ladsgroup.json
* 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 20:39 mutante: disabling puppet on mw appservers to deploy gerrit:809324 - [[phab:T310738|T310738]]
* 20:34 cjming: end of UTC late backport window
* 20:34 bd808: Proof of life for stashbot processing !logs
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 andrewbogott: testing the log by logging a test
* 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31668 and previous config saved to /var/cache/conftool/dbconfig/20220721-202348-ladsgroup.json
* 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31667 and previous config saved to /var/cache/conftool/dbconfig/20220721-202311-ladsgroup.json
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31666 and previous config saved to /var/cache/conftool/dbconfig/20220721-200806-ladsgroup.json
* 19:56 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 19:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31665 and previous config saved to /var/cache/conftool/dbconfig/20220721-195301-ladsgroup.json
* 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31664 and previous config saved to /var/cache/conftool/dbconfig/20220721-193756-ladsgroup.json
* 19:35 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 05s)
* 19:35 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 19:34 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 05s)
* 19:34 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 19:31 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS bullseye
* 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31662 and previous config saved to /var/cache/conftool/dbconfig/20220721-191136-ladsgroup.json
* 19:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2066.codfw.wmnet with reason: host reimage
* 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31661 and previous config saved to /var/cache/conftool/dbconfig/20220721-185631-ladsgroup.json
* 18:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 18:42 tzatziki: running extensions/SecurePoll/cli/wm-scripts/bv2022/populateEditCount.php on all 8 sections
* 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31660 and previous config saved to /var/cache/conftool/dbconfig/20220721-184126-ladsgroup.json
* 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31659 and previous config saved to /var/cache/conftool/dbconfig/20220721-183723-ladsgroup.json
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 18:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31658 and previous config saved to /var/cache/conftool/dbconfig/20220721-183703-ladsgroup.json
* 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:34 dancy@deploy1002: Finished scap: Backport for [[gerrit:816022]] MWConfigCacheGenerator.php: Use grace period of 3 minutes (duration: 03m 39s)
* 18:31 dancy@deploy1002: Started scap: Backport for [[gerrit:816022]] MWConfigCacheGenerator.php: Use grace period of 3 minutes
* 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31656 and previous config saved to /var/cache/conftool/dbconfig/20220721-182033-ladsgroup.json
* 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31655 and previous config saved to /var/cache/conftool/dbconfig/20220721-182013-ladsgroup.json
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:14 brennen: testing scap deployment to phab2001, this is a no-op for production services
* 18:12 brennen@deploy1002: Finished deploy [phabricator/deployment@358bb3a]: (no justification provided) (duration: 01m 17s)
* 18:11 brennen@deploy1002: Started deploy [phabricator/deployment@358bb3a]: (no justification provided)
* 18:10 tzatziki: creating tables for board election with bv2022_tables.sql
* 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:07 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31654 and previous config saved to /var/cache/conftool/dbconfig/20220721-180653-ladsgroup.json
* 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31653 and previous config saved to /var/cache/conftool/dbconfig/20220721-180508-ladsgroup.json
* 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31652 and previous config saved to /var/cache/conftool/dbconfig/20220721-175147-ladsgroup.json
* 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31651 and previous config saved to /var/cache/conftool/dbconfig/20220721-175003-ladsgroup.json
* 17:42 dwisehaupt: reclone of frdb2003 from frdb1003 is complete. all services back in service.
* 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic207[0-2].*
* 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=no; selector: name=elastic2066.codfw.wmnet
* 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic206[1-9].*
* 17:36 dancy@deploy1002: Synchronized README: Gathering timing info (duration: 03m 09s)
* 17:35 ryankemper@cumin1001: conftool action : GET; selector: name=elastic6*
* 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31650 and previous config saved to /var/cache/conftool/dbconfig/20220721-173458-ladsgroup.json
* 17:30 ryankemper@cumin1001: conftool action : set/weight=10,pooled=yes; selector: name=elastic6*
* 17:21 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:20 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:20 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:19 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:00 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 16:58 ryankemper: [[phab:T300943|T300943]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/816017 to get conftool-data entries for new elastic2* hosts
* 16:58 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: merging upstream config changes [[phab:T309896|T309896]] - mvernon@cumin1001
* 16:44 ryankemper@cumin1001: conftool action : set/weight=10; selector: name=elastic6*
* 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31649 and previous config saved to /var/cache/conftool/dbconfig/20220721-163859-ladsgroup.json
* 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31648 and previous config saved to /var/cache/conftool/dbconfig/20220721-162458-ladsgroup.json
* 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31647 and previous config saved to /var/cache/conftool/dbconfig/20220721-162419-ladsgroup.json
* 16:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS buster
* 16:09 ryankemper: [[phab:T300943|T300943]] Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/816008 and running puppet twice on elastic20[64-72]
* 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31646 and previous config saved to /var/cache/conftool/dbconfig/20220721-160914-ladsgroup.json
* 16:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS buster
* 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31645 and previous config saved to /var/cache/conftool/dbconfig/20220721-160522-ladsgroup.json
* 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31644 and previous config saved to /var/cache/conftool/dbconfig/20220721-155409-ladsgroup.json
* 15:50 ryankemper: [[phab:T300943|T300943]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/815823 and running puppet across elastic2* in preparation for adding new codfw hosts into service
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31643 and previous config saved to /var/cache/conftool/dbconfig/20220721-155017-ladsgroup.json
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31642 and previous config saved to /var/cache/conftool/dbconfig/20220721-153904-ladsgroup.json
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31641 and previous config saved to /var/cache/conftool/dbconfig/20220721-153512-ladsgroup.json
* 15:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
* 15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
* 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
* 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
* 15:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: [[gerrit:815970{{!}}Configure wbsearchentities profile parameter on Test Wikidata (take 2) (T307869)]] (2/2) (duration: 03m 13s)
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815970{{!}}Configure wbsearchentities profile parameter on Test Wikidata (take 2) (T307869)]] (1/2) (duration: 02m 59s)
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31640 and previous config saved to /var/cache/conftool/dbconfig/20220721-152007-ladsgroup.json
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:16 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS buster
* 15:16 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS buster
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Wikibase/repo/: Backport: [[gerrit:815983{{!}}Fix profile in wbsearchentities and wbsearch (T307869)]] (duration: 03m 07s)
* 15:13 moritzm: draining ganeti2021 [[phab:T310483|T310483]]
* 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2014.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 15:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2014.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 14:45 moritzm: upgrading ganeti/eqsin to 3.0.2 [[phab:T312637|T312637]]
* 14:39 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: merging upstream config changes [[phab:T309896|T309896]] - mvernon@cumin1001
* 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31639 and previous config saved to /var/cache/conftool/dbconfig/20220721-143544-ladsgroup.json
* 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31638 and previous config saved to /var/cache/conftool/dbconfig/20220721-143524-ladsgroup.json
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31637 and previous config saved to /var/cache/conftool/dbconfig/20220721-142523-marostegui.json
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1192.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1187.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1195.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1188.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1191.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1185.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1194.eqiad.wmnet with OS bullseye
* 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1186.eqiad.wmnet with OS bullseye
* 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1190.eqiad.wmnet with OS bullseye
* 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1193.eqiad.wmnet with OS bullseye
* 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1189.eqiad.wmnet with OS bullseye
* 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31636 and previous config saved to /var/cache/conftool/dbconfig/20220721-142019-ladsgroup.json
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31635 and previous config saved to /var/cache/conftool/dbconfig/20220721-141938-root.json
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1187.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS bullseye
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
* 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bullseye
* 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS bullseye
* 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31634 and previous config saved to /var/cache/conftool/dbconfig/20220721-141018-marostegui.json
* 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31633 and previous config saved to /var/cache/conftool/dbconfig/20220721-140513-ladsgroup.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31632 and previous config saved to /var/cache/conftool/dbconfig/20220721-140434-root.json
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31631 and previous config saved to /var/cache/conftool/dbconfig/20220721-140004-ladsgroup.json
* 13:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 13:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1185.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1192.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1187.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1191.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1190.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1188.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1186.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1193.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1194.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1189.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1195.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31630 and previous config saved to /var/cache/conftool/dbconfig/20220721-135513-marostegui.json
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31629 and previous config saved to /var/cache/conftool/dbconfig/20220721-135008-ladsgroup.json
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31628 and previous config saved to /var/cache/conftool/dbconfig/20220721-134250-root.json
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1192.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1193.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1190.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1185.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1187.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1188.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1186.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1191.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1189.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31627 and previous config saved to /var/cache/conftool/dbconfig/20220721-134008-marostegui.json
* 13:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31626 and previous config saved to /var/cache/conftool/dbconfig/20220721-132824-marostegui.json
* 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31625 and previous config saved to /var/cache/conftool/dbconfig/20220721-132746-root.json
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31624 and previous config saved to /var/cache/conftool/dbconfig/20220721-132639-marostegui.json
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 moritzm: installing paramiko security updates
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 Lucas_WMDE: pulled config change {{Gerrit|Iee6de25983}} to mwdebug1001, then reverted in {{Gerrit|I9248270621}} and pulled that too; neither was synced to other hosts
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 moritzm: installing xen security updates
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31623 and previous config saved to /var/cache/conftool/dbconfig/20220721-131040-ladsgroup.json
* 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31622 and previous config saved to /var/cache/conftool/dbconfig/20220721-125108-marostegui.json
* 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31621 and previous config saved to /var/cache/conftool/dbconfig/20220721-123603-marostegui.json
* 12:21 dwisehaupt: started reclone of frdb2003 from frdb1003
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31620 and previous config saved to /var/cache/conftool/dbconfig/20220721-122058-marostegui.json
* 12:07 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31619 and previous config saved to /var/cache/conftool/dbconfig/20220721-120553-marostegui.json
* 12:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 18:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 12:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 18:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31618 and previous config saved to /var/cache/conftool/dbconfig/20220721-115607-marostegui.json
* 11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 11:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 11:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 11:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31617 and previous config saved to /var/cache/conftool/dbconfig/20220721-114641-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31616 and previous config saved to /var/cache/conftool/dbconfig/20220721-113136-marostegui.json
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2078.codfw.wmnet
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31615 and previous config saved to /var/cache/conftool/dbconfig/20220721-111631-marostegui.json
* 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: Switch instance to plain disk storage, [[phab:T311686|T311686]]
* 11:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: Switch instance to plain disk storage, [[phab:T311686|T311686]]
* 11:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 11:09 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 11:08 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 11:08 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 11:07 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:07 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 11:07 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:03 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2078.codfw.wmnet
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31614 and previous config saved to /var/cache/conftool/dbconfig/20220721-110126-marostegui.json
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31613 and previous config saved to /var/cache/conftool/dbconfig/20220721-105856-ladsgroup.json
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubetcd2006.codfw.wmnet with reason: Switch to DRBD, [[phab:T311686|T311686]]
* 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on kubetcd2006.codfw.wmnet with reason: Switch to DRBD, [[phab:T311686|T311686]]
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31612 and previous config saved to /var/cache/conftool/dbconfig/20220721-104351-ladsgroup.json
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31611 and previous config saved to /var/cache/conftool/dbconfig/20220721-104039-marostegui.json
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31610 and previous config saved to /var/cache/conftool/dbconfig/20220721-104002-marostegui.json
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to cluster codfw and group D
* 10:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to cluster codfw and group D
* 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31609 and previous config saved to /var/cache/conftool/dbconfig/20220721-102846-ladsgroup.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31608 and previous config saved to /var/cache/conftool/dbconfig/20220721-102457-marostegui.json
* 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 10:18 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 10:17 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:15 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
* 10:15 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
* 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to cluster codfw and group D
* 10:14 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to cluster codfw and group D
* 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31607 and previous config saved to /var/cache/conftool/dbconfig/20220721-101341-ladsgroup.json
* 10:11 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
* 10:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31606 and previous config saved to /var/cache/conftool/dbconfig/20220721-100951-marostegui.json
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2009.codfw.wmnet to cluster codfw and group C
* 10:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2009.codfw.wmnet to cluster codfw and group C
* 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to plain disk storage, [[phab:T311686|T311686]]
* 09:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to plain disk storage, [[phab:T311686|T311686]]
* 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31605 and previous config saved to /var/cache/conftool/dbconfig/20220721-095454-ladsgroup.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31604 and previous config saved to /var/cache/conftool/dbconfig/20220721-095446-marostegui.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2085 and db2086 from dbctl', diff saved to https://phabricator.wikimedia.org/P31603 and previous config saved to /var/cache/conftool/dbconfig/20220721-095439-marostegui.json
* 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31602 and previous config saved to /var/cache/conftool/dbconfig/20220721-093755-ladsgroup.json
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 09:32 jbond: enable puppet on A:cp post gerrit:815728
* 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31601 and previous config saved to /var/cache/conftool/dbconfig/20220721-093032-ladsgroup.json
* 09:21 moritzm: installing containerd security updates in Kubernetes eqiad masters
* 09:18 jbond: disable puppet on A:cp for gerrit:815728
* 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31599 and previous config saved to /var/cache/conftool/dbconfig/20220721-091527-ladsgroup.json
* 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31598 and previous config saved to /var/cache/conftool/dbconfig/20220721-090022-ladsgroup.json
* 08:59 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:59 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:57 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:55 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:54 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:54 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:53 klausman@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31597 and previous config saved to /var/cache/conftool/dbconfig/20220721-084935-marostegui.json
* 08:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 08:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2169 to s6 and s7 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31595 and previous config saved to /var/cache/conftool/dbconfig/20220721-083147-marostegui.json
* 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 08:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:18 moritzm: installing containerd security updates in Kubernetes eqiad workers
* 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31594 and previous config saved to /var/cache/conftool/dbconfig/20220721-081449-marostegui.json
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31593 and previous config saved to /var/cache/conftool/dbconfig/20220721-075944-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P31592 and previous config saved to /var/cache/conftool/dbconfig/20220721-075757-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31591 and previous config saved to /var/cache/conftool/dbconfig/20220721-075745-root.json
* 07:46 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:815895{{!}}Adding Wikiquote to the new portals (T273179)]] (duration: 03m 10s)
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31590 and previous config saved to /var/cache/conftool/dbconfig/20220721-074439-marostegui.json
* 07:43 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:815895{{!}}Adding Wikiquote to the new portals (T273179)]] (duration: 03m 08s)
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P31589 and previous config saved to /var/cache/conftool/dbconfig/20220721-074253-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31588 and previous config saved to /var/cache/conftool/dbconfig/20220721-074242-root.json
* 07:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31587 and previous config saved to /var/cache/conftool/dbconfig/20220721-073502-ladsgroup.json
* 07:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31586 and previous config saved to /var/cache/conftool/dbconfig/20220721-073251-ladsgroup.json
* 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31585 and previous config saved to /var/cache/conftool/dbconfig/20220721-073217-ladsgroup.json
* 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS bullseye
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31584 and previous config saved to /var/cache/conftool/dbconfig/20220721-072934-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P31583 and previous config saved to /var/cache/conftool/dbconfig/20220721-072749-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31582 and previous config saved to /var/cache/conftool/dbconfig/20220721-072738-root.json
* 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31581 and previous config saved to /var/cache/conftool/dbconfig/20220721-071953-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31580 and previous config saved to /var/cache/conftool/dbconfig/20220721-071932-marostegui.json
* 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2026.codfw.wmnet with reason: host reimage
* 07:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2026.codfw.wmnet with reason: host reimage
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P31579 and previous config saved to /var/cache/conftool/dbconfig/20220721-071245-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31578 and previous config saved to /var/cache/conftool/dbconfig/20220721-071234-root.json
* 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2020.codfw.wmnet to cluster codfw and group B
* 07:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2020.codfw.wmnet to cluster codfw and group B
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31577 and previous config saved to /var/cache/conftool/dbconfig/20220721-070427-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P31576 and previous config saved to /var/cache/conftool/dbconfig/20220721-065741-root.json
* 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS bullseye
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31575 and previous config saved to /var/cache/conftool/dbconfig/20220721-065730-root.json
* 06:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2009.codfw.wmnet with OS bullseye
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31574 and previous config saved to /var/cache/conftool/dbconfig/20220721-064922-marostegui.json
* 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 06:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P31573 and previous config saved to /var/cache/conftool/dbconfig/20220721-064237-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31572 and previous config saved to /var/cache/conftool/dbconfig/20220721-064226-root.json
* 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2009.codfw.wmnet with reason: host reimage
* 06:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 06:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 06:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2009.codfw.wmnet with reason: host reimage
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31571 and previous config saved to /var/cache/conftool/dbconfig/20220721-063417-marostegui.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P31570 and previous config saved to /var/cache/conftool/dbconfig/20220721-062733-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31569 and previous config saved to /var/cache/conftool/dbconfig/20220721-062722-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31568 and previous config saved to /var/cache/conftool/dbconfig/20220721-062431-marostegui.json
* 06:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 06:18 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS bullseye
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P31567 and previous config saved to /var/cache/conftool/dbconfig/20220721-061228-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31566 and previous config saved to /var/cache/conftool/dbconfig/20220721-061217-root.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 [[phab:T313398|T313398]]', diff saved to https://phabricator.wikimedia.org/P31565 and previous config saved to /var/cache/conftool/dbconfig/20220721-061145-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write [[phab:T313398|T313398]]', diff saved to https://phabricator.wikimedia.org/P31564 and previous config saved to /var/cache/conftool/dbconfig/20220721-061001-root.json
* 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 06:08 marostegui: Starting x1 eqiad failover from db1120 to db1103 - [[phab:T313398|T313398]]
* 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P31563 and previous config saved to /var/cache/conftool/dbconfig/20220721-060427-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1136 to s7 primary and set section read-write [[phab:T313383|T313383]]', diff saved to https://phabricator.wikimedia.org/P31562 and previous config saved to /var/cache/conftool/dbconfig/20220721-060112-root.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - [[phab:T313383|T313383]]', diff saved to https://phabricator.wikimedia.org/P31561 and previous config saved to /var/cache/conftool/dbconfig/20220721-060037-marostegui.json
* 06:00 marostegui: Starting s7 eqiad failover from db1181 to db1136 - [[phab:T313383|T313383]]
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 with weight 0 [[phab:T313398|T313398]]', diff saved to https://phabricator.wikimedia.org/P31560 and previous config saved to /var/cache/conftool/dbconfig/20220721-051752-root.json
* 05:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T313398|T313398]]
* 05:14 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T313398|T313398]]
* 05:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 [[phab:T313383|T313383]]
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1136 with weight 0 [[phab:T313383|T313383]]', diff saved to https://phabricator.wikimedia.org/P31559 and previous config saved to /var/cache/conftool/dbconfig/20220721-051358-root.json
* 05:13 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 [[phab:T313383|T313383]]
* 00:44 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
 
== 2022-07-20 ==
* 23:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS bullseye
* 23:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS bullseye
* 23:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS bullseye
* 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS bullseye
* 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS bullseye
* 23:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2072.codfw.wmnet with reason: host reimage
* 23:29 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2070.codfw.wmnet with reason: host reimage
* 23:29 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic2067.codfw.wmnet with reason: host reimage
* 23:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2071.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2068.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2071.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2072.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2070.codfw.wmnet with reason: host reimage
* 23:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2067.codfw.wmnet with reason: host reimage
* 23:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2068.codfw.wmnet with reason: host reimage
* 23:11 ryankemper: [[phab:T300943|T300943]] Fixed IPMI passwords for elastic `20[67,68,70,71,72]`, reimaging them to bullseye (these hosts are not in service, thus the batch operation)
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS bullseye
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS bullseye
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS bullseye
* 23:07 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS bullseye
* 23:07 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS bullseye
* 21:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 21:45 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 20:45 cjming: end of UTC late backport window
* 20:43 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814906{{!}}Deploy the new grid layout to group 1 (T312241)]] (duration: 03m 16s)
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814906{{!}}Deploy the new grid layout to group 1 (T312241)]] (duration: 03m 14s)
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2032.codfw.wmnet with OS bullseye
* 20:27 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815359{{!}}Enable DiscussionTools visualenhancements as beta feature on partner wikis (T312670)]] (duration: 03m 26s)
* 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31555 and previous config saved to /var/cache/conftool/dbconfig/20220720-201240-marostegui.json
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2032.codfw.wmnet with reason: host reimage
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815359{{!}}Enable DiscussionTools visualenhancements as beta feature on partner wikis (T312670)]] (duration: 03m 10s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2032.codfw.wmnet with reason: host reimage
* 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31554 and previous config saved to /var/cache/conftool/dbconfig/20220720-195734-marostegui.json
* 19:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2032.codfw.wmnet with OS bullseye
* 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]] (duration: 02m 53s)
* 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31553 and previous config saved to /var/cache/conftool/dbconfig/20220720-194229-marostegui.json
* 19:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:33 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/3D/src/PatentFormField.php: Backport: [[gerrit:815733{{!}}PatentFormField: pass on $this->mParent to HTMLRadioField constructor (T313432)]] (duration: 03m 08s)
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31552 and previous config saved to /var/cache/conftool/dbconfig/20220720-192724-marostegui.json
* 19:17 jeena: the previous entry should read: revert group1 wikis to 1.39.0-wmf.19
* 19:13 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group[0{{!}}1] wikis to [VERSION]"
* 18:37 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 18:35 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2045.codfw.wmnet with OS bullseye
* 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31551 and previous config saved to /var/cache/conftool/dbconfig/20220720-182710-marostegui.json
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 18:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 18:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 15 hosts with reason: Maintenance
* 18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 15 hosts with reason: Maintenance
* 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 18:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31550 and previous config saved to /var/cache/conftool/dbconfig/20220720-182339-marostegui.json
* 18:17 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:16 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]] (duration: 03m 07s)
* 18:15 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31549 and previous config saved to /var/cache/conftool/dbconfig/20220720-180834-marostegui.json
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31548 and previous config saved to /var/cache/conftool/dbconfig/20220720-175328-marostegui.json
* 17:51 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 17:50 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 17:38 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 17:38 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2048.codfw.wmnet with OS bullseye
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31547 and previous config saved to /var/cache/conftool/dbconfig/20220720-173823-marostegui.json
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31546 and previous config saved to /var/cache/conftool/dbconfig/20220720-173522-marostegui.json
* 17:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 17:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31545 and previous config saved to /var/cache/conftool/dbconfig/20220720-173502-marostegui.json
* 17:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2048.codfw.wmnet with reason: host reimage
* 17:25 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2048.codfw.wmnet with reason: host reimage
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31544 and previous config saved to /var/cache/conftool/dbconfig/20220720-171956-marostegui.json
* 17:12 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'enable-puppet 815759'
* 17:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2048.codfw.wmnet with OS bullseye
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31543 and previous config saved to /var/cache/conftool/dbconfig/20220720-170451-marostegui.json
* 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31542 and previous config saved to /var/cache/conftool/dbconfig/20220720-164946-marostegui.json
* 16:49 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'disable-puppet 815759'
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31541 and previous config saved to /var/cache/conftool/dbconfig/20220720-164638-marostegui.json
* 16:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 16:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31540 and previous config saved to /var/cache/conftool/dbconfig/20220720-164618-marostegui.json
* 16:40 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31539 and previous config saved to /var/cache/conftool/dbconfig/20220720-163113-marostegui.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31538 and previous config saved to /var/cache/conftool/dbconfig/20220720-161608-marostegui.json
* 16:05 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31537 and previous config saved to /var/cache/conftool/dbconfig/20220720-160103-marostegui.json
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31536 and previous config saved to /var/cache/conftool/dbconfig/20220720-155752-marostegui.json
* 15:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 15:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31535 and previous config saved to /var/cache/conftool/dbconfig/20220720-155732-marostegui.json
* 15:57 dancy@deploy1002: Installation of scap version "4.11.2" completed for 557 hosts
* 15:56 dancy@deploy1002: Installing scap version "4.11.2" for 557 hosts
* 15:50 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
* 15:46 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2036.codfw.wmnet with OS bullseye
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31534 and previous config saved to /var/cache/conftool/dbconfig/20220720-154227-marostegui.json
* 15:39 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing
* 15:35 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
* 15:28 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31532 and previous config saved to /var/cache/conftool/dbconfig/20220720-152721-marostegui.json
* 15:26 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 15:26 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 15:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2036.codfw.wmnet with reason: host reimage
* 15:20 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2036.codfw.wmnet with reason: host reimage
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fix db2167:3318', diff saved to https://phabricator.wikimedia.org/P31531 and previous config saved to /var/cache/conftool/dbconfig/20220720-151711-marostegui.json
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31530 and previous config saved to /var/cache/conftool/dbconfig/20220720-151216-marostegui.json
* 15:10 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31529 and previous config saved to /var/cache/conftool/dbconfig/20220720-150908-marostegui.json
* 15:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 15:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31528 and previous config saved to /var/cache/conftool/dbconfig/20220720-150730-marostegui.json
* 15:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2036.codfw.wmnet with OS bullseye
* 14:59 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31527 and previous config saved to /var/cache/conftool/dbconfig/20220720-145224-marostegui.json
* 14:44 volans: installing spicerack 3.1.0 on cumin2002
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31524 and previous config saved to /var/cache/conftool/dbconfig/20220720-143719-marostegui.json
* 14:36 volans: uploaded spicerack_3.1.0 to apt.wikimedia.org bullseye-wikimedia
* 14:26 moritzm: installing containerd security updates in Kubernetes codfw masters
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31523 and previous config saved to /var/cache/conftool/dbconfig/20220720-142214-marostegui.json
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31522 and previous config saved to /var/cache/conftool/dbconfig/20220720-141912-marostegui.json
* 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31521 and previous config saved to /var/cache/conftool/dbconfig/20220720-141851-marostegui.json
* 14:04 Lucas_WMDE: UTC afternoon backport+config window done
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31520 and previous config saved to /var/cache/conftool/dbconfig/20220720-140346-marostegui.json
* 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/WikibaseLexeme/WikibaseLexeme.resources.php: Backport: [[gerrit:815726{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (2/2) (duration: 03m 02s)
* 14:02 jbond: disable puppet on A:cp to deploy Gerrit:768766
* 13:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/WikibaseLexeme/src/MediaWiki/Config/LexemeLanguageCodePropertyIdConfig.php: Backport: [[gerrit:815726{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (1/2) (duration: 02m 56s)
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:54 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ git -C php-1.39.0-wmf.19/extensions/WikibaseLexeme am --skip # [[phab:T308659|T308659]] backport already applied
* 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2034.codfw.wmnet with OS bullseye
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31519 and previous config saved to /var/cache/conftool/dbconfig/20220720-134841-marostegui.json
* 13:45 moritzm: installing containerd security updates in Kubernetes codfw cluster
* 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/WikibaseLexeme/WikibaseLexeme.resources.php: Backport: [[gerrit:815425{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (2/2) (duration: 03m 08s)
* 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/WikibaseLexeme/src/MediaWiki/Config/LexemeLanguageCodePropertyIdConfig.php: Backport: [[gerrit:815425{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (1/2) (duration: 03m 34s)
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 moritzm: installing request-tracker4 security updates
* 13:33 XioNoX: cr2-eqiad# deactivate interfaces xe-3/3/0 - [[phab:T313337|T313337]]
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31518 and previous config saved to /var/cache/conftool/dbconfig/20220720-133336-marostegui.json
* 13:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2034.codfw.wmnet with reason: host reimage
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31517 and previous config saved to /var/cache/conftool/dbconfig/20220720-133030-marostegui.json
* 13:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 13:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31516 and previous config saved to /var/cache/conftool/dbconfig/20220720-133010-marostegui.json
* 13:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2034.codfw.wmnet with reason: host reimage
* 13:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2034.codfw.wmnet with OS bullseye
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31515 and previous config saved to /var/cache/conftool/dbconfig/20220720-131505-marostegui.json
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31514 and previous config saved to /var/cache/conftool/dbconfig/20220720-130000-marostegui.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31513 and previous config saved to /var/cache/conftool/dbconfig/20220720-124453-marostegui.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31512 and previous config saved to /var/cache/conftool/dbconfig/20220720-124042-marostegui.json
* 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 12:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 12:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31511 and previous config saved to /var/cache/conftool/dbconfig/20220720-123751-marostegui.json
* 12:29 marostegui: Move pc1014 from pc2 to pc3 [[phab:T313401|T313401]]
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31510 and previous config saved to /var/cache/conftool/dbconfig/20220720-122246-marostegui.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31509 and previous config saved to /var/cache/conftool/dbconfig/20220720-120738-marostegui.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31507 and previous config saved to /var/cache/conftool/dbconfig/20220720-115233-marostegui.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31506 and previous config saved to /var/cache/conftool/dbconfig/20220720-113424-marostegui.json
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 11:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2009.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 11:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2009.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
* 11:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.1 - ayounsi@cumin1001
* 11:05 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.1 - ayounsi@cumin1001
* 11:03 moritzm: draining ganeti2014 [[phab:T310483|T310483]]
* 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
* 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2020.codfw.wmnet with OS bullseye
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 12 hosts with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 12 hosts with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31504 and previous config saved to /var/cache/conftool/dbconfig/20220720-103825-marostegui.json
* 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2020.codfw.wmnet with reason: host reimage
* 10:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2020.codfw.wmnet with reason: host reimage
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31503 and previous config saved to /var/cache/conftool/dbconfig/20220720-102320-marostegui.json
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2020.codfw.wmnet with OS bullseye
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31502 and previous config saved to /var/cache/conftool/dbconfig/20220720-100815-marostegui.json
* 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2029.codfw.wmnet to cluster codfw and group A
* 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to cluster codfw and group A
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31501 and previous config saved to /var/cache/conftool/dbconfig/20220720-095310-marostegui.json
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 09:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31499 and previous config saved to /var/cache/conftool/dbconfig/20220720-085256-marostegui.json
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31498 and previous config saved to /var/cache/conftool/dbconfig/20220720-085236-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31497 and previous config saved to /var/cache/conftool/dbconfig/20220720-083731-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31496 and previous config saved to /var/cache/conftool/dbconfig/20220720-082226-marostegui.json
* 08:14 elukey: apt-get clean on archiva1002 to free some space
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31495 and previous config saved to /var/cache/conftool/dbconfig/20220720-080721-marostegui.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31494 and previous config saved to /var/cache/conftool/dbconfig/20220720-080509-marostegui.json
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31493 and previous config saved to /var/cache/conftool/dbconfig/20220720-080442-marostegui.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31492 and previous config saved to /var/cache/conftool/dbconfig/20220720-074937-marostegui.json
* 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
* 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
* 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31491 and previous config saved to /var/cache/conftool/dbconfig/20220720-073432-marostegui.json
* 07:31 jayme: ml-serve1002.eqiad.wmnet,ml-serve1004.eqiad.wmnet 'systemctl restart rsyslog'
* 07:30 taavi@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/SecurePoll/cli/wm-scripts/bv2022/populateEditCount.php: [[phab:T309753|T309753]] backports (duration: 02m 54s)
* 07:30 jayme: kubernetes1010.eqiad.wmnet,kubernetes1020.eqiad.wmnet 'systemctl restart rsyslog'
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 taavi@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/bv2022/: [[phab:T309753|T309753]] backports (duration: 02m 57s)
* 07:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31490 and previous config saved to /var/cache/conftool/dbconfig/20220720-071927-marostegui.json
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2029.codfw.wmnet with OS bullseye
* 07:14 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815251{{!}}Enable ContentTranslation out of Beta for sswiki (T309384)]] (duration: 03m 24s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31489 and previous config saved to /var/cache/conftool/dbconfig/20220720-071114-marostegui.json
* 07:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31488 and previous config saved to /var/cache/conftool/dbconfig/20220720-071054-marostegui.json
* 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
* 06:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31487 and previous config saved to /var/cache/conftool/dbconfig/20220720-065549-marostegui.json
* 06:43 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS bullseye
* 06:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31486 and previous config saved to /var/cache/conftool/dbconfig/20220720-064044-marostegui.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31485 and previous config saved to /var/cache/conftool/dbconfig/20220720-062539-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31484 and previous config saved to /var/cache/conftool/dbconfig/20220720-062327-marostegui.json
* 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31483 and previous config saved to /var/cache/conftool/dbconfig/20220720-062307-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31482 and previous config saved to /var/cache/conftool/dbconfig/20220720-060802-marostegui.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31481 and previous config saved to /var/cache/conftool/dbconfig/20220720-055256-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31480 and previous config saved to /var/cache/conftool/dbconfig/20220720-053751-marostegui.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31479 and previous config saved to /var/cache/conftool/dbconfig/20220720-053620-marostegui.json
* 05:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31478 and previous config saved to /var/cache/conftool/dbconfig/20220720-053520-marostegui.json
* 05:26 marostegui: Stop mysql on db2087 (s6 and s7) to clone db2169 [[phab:T311493|T311493]]
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31475 and previous config saved to /var/cache/conftool/dbconfig/20220720-052014-marostegui.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31474 and previous config saved to /var/cache/conftool/dbconfig/20220720-050509-marostegui.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2168 to dbctl in s7 and s8 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31473 and previous config saved to /var/cache/conftool/dbconfig/20220720-045918-marostegui.json
* 04:57 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31472 and previous config saved to /var/cache/conftool/dbconfig/20220720-045004-marostegui.json
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31471 and previous config saved to /var/cache/conftool/dbconfig/20220720-044729-marostegui.json
* 04:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 04:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 04:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 04:10 rzl: rzl@kubemaster1001:~$ sudo systemctl restart kube-apiserver
* 04:08 rzl: rzl@kubemaster1002:~$ sudo systemctl restart kube-apiserver
* 03:48 rzl: rzl@cumin2002:~$ sudo cumin dbproxy[1019,1020,1021].eqiad.wmnet 'systemctl reload haproxy'
* 03:37 rzl: rzl@dbproxy1018:~$ sudo systemctl reload haproxy
* 03:30 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 03:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host elastic2060.codfw.wmnet with OS bullseye
* 03:19 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
* 03:10 tstarling@deploy1002: Finished scap: revert yue -> zh fallback, needs LC rebuild in both branches [[phab:T296188|T296188]] (duration: 19m 41s)
* 02:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:51 tstarling@deploy1002: Started scap: revert yue -> zh fallback, needs LC rebuild in both branches [[phab:T296188|T296188]]
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2052.codfw.wmnet with OS bullseye
* 01:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2052.codfw.wmnet with reason: host reimage
* 01:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2052.codfw.wmnet with reason: host reimage
* 01:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2052.codfw.wmnet with OS bullseye
* 01:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS bullseye
* 00:43 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2051.codfw.wmnet with reason: host reimage
* 00:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2051.codfw.wmnet with reason: host reimage
* 00:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS bullseye
 
== 2022-07-19 ==
* 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 8 hosts with reason: Maintenance
* 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 8 hosts with reason: Maintenance
* 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31470 and previous config saved to /var/cache/conftool/dbconfig/20220719-225828-marostegui.json
* 22:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2050.codfw.wmnet with OS bullseye
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31469 and previous config saved to /var/cache/conftool/dbconfig/20220719-224323-marostegui.json
* 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2050.codfw.wmnet with reason: host reimage
* 22:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2050.codfw.wmnet with reason: host reimage
* 22:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31468 and previous config saved to /var/cache/conftool/dbconfig/20220719-222818-marostegui.json
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31467 and previous config saved to /var/cache/conftool/dbconfig/20220719-221312-marostegui.json
* 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31466 and previous config saved to /var/cache/conftool/dbconfig/20220719-221035-marostegui.json
* 22:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2050.codfw.wmnet with OS bullseye
* 22:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31465 and previous config saved to /var/cache/conftool/dbconfig/20220719-220946-marostegui.json
* 22:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31464 and previous config saved to /var/cache/conftool/dbconfig/20220719-215441-marostegui.json
* 21:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 21:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31463 and previous config saved to /var/cache/conftool/dbconfig/20220719-213936-marostegui.json
* 21:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2026.codfw.wmnet with OS bullseye
* 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:36 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]] (duration: 04m 02s)
* 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 21:26 dancy@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Config: [[gerrit:815317{{!}}MWConfigCacheGenerator: If opcache.revalidate_freq is 0, use grace period of 10 seconds (T311788)]] (duration: 02m 59s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31462 and previous config saved to /var/cache/conftool/dbconfig/20220719-212431-marostegui.json
* 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31461 and previous config saved to /var/cache/conftool/dbconfig/20220719-212149-marostegui.json
* 21:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 21:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31460 and previous config saved to /var/cache/conftool/dbconfig/20220719-212128-marostegui.json
* 21:17 jforrester@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Scribunto/includes/Hooks.php: Train unblocker: [[gerrit:815281{{!}}Hooks: Bump scribunto-stats cache version (T313341)]] (duration: 03m 14s)
* 21:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2026.codfw.wmnet with reason: host reimage
* 21:14 cjming: end of UTC late backport window
* 21:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2026.codfw.wmnet with reason: host reimage
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815374{{!}}uzwiki: Create "eliminator" group (T302670)]] (duration: 03m 13s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:07 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815374{{!}}uzwiki: Create "eliminator" group (T302670)]] (duration: 03m 19s)
* 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31459 and previous config saved to /var/cache/conftool/dbconfig/20220719-210623-marostegui.json
* 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2026.codfw.wmnet with OS bullseye
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:00 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:776334{{!}}Add "uploader" user group for kswiki. (T305320)]] (duration: 02m 58s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:774841{{!}}Add file mover user group for azwiki (T304968)]] (duration: 02m 52s)
* 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31458 and previous config saved to /var/cache/conftool/dbconfig/20220719-205118-marostegui.json
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:774841{{!}}Add file mover user group for azwiki (T304968)]] (duration: 03m 15s)
* 20:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2055.codfw.wmnet with OS bullseye
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:36 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:815360{{!}}Bumping portals to master (T128546)]] (duration: 02m 53s)
* 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31457 and previous config saved to /var/cache/conftool/dbconfig/20220719-203613-marostegui.json
* 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31456 and previous config saved to /var/cache/conftool/dbconfig/20220719-203327-marostegui.json
* 20:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:815360{{!}}Bumping portals to master (T128546)]] (duration: 03m 09s)
* 20:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31455 and previous config saved to /var/cache/conftool/dbconfig/20220719-203307-marostegui.json
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:29 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815338{{!}}[wmf-config]: Undeploy GDI Survey Wave 2 (T312866)]] (duration: 03m 12s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2055.codfw.wmnet with reason: host reimage
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:23 cjming@deploy1002: Synchronized wmf-config: Config: [[gerrit:814869{{!}}Deploy the new grid layout to group 0 wikis (T312241)]] (duration: 03m 05s)
* 20:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2055.codfw.wmnet with reason: host reimage
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31454 and previous config saved to /var/cache/conftool/dbconfig/20220719-201802-marostegui.json
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814908{{!}}cirrus: Dont recycle completion suggester indices]] (duration: 03m 12s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "testwikis to 1.39.0-wmf.19"
* 20:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2055.codfw.wmnet with OS bullseye
* 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31453 and previous config saved to /var/cache/conftool/dbconfig/20220719-200257-marostegui.json
* 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:51 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.19"
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31452 and previous config saved to /var/cache/conftool/dbconfig/20220719-194752-marostegui.json
* 19:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2056.codfw.wmnet with OS bullseye
* 19:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 19:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2069.codfw.wmnet with OS bullseye
* 19:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31451 and previous config saved to /var/cache/conftool/dbconfig/20220719-192207-marostegui.json
* 19:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 19:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 19:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31450 and previous config saved to /var/cache/conftool/dbconfig/20220719-192147-marostegui.json
* 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2069.codfw.wmnet with reason: host reimage
* 19:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31449 and previous config saved to /var/cache/conftool/dbconfig/20220719-190642-marostegui.json
* 19:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2056.codfw.wmnet with reason: host reimage
* 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:04 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2069.codfw.wmnet with reason: host reimage
* 19:02 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 19:02 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2056.codfw.wmnet with reason: host reimage
* 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31448 and previous config saved to /var/cache/conftool/dbconfig/20220719-185137-marostegui.json
* 18:50 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2069.codfw.wmnet with OS bullseye
* 18:49 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.17, 1.39.0-wmf.18 (duration: 02m 09s)
* 18:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:44 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2056.codfw.wmnet with OS bullseye
* 18:42 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31447 and previous config saved to /var/cache/conftool/dbconfig/20220719-183632-marostegui.json
* 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31446 and previous config saved to /var/cache/conftool/dbconfig/20220719-183351-marostegui.json
* 18:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 18:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31445 and previous config saved to /var/cache/conftool/dbconfig/20220719-183330-marostegui.json
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:18 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
* 18:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31444 and previous config saved to /var/cache/conftool/dbconfig/20220719-181825-marostegui.json
* 18:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 18:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31443 and previous config saved to /var/cache/conftool/dbconfig/20220719-180320-marostegui.json
* 17:51 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]] (duration: 04m 24s)
* 17:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31442 and previous config saved to /var/cache/conftool/dbconfig/20220719-174815-marostegui.json
* 17:46 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31441 and previous config saved to /var/cache/conftool/dbconfig/20220719-174537-marostegui.json
* 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 17:45 jhuneidi@deploy1002: Installation of scap version "4.11.1" completed for 557 hosts
* 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31440 and previous config saved to /var/cache/conftool/dbconfig/20220719-174517-marostegui.json
* 17:45 jhuneidi@deploy1002: Installing scap version "4.11.1" for 557 hosts
* 17:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31439 and previous config saved to /var/cache/conftool/dbconfig/20220719-173012-marostegui.json
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31438 and previous config saved to /var/cache/conftool/dbconfig/20220719-171507-marostegui.json
* 17:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 jhuneidi@deploy1002: scap failed: ValueError php_fpm expected targets, 0 given (duration: 37m 54s)
* 17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31437 and previous config saved to /var/cache/conftool/dbconfig/20220719-170002-marostegui.json
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31436 and previous config saved to /var/cache/conftool/dbconfig/20220719-165747-marostegui.json
* 16:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 16:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 16:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 16:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 16:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1028.eqiad.wmnet
* 16:43 XioNoX: cr2-eqiad# run request chassis fpc slot 3 offline
* 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1028.eqiad.wmnet
* 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:28 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 16:23 jhuneidi@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/srv/mediawiki-staging/php-1.39.0-wmf.19/cache/gitinfo/info-extensions-FileImporter.json' (duration: 00m 00s)
* 16:23 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 16:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 16:18 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 8 hosts with reason: Maintenance
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 8 hosts with reason: Maintenance
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 16:18 XioNoX: drain traffic away from cr2-eqiad:fpc3 - [[phab:T312745|T312745]]
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31435 and previous config saved to /var/cache/conftool/dbconfig/20220719-161803-marostegui.json
* 16:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:14 jhuneidi@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/srv/mediawiki-staging/php-1.39.0-wmf.19/cache/gitinfo/info-extensions-GrowthExperiments.json' (duration: 00m 00s)
* 16:14 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 16:04 moritzm: installing node-minimist security updates
* 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31434 and previous config saved to /var/cache/conftool/dbconfig/20220719-160258-marostegui.json
* 15:58 moritzm: draining ganeti2020 [[phab:T310483|T310483]]
* 15:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2029.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 15:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2029.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 15:56 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
* 15:55 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
* 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31433 and previous config saved to /var/cache/conftool/dbconfig/20220719-154753-marostegui.json
* 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31432 and previous config saved to /var/cache/conftool/dbconfig/20220719-153248-marostegui.json
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31431 and previous config saved to /var/cache/conftool/dbconfig/20220719-153040-marostegui.json
* 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31430 and previous config saved to /var/cache/conftool/dbconfig/20220719-153009-marostegui.json
* 15:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1007.wikimedia.org
* 15:22 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
* 15:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:17 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31429 and previous config saved to /var/cache/conftool/dbconfig/20220719-151503-marostegui.json
* 15:14 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 15:13 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1001
* 15:12 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1001
* 15:03 moritzm: installing nghttp2 security updates
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31427 and previous config saved to /var/cache/conftool/dbconfig/20220719-145958-marostegui.json
* 14:50 moritzm: installing python-urllib3 security updates
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31426 and previous config saved to /var/cache/conftool/dbconfig/20220719-144453-marostegui.json
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31425 and previous config saved to /var/cache/conftool/dbconfig/20220719-144245-marostegui.json
* 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31424 and previous config saved to /var/cache/conftool/dbconfig/20220719-144208-marostegui.json
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31423 and previous config saved to /var/cache/conftool/dbconfig/20220719-142703-marostegui.json
* 14:23 dancy@deploy1002: Installation of scap version "4.11.0" completed for 557 hosts
* 14:22 dancy@deploy1002: Installing scap version "4.11.0" for 557 hosts
* 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest1001.eqiad.wmnet
* 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:16 moritzm: installing glib2.0 security updates
* 14:15 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31422 and previous config saved to /var/cache/conftool/dbconfig/20220719-141158-marostegui.json
* 14:11 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts sretest1001.eqiad.wmnet
* 13:58 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts sretest1001.eqiad.wmnet
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31421 and previous config saved to /var/cache/conftool/dbconfig/20220719-135652-marostegui.json
* 13:55 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts sretest1001.eqiad.wmnet
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31420 and previous config saved to /var/cache/conftool/dbconfig/20220719-135532-marostegui.json
* 13:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 13:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31419 and previous config saved to /var/cache/conftool/dbconfig/20220719-135511-marostegui.json
* 13:45 moritzm: installing cron security updates
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31418 and previous config saved to /var/cache/conftool/dbconfig/20220719-134006-marostegui.json
* 13:37 marostegui: Stop mysql on db1132 to upgrade package
* 13:34 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 13:33 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Resync after touching (duration: 02m 38s)
* 13:32 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 13:30 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 13:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
* 13:28 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 13:27 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 13:26 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 13:25 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31417 and previous config saved to /var/cache/conftool/dbconfig/20220719-132501-marostegui.json
* 13:24 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 13:23 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 13:22 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 13:22 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 13:21 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814373{{!}}brwikimedia: Use logo and wordmark in vector-2022 and minerva (T313194)]] (duration: 02m 48s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:16 hashar@deploy1002: Synchronized static/images/mobile/copyright: Config: [[gerrit:814372{{!}}brwikimedia: Add logo and wordmark for vector-2022 and minerva (T313194)]] (duration: 02m 57s)
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31416 and previous config saved to /var/cache/conftool/dbconfig/20220719-130956-marostegui.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31415 and previous config saved to /var/cache/conftool/dbconfig/20220719-130736-marostegui.json
* 13:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31414 and previous config saved to /var/cache/conftool/dbconfig/20220719-130716-marostegui.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31413 and previous config saved to /var/cache/conftool/dbconfig/20220719-125211-marostegui.json
* 12:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netboxdb1001.eqiad.wmnet
* 12:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:45 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:45 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb1001.eqiad.wmnet
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31412 and previous config saved to /var/cache/conftool/dbconfig/20220719-123706-marostegui.json
* 12:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netboxdb1001.eqiad.wmnet
* 12:30 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:26 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:25 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb1001.eqiad.wmnet
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31411 and previous config saved to /var/cache/conftool/dbconfig/20220719-122201-marostegui.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31409 and previous config saved to /var/cache/conftool/dbconfig/20220719-121941-marostegui.json
* 12:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31408 and previous config saved to /var/cache/conftool/dbconfig/20220719-121921-marostegui.json
* 12:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31407 and previous config saved to /var/cache/conftool/dbconfig/20220719-120416-marostegui.json
* 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:01 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache ([[phab:T310777|T310777]]) (duration: 02m 49s)
* 12:00 moritzm: upgrading ganeti/ulsfo to 3.0.2 [[phab:T312637|T312637]]
* 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31406 and previous config saved to /var/cache/conftool/dbconfig/20220719-115719-ladsgroup.json
* 11:52 urbanecm@deploy1002: Synchronized langlist: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 42s)
* 11:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 35s)
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31405 and previous config saved to /var/cache/conftool/dbconfig/20220719-114911-marostegui.json
* 11:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 49s)
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 11:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 11:43 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 56s)
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31404 and previous config saved to /var/cache/conftool/dbconfig/20220719-114214-ladsgroup.json
* 11:41 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating blkwiki ([[phab:T310777|T310777]])
* 11:37 urbanecm@deploy1002: Synchronized dblists: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 52s)
* 11:34 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 47s)
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31403 and previous config saved to /var/cache/conftool/dbconfig/20220719-113406-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31401 and previous config saved to /var/cache/conftool/dbconfig/20220719-113158-marostegui.json
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31400 and previous config saved to /var/cache/conftool/dbconfig/20220719-113137-marostegui.json
* 11:27 moritzm: remove ganeti 3.0.1-2+deb11u0 from buster-wikimedia, superseded by ganeti 3.0.2-1~deb11u1 from Bullseye 11.4 point release [[phab:T312637|T312637]]
* 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31399 and previous config saved to /var/cache/conftool/dbconfig/20220719-112708-ladsgroup.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31398 and previous config saved to /var/cache/conftool/dbconfig/20220719-111632-marostegui.json
* 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31397 and previous config saved to /var/cache/conftool/dbconfig/20220719-111203-ladsgroup.json
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to plain, [[phab:T311686|T311686]]
* 11:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to plain, [[phab:T311686|T311686]]
* 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31396 and previous config saved to /var/cache/conftool/dbconfig/20220719-110127-marostegui.json
* 11:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netboxdb2001.codfw.wmnet
* 11:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:59 moritzm: draining ganeti2020 [[phab:T310483|T310483]]
* 10:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31395 and previous config saved to /var/cache/conftool/dbconfig/20220719-104622-marostegui.json
* 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31394 and previous config saved to /var/cache/conftool/dbconfig/20220719-104559-ladsgroup.json
* 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31393 and previous config saved to /var/cache/conftool/dbconfig/20220719-104414-marostegui.json
* 10:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P31392 and previous config saved to /var/cache/conftool/dbconfig/20220719-103341-root.json
* 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:05 elukey: reboot an-worker1127 - hdfs datanode caused CPU stalls
* 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:50 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb2001.codfw.wmnet
* 09:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox1001.wikimedia.org
* 09:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:46 moritzm: draining ganeti2029 [[phab:T310483|T310483]]
* 09:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:40 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox1001.wikimedia.org
* 09:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox2001.wikimedia.org
* 09:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 09:34 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:29 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox2001.wikimedia.org
* 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:00 urbanecm: Deployed patch for [[phab:T313205|T313205]]
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2018.codfw.wmnet to cluster codfw and group D
* 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2018.codfw.wmnet to cluster codfw and group D
* 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
* 08:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust db2167:3311 and db2167:3318 weight [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31390 and previous config saved to /var/cache/conftool/dbconfig/20220719-071836-marostegui.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2167:3311 and db2167:3318 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31389 and previous config saved to /var/cache/conftool/dbconfig/20220719-071656-marostegui.json
* 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2084.codfw.wmnet
* 06:56 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:51 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2084.codfw.wmnet
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2084 from dbctl [[phab:T313121|T313121]]', diff saved to https://phabricator.wikimedia.org/P31386 and previous config saved to /var/cache/conftool/dbconfig/20220719-051725-marostegui.json
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
 
== 2022-07-18 ==
* 23:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1050.eqiad.wmnet
* 23:46 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1050.eqiad.wmnet
* 23:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1049.eqiad.wmnet
* 23:07 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1049.eqiad.wmnet
* 21:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:36 sbassett: Deployed security fix for [[phab:T309894|T309894]]
* 20:58 ebernhardson: start reindex of all wikis except commonswiki and wikidatawiki in eqiad and codfw cirrus clusters
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:45 urbanecm: UTC late B&C window finished
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:45 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/: {{Gerrit|930ecb76a5a9266d498f40b49ab5ff82c01dbcf5}}: reindex: Detect index type from live mappings (duration: 02m 55s)
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8d1663c93d2ddeb107d5f9b8982a7f4a7b880aba}}: Turn off fixed width in main namespace on Wikisource ([[phab:T311607|T311607]]) (duration: 02m 41s)
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c258b25e8a47caf9d531f01798d32cd3f9b1605}}: Enable language switching button for logged-out users on non-pilot wikis ([[phab:T312861|T312861]]) (duration: 02m 43s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f99c5331380a8c03f4c447e2f73cb76afca337a2}}: Pin cu_log actor migration to old schema ([[phab:T233004|T233004]]) (duration: 02m 41s)
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|415c4ef44d9bf1abab6942fbbc552990a8e992c8}}: Collapse sidebar by default for anonymous users ([[phab:T287609|T287609]]) (duration: 02m 41s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/resources/src/moment/moment-locale-overrides.js: {{Gerrit|c4d8a217b4ce0a9f7aefaacc032136e7eb058d4d}}: Ensure custom locales for Moment.js overrides, dont change en ([[phab:T313188|T313188]]) (duration: 02m 44s)
* 20:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|76b7cd6379c25175570eeeb2a305de0fd0bc61e5}}: Mentorship: enable the Vue version of the dashboard in test ([[phab:T300532|T300532]]) (duration: 03m 00s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:04 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 19:02 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31385 and previous config saved to /var/cache/conftool/dbconfig/20220718-184146-root.json
* 18:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 18:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31384 and previous config saved to /var/cache/conftool/dbconfig/20220718-182642-root.json
* 18:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS bullseye
* 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31382 and previous config saved to /var/cache/conftool/dbconfig/20220718-181138-root.json
* 18:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
* 17:57 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
* 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31381 and previous config saved to /var/cache/conftool/dbconfig/20220718-175634-root.json
* 17:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS bullseye
* 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31380 and previous config saved to /var/cache/conftool/dbconfig/20220718-174130-root.json
* 17:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31379 and previous config saved to /var/cache/conftool/dbconfig/20220718-172626-root.json
* 17:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31378 and previous config saved to /var/cache/conftool/dbconfig/20220718-171122-root.json
* 16:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31377 and previous config saved to /var/cache/conftool/dbconfig/20220718-165617-root.json
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31376 and previous config saved to /var/cache/conftool/dbconfig/20220718-165455-marostegui.json
* 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31375 and previous config saved to /var/cache/conftool/dbconfig/20220718-165349-marostegui.json
* 16:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 16:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31374 and previous config saved to /var/cache/conftool/dbconfig/20220718-165329-marostegui.json
* 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31373 and previous config saved to /var/cache/conftool/dbconfig/20220718-163824-marostegui.json
* 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31372 and previous config saved to /var/cache/conftool/dbconfig/20220718-162319-marostegui.json
* 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31371 and previous config saved to /var/cache/conftool/dbconfig/20220718-160813-marostegui.json
* 16:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31370 and previous config saved to /var/cache/conftool/dbconfig/20220718-160708-marostegui.json
* 16:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 16:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31369 and previous config saved to /var/cache/conftool/dbconfig/20220718-160648-marostegui.json
* 15:52 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:814846{{!}} Bumping portals to master (T128546)]] (duration: 02m 59s)
* 15:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31368 and previous config saved to /var/cache/conftool/dbconfig/20220718-155143-marostegui.json
* 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:49 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:814846{{!}} Bumping portals to master (T128546)]] (duration: 03m 03s)
* 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:40 ejegg: updated fundraising CiviCRM from {{Gerrit|55bc690b}} to {{Gerrit|b4a7154a}}
* 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31367 and previous config saved to /var/cache/conftool/dbconfig/20220718-153637-marostegui.json
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31366 and previous config saved to /var/cache/conftool/dbconfig/20220718-152132-marostegui.json
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31365 and previous config saved to /var/cache/conftool/dbconfig/20220718-152026-marostegui.json
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31364 and previous config saved to /var/cache/conftool/dbconfig/20220718-151944-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31363 and previous config saved to /var/cache/conftool/dbconfig/20220718-150439-marostegui.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31362 and previous config saved to /var/cache/conftool/dbconfig/20220718-145909-ladsgroup.json
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2012.codfw.wmnet to cluster codfw and group C
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2012.codfw.wmnet to cluster codfw and group C
* 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31361 and previous config saved to /var/cache/conftool/dbconfig/20220718-144934-marostegui.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31360 and previous config saved to /var/cache/conftool/dbconfig/20220718-144404-ladsgroup.json
* 14:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31359 and previous config saved to /var/cache/conftool/dbconfig/20220718-143428-marostegui.json
* 14:29 Lucas_WMDE: UTC afternoon backport+config window done
* 14:29 lucaswerkmeister-wmde@deploy1002: Finished scap: refresh everything after adding CampaignEvents to extension-list ([[phab:T311752|T311752]], only enabled in Beta so far), just in case (duration: 14m 40s)
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31358 and previous config saved to /var/cache/conftool/dbconfig/20220718-142859-ladsgroup.json
* 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:14 lucaswerkmeister-wmde@deploy1002: Started scap: refresh everything after adding CampaignEvents to extension-list ([[phab:T311752|T311752]], only enabled in Beta so far), just in case
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31357 and previous config saved to /var/cache/conftool/dbconfig/20220718-141354-ladsgroup.json
* 14:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:813991{{!}}Load and configure the CampaignEvents extension where enabled (T311752)]] (2/2: should be prod no-op) (duration: 02m 40s)
* 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31356 and previous config saved to /var/cache/conftool/dbconfig/20220718-140947-ladsgroup.json
* 14:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31355 and previous config saved to /var/cache/conftool/dbconfig/20220718-140926-ladsgroup.json
* 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:813991{{!}}Load and configure the CampaignEvents extension where enabled (T311752)]] (1/2: should be no-op) (duration: 02m 51s)
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:813990{{!}}Enable the CampaignEvents extension on beta (T311752)]] (no-op) (duration: 02m 43s)
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31354 and previous config saved to /var/cache/conftool/dbconfig/20220718-135421-ladsgroup.json
* 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813989{{!}}Add config variable for the CampaignEvents extension (T311752)]] (no-op) (duration: 02m 55s)
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:813986{{!}}Add CampaignEvents to extension-list (T311752)]] (duration: 03m 08s)
* 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2028.codfw.wmnet to cluster codfw and group A
* 13:45 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2028.codfw.wmnet to cluster codfw and group A
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2018.codfw.wmnet with OS bullseye
* 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31353 and previous config saved to /var/cache/conftool/dbconfig/20220718-133916-ladsgroup.json
* 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31352 and previous config saved to /var/cache/conftool/dbconfig/20220718-133414-marostegui.json
* 13:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31351 and previous config saved to /var/cache/conftool/dbconfig/20220718-133354-marostegui.json
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2028.codfw.wmnet
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814111{{!}}Make weighted_tags search default for commonswiki]] (duration: 02m 54s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31350 and previous config saved to /var/cache/conftool/dbconfig/20220718-132411-ladsgroup.json
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
* 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31349 and previous config saved to /var/cache/conftool/dbconfig/20220718-132009-ladsgroup.json
* 13:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31348 and previous config saved to /var/cache/conftool/dbconfig/20220718-131949-ladsgroup.json
* 13:19 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: Backport: [[gerrit:814767{{!}}Use getOption to detect user preferences (T313209)]] (duration: 02m 50s)
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31347 and previous config saved to /var/cache/conftool/dbconfig/20220718-131848-marostegui.json
* 13:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
* 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 13:15 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814108{{!}}Update config for commons custommatch search]] (duration: 02m 55s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P31346 and previous config saved to /var/cache/conftool/dbconfig/20220718-130443-ladsgroup.json
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31345 and previous config saved to /var/cache/conftool/dbconfig/20220718-130343-marostegui.json
* 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2018.codfw.wmnet with OS bullseye
* 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P31344 and previous config saved to /var/cache/conftool/dbconfig/20220718-124938-ladsgroup.json
* 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2012.codfw.wmnet with OS bullseye
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31343 and previous config saved to /var/cache/conftool/dbconfig/20220718-124838-marostegui.json
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31342 and previous config saved to /var/cache/conftool/dbconfig/20220718-124732-marostegui.json
* 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31341 and previous config saved to /var/cache/conftool/dbconfig/20220718-124712-marostegui.json
* 12:35 godog: update grafana to 8.5.9
* 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31340 and previous config saved to /var/cache/conftool/dbconfig/20220718-123433-ladsgroup.json
* 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31339 and previous config saved to /var/cache/conftool/dbconfig/20220718-123207-marostegui.json
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31338 and previous config saved to /var/cache/conftool/dbconfig/20220718-123029-ladsgroup.json
* 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31337 and previous config saved to /var/cache/conftool/dbconfig/20220718-123009-ladsgroup.json
* 12:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31336 and previous config saved to /var/cache/conftool/dbconfig/20220718-121702-marostegui.json
* 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P31335 and previous config saved to /var/cache/conftool/dbconfig/20220718-121504-ladsgroup.json
* 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2012.codfw.wmnet with OS bullseye
* 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS bullseye
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31334 and previous config saved to /var/cache/conftool/dbconfig/20220718-120157-marostegui.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31333 and previous config saved to /var/cache/conftool/dbconfig/20220718-120051-marostegui.json
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31332 and previous config saved to /var/cache/conftool/dbconfig/20220718-120030-marostegui.json
* 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P31331 and previous config saved to /var/cache/conftool/dbconfig/20220718-115959-ladsgroup.json
* 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31330 and previous config saved to /var/cache/conftool/dbconfig/20220718-114525-marostegui.json
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31329 and previous config saved to /var/cache/conftool/dbconfig/20220718-114454-ladsgroup.json
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31328 and previous config saved to /var/cache/conftool/dbconfig/20220718-113947-ladsgroup.json
* 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31327 and previous config saved to /var/cache/conftool/dbconfig/20220718-113927-ladsgroup.json
* 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS bullseye
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31326 and previous config saved to /var/cache/conftool/dbconfig/20220718-113020-marostegui.json
* 11:25 jbond: re-enable puppet post postgresql re-sync
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P31325 and previous config saved to /var/cache/conftool/dbconfig/20220718-112422-ladsgroup.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31324 and previous config saved to /var/cache/conftool/dbconfig/20220718-111515-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31323 and previous config saved to /var/cache/conftool/dbconfig/20220718-111409-marostegui.json
* 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31322 and previous config saved to /var/cache/conftool/dbconfig/20220718-111348-marostegui.json
* 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P31319 and previous config saved to /var/cache/conftool/dbconfig/20220718-110916-ladsgroup.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31318 and previous config saved to /var/cache/conftool/dbconfig/20220718-105843-marostegui.json
* 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31317 and previous config saved to /var/cache/conftool/dbconfig/20220718-105411-ladsgroup.json
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31316 and previous config saved to /var/cache/conftool/dbconfig/20220718-104921-ladsgroup.json
* 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31315 and previous config saved to /var/cache/conftool/dbconfig/20220718-104844-ladsgroup.json
* 10:48 jbond: disable puppet fleet wide to resync db
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31314 and previous config saved to /var/cache/conftool/dbconfig/20220718-104337-marostegui.json
* 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P31313 and previous config saved to /var/cache/conftool/dbconfig/20220718-103339-ladsgroup.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31312 and previous config saved to /var/cache/conftool/dbconfig/20220718-102832-marostegui.json
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31311 and previous config saved to /var/cache/conftool/dbconfig/20220718-102726-marostegui.json
* 10:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 10:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31310 and previous config saved to /var/cache/conftool/dbconfig/20220718-102706-marostegui.json
* 10:26 Amir1: dbmaint on s5@eqiad ([[phab:T312863|T312863]])
* 10:26 Amir1: dbmaint on s5@codfw ([[phab:T312863|T312863]])
* 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 10:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P31308 and previous config saved to /var/cache/conftool/dbconfig/20220718-101834-ladsgroup.json
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31307 and previous config saved to /var/cache/conftool/dbconfig/20220718-101201-marostegui.json