
Server Admin Log: Difference between revisions

imported>Stashbot
(krinkle@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/: I8619ab9e92b, T277362, T275531 (duration: 00m 58s))
imported>Stashbot
(robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye)
 
(516 intermediate revisions by 4 users not shown)
== 2022-09-30 ==
* 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
== 2021-03-16 ==
* 23:56 krinkle@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/: {{Gerrit|I8619ab9e92b}}, [[phab:T277362|T277362]], [[phab:T275531|T275531]] (duration: 00m 58s)
* 23:51 krinkle@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/Scribunto/: {{Gerrit|I84e8732d8d}} - tmp logging (duration: 00m 58s)
* 23:47 Krinkle: There is an uncommitted dirty diff in /srv/mediawiki-staging/php-1.36.0-wmf.34/extensions/WikimediaMaintenance/createExtensionTables.php
* 23:31 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I1ca4f30c2}}, [[phab:T262612|T262612]] (duration: 00m 57s)
* 23:22 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Icd6635cb302cc}}, [[phab:T277332|T277332]] (duration: 00m 58s)
* 23:07 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I8d8c94d95c6}} (duration: 00m 59s)
* 23:03 twentyafterfour: applied hotfix to phabricator/src/infrastructure/customfield/storage/PhabricatorCustomFieldStorage.php and restarted php-fpm
* 23:02 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I4097cbcb1d5}} (duration: 00m 59s)
* 22:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Ie24eb2077}} (duration: 00m 58s)
* 20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2232.codfw.wmnet
* 20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2231.codfw.wmnet
* 20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2230.codfw.wmnet
* 20:49 andrew@deploy1002: Finished deploy [horizon/deploy@e4fd934]: tiny horizon patch to support flavor deprecation (duration: 03m 44s)
* 20:45 andrew@deploy1002: Started deploy [horizon/deploy@e4fd934]: tiny horizon patch to support flavor deprecation
* 20:15 XioNoX: remove DMZ zone from pfw3-eqiad - [[phab:T174203|T174203]]
* 20:00 brennen: 1.36.0-wmf.35 train status ([[phab:T274939|T274939]]): blocked at group0 on [[phab:T277362|T277362]]
* 19:52 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.34
* 19:52 XioNoX: commit changes to pfw3-eqiad - [[phab:T274422|T274422]]
* 19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.35
* 19:31 dancy@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.35 (duration: 33m 41s)
* 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2229.codfw.wmnet
* 19:11 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2229.codfw.wmnet
* 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2229.codfw.wmnet
* 19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2228.codfw.wmnet
* 19:07 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2228.codfw.wmnet
* 19:06 XioNoX: commit changes to pfw3-codfw - [[phab:T274422|T274422]]
* 18:58 dancy@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.35
* 18:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2228.codfw.wmnet
* 18:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:43 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:41 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:03 ppchelko@deploy1002: Finished deploy [restbase/deploy@f99ddaa]: Add new wikis [[phab:T275837|T275837]] [[phab:T271983|T271983]] [[phab:T273466|T273466]] [[phab:T276127|T276127]] [[phab:T273460|T273460]] [[phab:T276249|T276249]] (duration: 31m 31s)
* 17:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster hosts, not in use
* 17:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster hosts, not in use
* 17:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2227.codfw.wmnet
* 17:32 ppchelko@deploy1002: Started deploy [restbase/deploy@f99ddaa]: Add new wikis [[phab:T275837|T275837]] [[phab:T271983|T271983]] [[phab:T273466|T273466]] [[phab:T276127|T276127]] [[phab:T273460|T273460]] [[phab:T276249|T276249]]
* 17:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2227.codfw.wmnet
* 17:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2226.codfw.wmnet
* 16:47 eevans@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 16:44 eevans@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 16:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2242.codfw.wmnet
* 16:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2241.codfw.wmnet
* 16:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2240.codfw.wmnet
* 16:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2226.codfw.wmnet
* 16:20 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2227.codfw.wmnet
* 16:20 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2226.codfw.wmnet
* 16:17 mutante: testreduce1001 - gzip /var/log/daemon.log.1 ; apt-get clean .. free some disk space
* 15:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16 days, 16:00:00 on acrux.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
* 15:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 16 days, 16:00:00 on acrux.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
* 15:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16 days, 16:00:00 on acrab.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
* 15:46 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 16 days, 16:00:00 on acrab.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14905 and previous config saved to /var/cache/conftool/dbconfig/20210316-153446-root.json
* 15:32 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: [[phab:T277006|T277006]] (duration: 04m 56s)
* 15:27 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: [[phab:T277006|T277006]]
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14904 and previous config saved to /var/cache/conftool/dbconfig/20210316-151943-root.json
* 15:07 hashar@deploy1002: Finished deploy [integration/docroot@cf787a5]: (no justification provided) (duration: 00m 30s)
* 15:06 hashar@deploy1002: Started deploy [integration/docroot@cf787a5]: (no justification provided)
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14903 and previous config saved to /var/cache/conftool/dbconfig/20210316-150439-root.json
* 15:03 hashar@deploy1002: Finished deploy [integration/docroot@44d5685]: Verify check can restart php-fpm # [[phab:T275468|T275468]] (duration: 00m 07s)
* 15:03 hashar@deploy1002: Started deploy [integration/docroot@44d5685]: Verify check can restart php-fpm # [[phab:T275468|T275468]]
* 14:58 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T276251|T276251]] [[phab:T276129|T276129]] [[phab:T275839|T275839]])
* 14:53 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2015.codfw.wmnet
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14902 and previous config saved to /var/cache/conftool/dbconfig/20210316-144935-root.json
* 14:37 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T276251|T276251]] [[phab:T276129|T276129]] [[phab:T275839|T275839]])
* 13:45 moritzm: powercycling ganeti2015, stuck on reboot
* 13:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
* 13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:34 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:33 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:31 moritzm: drain ganeti2015
* 13:31 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 13:31 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 13:30 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P14901 and previous config saved to /var/cache/conftool/dbconfig/20210316-132844-marostegui.json
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14900 and previous config saved to /var/cache/conftool/dbconfig/20210316-132814-root.json
* 13:28 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:27 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
* 13:26 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:24 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|7fb50c3}}: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg ([[phab:T276246|T276246]]; 2/2) (duration: 00m 57s)
* 13:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 13:23 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|7fb50c3}}: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg ([[phab:T276246|T276246]]; 1/2) (duration: 01m 01s)
* 13:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
* 13:22 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:22 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 13:22 urbanecm@deploy1002: sync-file aborted: {{Gerrit|7fb50c3}}: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg ([[phab:T276246|T276246]]) (duration: 00m 00s)
* 13:20 moritzm: drain ganeti2014
* 13:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
* 13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:18 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
* 13:18 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 13:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:16 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14899 and previous config saved to /var/cache/conftool/dbconfig/20210316-131310-root.json
* 13:13 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:13 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
* 13:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:09 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 13:09 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:04 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:04 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:03 akosiaris: sync all services on the new codfw kubernetes cluster [[phab:T277191|T277191]]
* 13:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 13:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 12:59 moritzm: drain ganeti2013
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14898 and previous config saved to /var/cache/conftool/dbconfig/20210316-125807-root.json
* 12:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:53 Urbanecm: New wiki creation is done
* 12:51 volans@cumin1001: START - Cookbook sre.dns.netbox
* 12:50 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|1426d04abe08458dac57868a85550e05f9cb544b}}: flaggedrevs: Simplify the config a bit (duration: 00m 58s)
* 12:46 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 06s)
* 12:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating mnwwiktionary ([[phab:T276125|T276125]]) (duration: 00m 57s)
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14897 and previous config saved to /var/cache/conftool/dbconfig/20210316-124303-root.json
* 12:42 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating mnwwiktionary ([[phab:T276125|T276125]]) (duration: 01m 00s)
* 12:41 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating mnwwiktionary ([[phab:T276125|T276125]]) (duration: 01m 01s)
* 12:40 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating mnwwiktionary ([[phab:T276125|T276125]])
* 12:39 urbanecm@deploy1002: Synchronized dblists: Creating mnwwiktionary ([[phab:T276125|T276125]]) (duration: 00m 57s)
* 12:39 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 12:37 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating mnwwiktionary ([[phab:T276125|T276125]]) (duration: 00m 58s)
* 12:36 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating mnwwiktionary ([[phab:T276125|T276125]]) (duration: 00m 58s)
* 12:34 urbanecm@deploy1002: Synchronized langlist: Creating trvwiki ([[phab:T276246|T276246]]) (duration: 00m 58s)
* 12:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating trvwiki ([[phab:T276246|T276246]]) (duration: 00m 57s)
* 12:32 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating trvwiki ([[phab:T276246|T276246]]) (duration: 00m 58s)
* 12:31 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating trvwiki ([[phab:T276246|T276246]])
* 12:29 urbanecm@deploy1002: Synchronized dblists: Creating trvwiki ([[phab:T276246|T276246]]) (duration: 00m 57s)
* 12:28 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 12:28 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating trvwiki ([[phab:T276246|T276246]]) (duration: 01m 02s)
* 12:27 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating trvwiki ([[phab:T276246|T276246]]) (duration: 00m 57s)
* 12:20 urbanecm@deploy1002: Synchronized langlist: Creating taywiki ([[phab:T275803|T275803]]) (duration: 00m 57s)
* 12:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating taywiki ([[phab:T275803|T275803]]) (duration: 00m 58s)
* 12:17 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating taywiki ([[phab:T275803|T275803]]) (duration: 00m 57s)
* 12:17 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 12:16 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating taywiki ([[phab:T275803|T275803]]) (duration: 00m 58s)
* 12:14 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating taywiki ([[phab:T275803|T275803]])
* 12:12 urbanecm@deploy1002: Synchronized dblists: Creating taywiki ([[phab:T275803|T275803]]) (duration: 00m 58s)
* 12:11 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating taywiki ([[phab:T275803|T275803]]) (duration: 01m 02s)
* 12:10 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating taywiki ([[phab:T275803|T275803]]) (duration: 00m 59s)
* 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster host
* 12:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster host
* 12:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 11:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubesvc
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 for schema change', diff saved to https://phabricator.wikimedia.org/P14896 and previous config saved to /var/cache/conftool/dbconfig/20210316-114310-marostegui.json
* 11:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2015.codfw.wmnet
* 11:32 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2016.codfw.wmnet
* 11:32 effie: upgrade memcached in mc1023, mc2023
* 11:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2006.codfw.wmnet
* 11:30 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2016.codfw.wmnet
* 11:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2015.codfw.wmnet
* 11:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2006.codfw.wmnet
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14895 and previous config saved to /var/cache/conftool/dbconfig/20210316-112931-root.json
* 11:28 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubernetes2006.codfw.wmnet
* 11:28 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2006.codfw.wmnet
* 11:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c444517}}: {{Gerrit|4e66529}}: {{Gerrit|dff200b}}: Enable DiscussionTools features on several projects ([[phab:T276493|T276493]]; [[phab:T276498|T276498]]; [[phab:T277103|T277103]]) (duration: 00m 57s)
* 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2005.codfw.wmnet
* 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2017.codfw.wmnet
* 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f0d546502458437ae7b050c3f4bdb5f5a67a9529}}: Enable DiscussionTools beta features on enwiki ([[phab:T273146|T273146]]) (duration: 00m 58s)
* 11:15 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2005.codfw.wmnet
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14893 and previous config saved to /var/cache/conftool/dbconfig/20210316-111427-root.json
* 11:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|835f9ab9fb107a339e6a9dcc008c9626ba66853e}}: Enable ContentTranslation as a default tool in Amharic, Maltese and Uzbek Wikipedias ([[phab:T276765|T276765]]) (duration: 01m 00s)
* 11:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: REIMAGE
* 11:08 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=kubemaster,name=.*,cluster=kubernetes
* 11:08 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=kubemaster,name=.*,cluster=kubernetes
* 11:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: REIMAGE
* 11:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: REIMAGE
* 11:05 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: REIMAGE
* 11:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: REIMAGE
* 11:03 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: REIMAGE
* 11:02 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: REIMAGE
* 11:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: REIMAGE
* 11:00 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2004.codfw.wmnet with reason: REIMAGE
* 10:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2017.codfw.wmnet
* 10:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: REIMAGE
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14892 and previous config saved to /var/cache/conftool/dbconfig/20210316-105924-root.json
* 10:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: REIMAGE
* 10:58 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: REIMAGE
* 10:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: REIMAGE
* 10:55 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: REIMAGE
* 10:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: REIMAGE
* 10:55 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2004.codfw.wmnet with reason: REIMAGE
* 10:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: REIMAGE
* 10:53 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2003.codfw.wmnet with reason: REIMAGE
* 10:52 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: REIMAGE
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14891 and previous config saved to /var/cache/conftool/dbconfig/20210316-105128-root.json
* 10:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2002.codfw.wmnet with reason: REIMAGE
* 10:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2003.codfw.wmnet with reason: REIMAGE
* 10:49 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2006.codfw.wmnet
* 10:49 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2005.codfw.wmnet
* 10:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2001.codfw.wmnet with reason: REIMAGE
* 10:49 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2002.codfw.wmnet with reason: REIMAGE
* 10:47 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2001.codfw.wmnet with reason: REIMAGE
* 10:47 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2015.codfw.wmnet
* 10:46 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2016.codfw.wmnet
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14890 and previous config saved to /var/cache/conftool/dbconfig/20210316-104420-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14889 and previous config saved to /var/cache/conftool/dbconfig/20210316-103625-root.json
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 60%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14887 and previous config saved to /var/cache/conftool/dbconfig/20210316-102121-root.json
* 10:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
* 10:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
* 10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14886 and previous config saved to /var/cache/conftool/dbconfig/20210316-100617-root.json
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 10:03 moritzm: drain ganeti2012
* 10:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
* 09:59 akosiaris: Push new certs for kubemaster.svc.codfw.wmnet - [[phab:T277191|T277191]]
* 09:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 49%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14885 and previous config saved to /var/cache/conftool/dbconfig/20210316-095113-root.json
* 09:50 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2006.codfw.wmnet
* 09:48 moritzm: drain ganeti2011
* 09:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2005.codfw.wmnet
* 09:46 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2006.codfw.wmnet
* 09:44 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2005.codfw.wmnet
* 09:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2004.codfw.wmnet
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P14884 and previous config saved to /var/cache/conftool/dbconfig/20210316-094117-marostegui.json
* 09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2004.codfw.wmnet
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14883 and previous config saved to /var/cache/conftool/dbconfig/20210316-093609-root.json
* 09:34 akosiaris: poweroff acrux and acrab [[phab:T277191|T277191]]
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14881 and previous config saved to /var/cache/conftool/dbconfig/20210316-092204-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 20%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14880 and previous config saved to /var/cache/conftool/dbconfig/20210316-092106-root.json
* 09:18 akosiaris: switch restbase-async to eqiad since the kubernetes codfw cluster is being reinitialized and it makes little sense to have it there while the callers will run in eqiad only
* 09:15 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
* 09:12 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=restbase-async
* 09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wikifeeds
* 09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=termbox
* 09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=similar-users
* 09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=sessionstore
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=recommendation-api
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=push-notifications
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=proton
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mobileapps
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=linkrecommendation
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventstreams-internal
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventstreams
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main
* 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-logging-external
* 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-analytics-external
* 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-analytics
* 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=echostore
* 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=cxserver
* 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
* 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=blubberoid
* 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=api-gateway
* 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=apertium
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14879 and previous config saved to /var/cache/conftool/dbconfig/20210316-090701-root.json
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 15%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14878 and previous config saved to /var/cache/conftool/dbconfig/20210316-090602-root.json
* 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:05 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:05 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 08:59 akosiaris: starting the k8s codfw cluster reinitialization process
* 08:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize codfw k8s cluster with new etcd
* 08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize codfw k8s cluster with new etcd
* 08:57 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 08:56 jayme@cumin1001: START - Cookbook sre.discovery.service-route
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14877 and previous config saved to /var/cache/conftool/dbconfig/20210316-085157-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14876 and previous config saved to /var/cache/conftool/dbconfig/20210316-085058-root.json
* 08:47 marostegui: Check tables on db2150 db2120 [[phab:T276742|T276742]]
* 08:42 moritzm: remove Java 8 from contint/releases [[phab:T269354|T269354]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14875 and previous config saved to /var/cache/conftool/dbconfig/20210316-083653-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14874 and previous config saved to /var/cache/conftool/dbconfig/20210316-083555-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 2%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14873 and previous config saved to /var/cache/conftool/dbconfig/20210316-082051-root.json
* 08:18 godog: enable nick enforcing for logmsgbot - [[phab:T276303|T276303]]
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14872 and previous config saved to /var/cache/conftool/dbconfig/20210316-080547-root.json
* 07:51 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - [[phab:T272836|T272836]]
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14871 and previous config saved to /var/cache/conftool/dbconfig/20210316-072910-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14870 and previous config saved to /var/cache/conftool/dbconfig/20210316-071407-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14869 and previous config saved to /var/cache/conftool/dbconfig/20210316-065903-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2148', diff saved to https://phabricator.wikimedia.org/P14868 and previous config saved to /var/cache/conftool/dbconfig/20210316-065840-marostegui.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2108', diff saved to https://phabricator.wikimedia.org/P14867 and previous config saved to /var/cache/conftool/dbconfig/20210316-065814-marostegui.json
* 06:52 marostegui: Stop MySQL on db2120 to clone db2150 - [[phab:T275633|T275633]]
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14865 and previous config saved to /var/cache/conftool/dbconfig/20210316-065148-marostegui.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14864 and previous config saved to /var/cache/conftool/dbconfig/20210316-064358-root.json
* 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
* 05:35 marostegui: Stop MySQL on db1076 to clone db1162 [[phab:T258361|T258361]]
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P14862 and previous config saved to /var/cache/conftool/dbconfig/20210316-053516-marostegui.json


== 2021-03-15 ==
== 2022-09-29 ==
* 23:31 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove back-compat from when IRC feed servers was a string ([[phab:T224579|T224579]]) (duration: 00m 59s)
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
* 23:24 legoktm@deploy1002: Synchronized wmf-config/: Define IRC feed servers as an array in <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php ([[phab:T224579|T224579]]) (duration: 00m 59s)
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
* 23:23 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Support having multiple IRC feed servers ([
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance


== 2021-03-14 ==
== 2022-09-28 ==
* 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14827 and previous config saved to /var/cache/conftool/dbconfig/20210314-175751-root.json
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
* 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14826 and previous config saved to /var/cache/conftool/dbconfig/20210314-174248-root.json
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14825 and previous config saved to /var/cache/conftool/dbconfig/20210314-172744-root.json
* 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
* 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14824 and previous config saved to /var/cache/conftool/dbconfig/20210314-171240-root.json
* 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
* 14:43 gehel: depool wdqs1005 and restart blazegraph - will keep depooled until this server has caught up on lag
* 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 22:20 ejegg: updated fundraising CiviCRM from {{Gerrit|d31c19a0}} to {{Gerrit|f3461a44}}
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
* 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35100 and previous config saved to /var/cache/conftool/dbconfig/20220928-212131-ladsgroup.json
* 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35099 and previous config saved to /var/cache/conftool/dbconfig/20220928-210624-ladsgroup.json
* 21:06 volans: installed spicerack 4.0.0-1+deb11u1 on cumin1001
* 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35098 and previous config saved to /var/cache/conftool/dbconfig/20220928-205117-ladsgroup.json
* 20:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12200
* 20:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12200
* 20:39 TheresNoTime: closing UTC late backport window
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]] (duration: 06m 19s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:18 samtar@deploy1002: samtar and essexigyan: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:18 samtar@deploy1002: Started scap: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]]
* 20:11 samtar@deploy1002: Sync cancelled.
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 20:04 samtar@deploy1002: samtar and dani: Backport for [[gerrit:834042{{!}}Deploy Research Incentive survey on arwiki (T318328)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:834042{{!}}Deploy Research Incentive survey on arwiki (T318328)]]
* 19:24 ejegg: updated fundraising CiviCRM from {{Gerrit|916a8b08}} to {{Gerrit|d31c19a0}}
* 19:08 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 18:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:22 volans: installed spicerack 4.0.0-1+deb11u1 on cumin2002
* 18:22 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3f23a1b]: (no justification provided) (duration: 00m 11s)
* 18:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@3f23a1b]: (no justification provided)
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]] (duration: 03m 38s)
* 18:07 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]]
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19653
* 17:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19653
* 17:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32098
* 17:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32098
* 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4181
* 17:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4181
* 17:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35097 and previous config saved to /var/cache/conftool/dbconfig/20220928-171848-ladsgroup.json
* 17:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35096 and previous config saved to /var/cache/conftool/dbconfig/20220928-170342-ladsgroup.json
* 16:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
* 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
* 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35095 and previous config saved to /var/cache/conftool/dbconfig/20220928-164835-ladsgroup.json
* 16:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
* 16:36 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@f89d689]: (no justification provided) (duration: 00m 12s)
* 16:36 nokafor@deploy1002: Started deploy [airflow-dags/analytics@f89d689]: (no justification provided)
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
* 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35093 and previous config saved to /var/cache/conftool/dbconfig/20220928-163329-ladsgroup.json
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:26 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4775
* 16:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4775
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2635
* 16:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2635
* 16:15 volans: uploaded spicerack_4.0.0 to apt.wikimedia.org bullseye-wikimedia
* 15:57 dancy@deploy1002: Installation of scap version "4.24.0" completed for 561 hosts
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
* 15:57 dancy@deploy1002: Installing scap version "4.24.0" for 561 hosts
* 15:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40217
* 15:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40217
* 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
* 15:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
* 15:51 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0646be1]: (no justification provided) (duration: 00m 10s)
* 15:51 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0646be1]: (no justification provided)
* 15:47 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
* 15:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS buster
* 15:26 moritzm: installing libgoogle-gson-java security updates on bullseye
* 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4922
* 15:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4922
* 15:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714
* 15:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
* 15:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714
* 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19108
* 15:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19108
* 15:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
* 15:09 moritzm: installing twisted security updates
* 15:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8674
* 15:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8674
* 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35092 and previous config saved to /var/cache/conftool/dbconfig/20220928-150230-ladsgroup.json
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35091 and previous config saved to /var/cache/conftool/dbconfig/20220928-150158-ladsgroup.json
* 15:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 15:00 SandraEbele: deploying Airflow for hdfsarchiver operator fix
* 15:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@aa7984f]: (no justification provided) (duration: 00m 14s)
* 15:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@aa7984f]: (no justification provided)
* 14:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1005.eqiad.wmnet with OS bullseye
* 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.wikimedia.org
* 14:53 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
* 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394354
* 14:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 394354
* 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 393950
* 14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 393950
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262589
* 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 262589
* 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209453
* 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 209453
* 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
* 14:48 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 199524
* 14:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 65517
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 65517
* 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62955
* 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62955
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 57695
* 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 57695
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53334
* 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35090 and previous config saved to /var/cache/conftool/dbconfig/20220928-144651-ladsgroup.json
* 14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 53334
* 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
* 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52320
* 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46450
* 14:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
* 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46450
* 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
* 14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
* 14:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
* 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 14:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
* 14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
* 14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36351
* 14:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35280
* 14:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
* 14:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35280
* 14:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32934
* 14:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32934
* 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32787
* 14:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32787
* 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
* 14:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
* 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29791
* 14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 29791
* 14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744
* 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26744
* 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
* 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
* 14:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22987
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35089 and previous config saved to /var/cache/conftool/dbconfig/20220928-143145-ladsgroup.json
* 14:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22987
* 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22773
* 14:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22773
* 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22616
* 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22616
* 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21949
* 14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 14:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1005.eqiad.wmnet with OS bullseye
* 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21949
* 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21928
* 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21928
* 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20115
* 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20115
* 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19653
* 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19653
* 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19151
* 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19151
* 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19108
* 14:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 14:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19108
* 14:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18106
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18106
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16735
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16735
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
* 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
* 14:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15695
* 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15695
* 14:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
* 14:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
* 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14630
* 14:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14630
* 14:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14361
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14361
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13760
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13760
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13489
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13489
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13335
* 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35088 and previous config saved to /var/cache/conftool/dbconfig/20220928-141638-ladsgroup.json
* 14:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13335
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12041
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12041
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11164
* 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11164
* 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11039
* 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11039
* 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
* 14:12 volans: added python3-gjson v0.0.5 to apt.w.o (bullseye only)
* 14:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10310
* 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
* 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 14:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
* 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8781
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35087 and previous config saved to /var/cache/conftool/dbconfig/20220928-141007-root.json
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35086 and previous config saved to /var/cache/conftool/dbconfig/20220928-141001-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35085 and previous config saved to /var/cache/conftool/dbconfig/20220928-140956-root.json
* 14:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8781
* 14:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35084 and previous config saved to /var/cache/conftool/dbconfig/20220928-140950-root.json
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
* 14:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
* 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
* 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8359
* 14:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudrabbit1003.wikimedia.org
* 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8359
* 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8075
* 14:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-eqiad
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8075
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7843
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7843
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7795
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7795
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7784
* 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7784
* 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7713
* 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7713
* 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7195
* 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
* 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7195
* 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6762
* 14:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6762
* 14:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6614
* 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6614
* 14:02 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6128
* 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6128
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6079
* 14:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
* 14:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6079
* 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5650
* 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5650
* 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5400
* 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5400
* 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4922
* 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4922
* 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4826
* 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4826
* 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4775
* 13:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4775
* 13:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
* 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4637
* 13:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4230
* 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4230
* 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4181
* 13:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4181
* 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35083 and previous config saved to /var/cache/conftool/dbconfig/20220928-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35082 and previous config saved to /var/cache/conftool/dbconfig/20220928-135456-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35081 and previous config saved to /var/cache/conftool/dbconfig/20220928-135451-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35080 and previous config saved to /var/cache/conftool/dbconfig/20220928-135445-root.json
* 13:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
* 13:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3300
* 13:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:52 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 13:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3300
* 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3292
* 13:50 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
* 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3292
* 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2906
* 13:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
* 13:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2906
* 13:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2647
* 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2647
* 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2635
* 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2635
* 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2603
* 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2603
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1273
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1273
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 812
* 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 812
* 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 714
* 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35079 and previous config saved to /var/cache/conftool/dbconfig/20220928-133957-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35078 and previous config saved to /var/cache/conftool/dbconfig/20220928-133951-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35077 and previous config saved to /var/cache/conftool/dbconfig/20220928-133946-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35076 and previous config saved to /var/cache/conftool/dbconfig/20220928-133940-root.json
* 13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=1) rolling restart_daemons on A:thanos-fe-codfw
* 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 577
* 13:32 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
* 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 577
* 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
* 13:31 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
* 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35075 and previous config saved to /var/cache/conftool/dbconfig/20220928-132452-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35074 and previous config saved to /var/cache/conftool/dbconfig/20220928-132446-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35073 and previous config saved to /var/cache/conftool/dbconfig/20220928-132442-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35072 and previous config saved to /var/cache/conftool/dbconfig/20220928-132435-root.json
* 13:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:15 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35071 and previous config saved to /var/cache/conftool/dbconfig/20220928-130947-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35070 and previous config saved to /var/cache/conftool/dbconfig/20220928-130941-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35069 and previous config saved to /var/cache/conftool/dbconfig/20220928-130937-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35068 and previous config saved to /var/cache/conftool/dbconfig/20220928-130930-root.json
* 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35067 and previous config saved to /var/cache/conftool/dbconfig/20220928-125442-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35066 and previous config saved to /var/cache/conftool/dbconfig/20220928-125436-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35065 and previous config saved to /var/cache/conftool/dbconfig/20220928-125432-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35064 and previous config saved to /var/cache/conftool/dbconfig/20220928-125425-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35063 and previous config saved to /var/cache/conftool/dbconfig/20220928-123937-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35062 and previous config saved to /var/cache/conftool/dbconfig/20220928-123932-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35061 and previous config saved to /var/cache/conftool/dbconfig/20220928-123927-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35060 and previous config saved to /var/cache/conftool/dbconfig/20220928-123920-root.json
* 12:34 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35058 and previous config saved to /var/cache/conftool/dbconfig/20220928-122432-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35057 and previous config saved to /var/cache/conftool/dbconfig/20220928-122427-root.json
* 12:24 gehel: copying wmf-elasticsearch-search-plugins from bullseye to buster (`reprepro -C thirdparty/elastic710 copy buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35056 and previous config saved to /var/cache/conftool/dbconfig/20220928-122422-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35055 and previous config saved to /var/cache/conftool/dbconfig/20220928-122421-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35054 and previous config saved to /var/cache/conftool/dbconfig/20220928-122415-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35053 and previous config saved to /var/cache/conftool/dbconfig/20220928-122414-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35052 and previous config saved to /var/cache/conftool/dbconfig/20220928-122411-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35051 and previous config saved to /var/cache/conftool/dbconfig/20220928-122403-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35050 and previous config saved to /var/cache/conftool/dbconfig/20220928-122356-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35049 and previous config saved to /var/cache/conftool/dbconfig/20220928-122350-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35048 and previous config saved to /var/cache/conftool/dbconfig/20220928-122346-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P35047 and previous config saved to /var/cache/conftool/dbconfig/20220928-122321-root.json
* 12:22 gehel: above reprepro copy failed, elastic710 component does not exist yet
* 12:21 XioNoX: re-enable Init7 in knams
* 12:21 gehel: copying wmf-elasticsearch-search-plugins from bullseye to buster (`reprepro -C elastic710 buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 db2146 db2122 es2022 for mariadb upgrade [[phab:T318128|T318128]]', diff saved to https://phabricator.wikimedia.org/P35046 and previous config saved to /var/cache/conftool/dbconfig/20220928-121912-root.json
* 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 12:09 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35045 and previous config saved to /var/cache/conftool/dbconfig/20220928-120916-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35044 and previous config saved to /var/cache/conftool/dbconfig/20220928-120909-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35043 and previous config saved to /var/cache/conftool/dbconfig/20220928-120906-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35042 and previous config saved to /var/cache/conftool/dbconfig/20220928-120858-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35041 and previous config saved to /var/cache/conftool/dbconfig/20220928-120852-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35040 and previous config saved to /var/cache/conftool/dbconfig/20220928-120845-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35039 and previous config saved to /var/cache/conftool/dbconfig/20220928-120841-root.json
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 11:58 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35038 and previous config saved to /var/cache/conftool/dbconfig/20220928-115411-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35037 and previous config saved to /var/cache/conftool/dbconfig/20220928-115404-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35036 and previous config saved to /var/cache/conftool/dbconfig/20220928-115401-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35035 and previous config saved to /var/cache/conftool/dbconfig/20220928-115354-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35034 and previous config saved to /var/cache/conftool/dbconfig/20220928-115347-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35033 and previous config saved to /var/cache/conftool/dbconfig/20220928-115340-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35032 and previous config saved to /var/cache/conftool/dbconfig/20220928-115336-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35031 and previous config saved to /var/cache/conftool/dbconfig/20220928-113906-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35030 and previous config saved to /var/cache/conftool/dbconfig/20220928-113900-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35029 and previous config saved to /var/cache/conftool/dbconfig/20220928-113856-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35028 and previous config saved to /var/cache/conftool/dbconfig/20220928-113849-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35027 and previous config saved to /var/cache/conftool/dbconfig/20220928-113842-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35026 and previous config saved to /var/cache/conftool/dbconfig/20220928-113835-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35025 and previous config saved to /var/cache/conftool/dbconfig/20220928-113831-root.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35024 and previous config saved to /var/cache/conftool/dbconfig/20220928-112401-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35023 and previous config saved to /var/cache/conftool/dbconfig/20220928-112355-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35022 and previous config saved to /var/cache/conftool/dbconfig/20220928-112351-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35021 and previous config saved to /var/cache/conftool/dbconfig/20220928-112344-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35020 and previous config saved to /var/cache/conftool/dbconfig/20220928-112337-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35019 and previous config saved to /var/cache/conftool/dbconfig/20220928-112330-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35018 and previous config saved to /var/cache/conftool/dbconfig/20220928-112326-root.json
* 11:18 moritzm: installing expat security updates
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35017 and previous config saved to /var/cache/conftool/dbconfig/20220928-110856-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35016 and previous config saved to /var/cache/conftool/dbconfig/20220928-110850-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35015 and previous config saved to /var/cache/conftool/dbconfig/20220928-110846-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35014 and previous config saved to /var/cache/conftool/dbconfig/20220928-110839-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35013 and previous config saved to /var/cache/conftool/dbconfig/20220928-110832-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35012 and previous config saved to /var/cache/conftool/dbconfig/20220928-110825-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35011 and previous config saved to /var/cache/conftool/dbconfig/20220928-110821-root.json
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35010 and previous config saved to /var/cache/conftool/dbconfig/20220928-105531-ladsgroup.json
* 10:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 10:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35009 and previous config saved to /var/cache/conftool/dbconfig/20220928-105520-ladsgroup.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35008 and previous config saved to /var/cache/conftool/dbconfig/20220928-105351-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35007 and previous config saved to /var/cache/conftool/dbconfig/20220928-105345-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35006 and previous config saved to /var/cache/conftool/dbconfig/20220928-105340-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35005 and previous config saved to /var/cache/conftool/dbconfig/20220928-105332-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35004 and previous config saved to /var/cache/conftool/dbconfig/20220928-105327-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35003 and previous config saved to /var/cache/conftool/dbconfig/20220928-105320-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35002 and previous config saved to /var/cache/conftool/dbconfig/20220928-105315-root.json
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P35001 and previous config saved to /var/cache/conftool/dbconfig/20220928-104014-ladsgroup.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35000 and previous config saved to /var/cache/conftool/dbconfig/20220928-103847-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34999 and previous config saved to /var/cache/conftool/dbconfig/20220928-103840-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34998 and previous config saved to /var/cache/conftool/dbconfig/20220928-103835-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34997 and previous config saved to /var/cache/conftool/dbconfig/20220928-103827-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34996 and previous config saved to /var/cache/conftool/dbconfig/20220928-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34995 and previous config saved to /var/cache/conftool/dbconfig/20220928-103815-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34994 and previous config saved to /var/cache/conftool/dbconfig/20220928-103810-root.json
* 10:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 db1137 db1168 db1143 db1132 db1127 es1022 for mariadb upgrade [[phab:T318128|T318128]]', diff saved to https://phabricator.wikimedia.org/P34993 and previous config saved to /var/cache/conftool/dbconfig/20220928-102759-root.json
* 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P34992 and previous config saved to /var/cache/conftool/dbconfig/20220928-102508-ladsgroup.json
* 10:19 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 10:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 10:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34990 and previous config saved to /var/cache/conftool/dbconfig/20220928-101001-ladsgroup.json
* 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59689
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59689
* 08:49 jbond: disable puppet on cache servers to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/832268
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34989 and previous config saved to /var/cache/conftool/dbconfig/20220928-084557-ladsgroup.json
* 08:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34988 and previous config saved to /var/cache/conftool/dbconfig/20220928-084535-ladsgroup.json
* 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34987 and previous config saved to /var/cache/conftool/dbconfig/20220928-083029-ladsgroup.json
* 08:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34985 and previous config saved to /var/cache/conftool/dbconfig/20220928-081522-ladsgroup.json
* 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34984 and previous config saved to /var/cache/conftool/dbconfig/20220928-080015-ladsgroup.json
* 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:30 XioNoX: disable BGP to init7 in knams
* 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]] (duration: 05m 17s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:03 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 07:03 kartik@deploy1002: Started scap: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]]
* 06:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34981 and previous config saved to /var/cache/conftool/dbconfig/20220928-043052-ladsgroup.json
* 04:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34980 and previous config saved to /var/cache/conftool/dbconfig/20220928-043030-ladsgroup.json
* 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34979 and previous config saved to /var/cache/conftool/dbconfig/20220928-041524-ladsgroup.json
* 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34978 and previous config saved to /var/cache/conftool/dbconfig/20220928-040017-ladsgroup.json
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34977 and previous config saved to /var/cache/conftool/dbconfig/20220928-034511-ladsgroup.json
* 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34976 and previous config saved to /var/cache/conftool/dbconfig/20220928-020746-ladsgroup.json
* 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34975 and previous config saved to /var/cache/conftool/dbconfig/20220928-020724-ladsgroup.json
* 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34974 and previous config saved to /var/cache/conftool/dbconfig/20220928-015218-ladsgroup.json
* 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34973 and previous config saved to /var/cache/conftool/dbconfig/20220928-013711-ladsgroup.json
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json
* 01:18 ejegg: updated fundraising python tools from {{Gerrit|b65109af}} to {{Gerrit|dd494413}}
* 00:34 eileen: civicrm upgraded from {{Gerrit|118c1d0b}} to {{Gerrit|916a8b08}}
* 00:11 eileen: civicrm upgraded from {{Gerrit|e198fb4c}} to {{Gerrit|118c1d0b}}


== 2021-03-13 ==
* 19:02 Amir1: change default charset of all core tables in labstestwiki to binary ([[phab:T269348|T269348]])
* 18:53 Amir1: run schema changes for varbinary on wikitech ([[phab:T269348|T269348]])
* 17:38 twentyafterfour: restarted apache on gerrit1001 to resolve apache worker exhaustion see [[phab:T277127|T277127]]
* 16:57 Reedy: gerrit web interface is slow/timing out
* 01:18 ryankemper: [[phab:T266470|T266470]] Re-enabled icinga service notifications for `Check no envoy runtime configuration is left persistent` on `wdqs100[9,10]`
* 01:04 ryankemper: [[phab:T266470|T266470]] merged https://gerrit.wikimedia.org/r/c/operations/dns/+/668255 && `ryankemper@authdns1001:~$ sudo authdns-update`
* 00:55 mutante: [wdqs1009:/etc/envoy] $ sudo /usr/local/sbin/build-envoy-config -c /etc/envoy/

== 2022-09-27 ==
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:12 TheresNoTime: closing UTC late backport window
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 20:59 TheresNoTime: extending UTC late backport window
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 20:15 samtar@deploy1002: Started scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] (duration: 05m 46s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:04 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]]
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
* 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]]
* 18:02 brennen: 1.40.0-wmf.3 ([[phab:T314192|T314192]]) no current blockers, promoting to group0
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
* 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
* 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
* 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
* 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
* 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
* 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]], 710183 rows done
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
* 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
* 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]]
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 taavi@deploy1002: Finished scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] (duration: 06m 59s)
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 taavi@deploy1002: taavi and migr: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:59 taavi@deploy1002: Started scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]]
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
* 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main'.
* 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
* 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
* 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'.
* 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main'.
* 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main'.
* 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main'.
* 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
* 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main'.
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 jbond: upload new wmf-laptop_0.5.4 package
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
* 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update [[phab:T311686|T311686]]
* 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
* 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
* 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
* 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
* 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
* 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - [[phab:T310745|T310745]]
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
* 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - [[phab:T310745|T310745]]
* 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
* 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
* 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
* 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
* 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
* 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
* 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 [[phab:T318128|T318128]]
* 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]] (duration: 36m 01s)
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]]
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2021-03-12 ==
== 2022-09-26 ==
* 22:53 ryankemper: [[phab:T266470|T266470]] Manually disabled service notifications for `Check no envoy runtime configuration is left persistent`, will need to circle back on Monday to restore notifications
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 22:10 legoktm: imported mailman-puppetmaster.mailman.eqiad1.wikimedia.cloud facts to puppet-compiler
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 21:52 mutante: puppetmaster1001 sudo puppet cert clean testreduce.discovery.wmnet ([[phab:T266509|T266509]])
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 21:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2219.codfw.wmnet
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2219.codfw.wmnet
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2218.codfw.wmnet
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 20:32 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2218.codfw.wmnet
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 20:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2217.codfw.wmnet
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 20:22 eevans@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging'.
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2217.codfw.wmnet
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2219.codfw.wmnet
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2218.codfw.wmnet
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2217.codfw.wmnet
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 19:47 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2376.codfw.wmnet,service=canary
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 19:47 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2374.codfw.wmnet,service=canary
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:47 ebernhardson: start in-place reindex testwiki in eqiad, codfw, cloudelastic cirrus clusters for [[phab:T269493|T269493]]
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2374.codfw.wmnet
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:41 mutante: mw2374, mw2376 - depooling to turn them into canaries
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2376.codfw.wmnet
* 20:31 TheresNoTime: closing UTC late backport window
* 19:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2374.codfw.wmnet
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 19:09 cstone: tools revision changed from {{Gerrit|532f8ecb33}} to {{Gerrit|b7b4060c30}}
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:28 bblack: authdns1001.wikimedia.org,dns2001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:24 bblack: dns[15]001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:21 bblack: dns[34]001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:03 mutante: depooling mw2244,mw2245 (API on old hardware), mw2229,mw2230 (app on old hardware) - [[phab:T277119|T277119]]
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 18:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2245.codfw.wmnet
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 18:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2244.codfw.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 18:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 18:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 14:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 14:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14818 and previous config saved to /var/cache/conftool/dbconfig/20210312-143450-root.json
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14817 and previous config saved to /var/cache/conftool/dbconfig/20210312-141947-root.json
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14816 and previous config saved to /var/cache/conftool/dbconfig/20210312-140443-root.json
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14815 and previous config saved to /var/cache/conftool/dbconfig/20210312-134940-root.json
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 13:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1088.eqiad.wmnet
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1088.eqiad.wmnet
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P14814 and previous config saved to /var/cache/conftool/dbconfig/20210312-131033-marostegui.json
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 12:12 vgutierrez: restart ats-tls on cp3051
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 11:55 effie: upgrade memcached on mc1022, mc2022
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 11:22 hnowlan: corrected git_server for logstash-logback-encoder, cassandra/twcs and cassandra/metrics-collector on deploy1002
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 09:45 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 09:45 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 09:44 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 09:43 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 09:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 09:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 09:07 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling (duration: 01m 35s)
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 09:05 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 09:00 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling (duration: 00m 09s)
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 09:00 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 08:59 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling (duration: 00m 10s)
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 08:59 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 08:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 08:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 08:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 08:01 moritzm: installing openjpeg2 security updates
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 07:16 marostegui: Stop mysql on db2108 to clone db2148
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14811 and previous config saved to /var/cache/conftool/dbconfig/20210312-071628-marostegui.json
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14810 and previous config saved to /var/cache/conftool/dbconfig/20210312-071400-root.json
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14809 and previous config saved to /var/cache/conftool/dbconfig/20210312-070219-marostegui.json
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 60%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14808 and previous config saved to /var/cache/conftool/dbconfig/20210312-065857-root.json
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314 for table checking [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14807 and previous config saved to /var/cache/conftool/dbconfig/20210312-065008-marostegui.json
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 30%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14806 and previous config saved to /var/cache/conftool/dbconfig/20210312-064353-root.json
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 06:30 marostegui: Deploy schema change on s2 codfw master, lag will appear - [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14805 and previous config saved to /var/cache/conftool/dbconfig/20210312-062850-root.json
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P14804 and previous config saved to /var/cache/conftool/dbconfig/20210312-061306-marostegui.json
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1088 from dbctl [[phab:T276025|T276025]]', diff saved to https://phabricator.wikimedia.org/P14803 and previous config saved to /var/cache/conftool/dbconfig/20210312-061118-marostegui.json
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 04:14 eileen: tools revision changed from {{Gerrit|d64b2f8cee}} to {{Gerrit|532f8ecb33}}
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 01:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2215.codfw.wmnet
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 00:58 mutante: shutting down mw2215
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 00:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2215.codfw.wmnet
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on analytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on analytics_test [airflow-dags@a69b031]
* 13:56 moritzm: installing mako security updates
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on analytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on analytics [airflow-dags@a69b031]
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 12:25 moritzm: installing unzip security updates
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-03-11 ==
== 2022-09-25 ==
* 22:55 mutante: depooled mw2224 through mw2228 but not removing from DSH groups yet ([[phab:T277119|T277119]])
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2228.codfw.wmnet
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2227.codfw.wmnet
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2226.codfw.wmnet
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 22:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2225.codfw.wmnet
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 22:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2224.codfw.wmnet
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 22:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 22:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 22:47 mutante: running DNS cookbook in an attempt to remove mw2216
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 22:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2216.codfw.wmnet
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 22:41 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.34
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 22:36 brennen: train status: 1.36.0-wmf.34 ([[phab:T274938|T274938]]): [[phab:T277229|T277229]] and [[phab:T266517|T266517]] related issues hopefully resolved, rolling forward to all wikis
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 22:34 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: [[gerrit:670879{{!}}Do not log script errors without file uri (T266517)]] (duration: 01m 07s)
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 22:30 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/MobileFrontend/includes/: Backport: [[gerrit:670877{{!}}Revert "Fix: Save user options only once when Advanced Mode is toggled" (T277229)]] (duration: 01m 09s)
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 22:28 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 21:57 Amir1: run populate pages in cognate ([[phab:T259360|T259360]])
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 21:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2222.codfw.wmnet
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2223.codfw.wmnet
* 21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2221.codfw.wmnet
* 21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
* 21:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "all wikis to 1.36.0-wmf.34"
* 21:20 brennen: train status: 1.36.0-wmf.34 ([[phab:T274938|T274938]]): rolling back to group1 and marking [[phab:T277229|T277229]] a train blocker
* 21:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1003.eqiad.wmnet with reason: REIMAGE
* 21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1003.eqiad.wmnet with reason: REIMAGE
* 21:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:670858{{!}}Enable GrowthExperiments link recommendations on testwiki (T277173)]] (duration: 00m 59s)
* 21:13 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@3810277]: [[phab:T273847|T273847]] export queries to relforge dag deployment - correct start date (duration: 01m 53s)
* 21:12 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@3810277]: [[phab:T273847|T273847]] export queries to relforge dag deployment - correct start date
* 21:05 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2216.codfw.wmnet
* 21:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mw2215.codfw.wmnet
* 21:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 21:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 21:03 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2215.codfw.wmnet
* 21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw2216.codfw.wmnet with reason: decom
* 21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw2216.codfw.wmnet with reason: decom
* 21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw2215.codfw.wmnet with reason: decom
* 21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw2215.codfw.wmnet with reason: decom
* 21:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 21:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:58 mutante: deactivating codfw API canaries on old hardware ([[phab:T277119|T277119]])
* 20:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2216.codfw.wmnet
* 20:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2215.codfw.wmnet
* 20:50 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:46 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@cc478d4]: [[phab:T273847|T273847]] export queries to relforge dag deployment (duration: 02m 09s)
* 20:44 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@cc478d4]: [[phab:T273847|T273847]] export queries to relforge dag deployment
* 20:35 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 20:33 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 20:28 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 20:20 mutante: phab1001 - systemctl start phabricator_clean_tmp_files - now Succeeded
* 20:17 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1002.eqiad.wmnet
* 20:13 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host matomo1002.eqiad.wmnet
* 20:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.34
* 19:59 mutante: phab1001 - sudo systemctl start phabricator_clean_tmp_files (manually run after conversion from cron to timer, and it fails with permission issues)
* 19:55 tgr_: [[phab:T277173|T277173]] running mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki GrowthExperiments
* 19:54 tgr@deploy1002: Synchronized wmf-config/: Config: [[gerrit:670857{{!}}Configure GrowthExperiments Add Link settings, step 2 (T277173)]] (duration: 01m 08s)
* 19:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:30 tgr@deploy1002: Synchronized wmf-config/: Config: [[gerrit:670887{{!}}Configure GrowthExperiments Add Link settings, step 1 (T277173)]] (duration: 01m 08s)
* 19:18 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:668196{{!}}wikitech: enable BetaFeatures (T125941)]] (duration: 01m 08s)
* 19:13 hnowlan@deploy1002: Finished deploy [restbase/deploy@6f0fe23]: Remove internal ratelimits that were causing service proxy issues (duration: 16m 25s)
* 18:56 hnowlan@deploy1002: Started deploy [restbase/deploy@6f0fe23]: Remove internal ratelimits that were causing service proxy issues
* 18:47 tgr_: running mwscript extensions/GrowthExperiments/maintenance/importOresTopics.php testwiki --count 1000 --verbose --wikiId enwiki --apiUrl 'https://en.wikipedia.org/w/api.php'
* 17:31 effie: install memcached 1.6.6-1 on mwdebug1001
* 16:26 effie: upgrade memcached on mc1021, mc2021
* 16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14802 and previous config saved to /var/cache/conftool/dbconfig/20210311-161138-root.json
* 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 60%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14801 and previous config saved to /var/cache/conftool/dbconfig/20210311-155635-root.json
* 15:53 cmjohnson1: updating firmware wdqs1009 [[phab:T274751|T274751]]
* 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 30%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14800 and previous config saved to /var/cache/conftool/dbconfig/20210311-154131-root.json
* 15:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14799 and previous config saved to /var/cache/conftool/dbconfig/20210311-152627-root.json
* 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P14798 and previous config saved to /var/cache/conftool/dbconfig/20210311-151435-marostegui.json
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14797 and previous config saved to /var/cache/conftool/dbconfig/20210311-150707-root.json
* 14:55 klausman: restarting pybal on lvs2009 [[phab:T272918|T272918]]
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 60%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14796 and previous config saved to /var/cache/conftool/dbconfig/20210311-145204-root.json
* 14:50 klausman: restarting pybal on lvs1016 [[phab:T272918|T272918]]
* 14:49 klausman: restarting pybal on lvs2010 [[phab:T272918|T272918]]
* 14:46 moritzm: installing openssl (1.1) security updates for stretch
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 30%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14795 and previous config saved to /var/cache/conftool/dbconfig/20210311-143700-root.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 10%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14794 and previous config saved to /var/cache/conftool/dbconfig/20210311-142157-root.json
* 14:07 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - [[phab:T272836|T272836]]
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14793 and previous config saved to /var/cache/conftool/dbconfig/20210311-140526-marostegui.json
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14792 and previous config saved to /var/cache/conftool/dbconfig/20210311-140328-root.json
* 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2149 into s3', diff saved to https://phabricator.wikimedia.org/P14791 and previous config saved to /var/cache/conftool/dbconfig/20210311-140119-marostegui.json
* 13:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
* 13:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 60%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14790 and previous config saved to /var/cache/conftool/dbconfig/20210311-134825-root.json
* 13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
* 13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
* 13:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
* 13:33 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:33 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 30%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14789 and previous config saved to /var/cache/conftool/dbconfig/20210311-133321-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14788 and previous config saved to /var/cache/conftool/dbconfig/20210311-131818-root.json
* 13:04 moritzm: installing openssl1.0 security updates on stretch
* 13:03 arturo: copy python-mwclient 0.8.4-1 from stretch-wikimedia to buster-wikimedia for [[phab:T275865|T275865]]
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P14787 and previous config saved to /var/cache/conftool/dbconfig/20210311-130208-marostegui.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14786 and previous config saved to /var/cache/conftool/dbconfig/20210311-130103-root.json
* 13:00 hnowlan: imported cassandra_2.2.6-wmf5 to buster-wikimedia
* 12:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 60%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14785 and previous config saved to /var/cache/conftool/dbconfig/20210311-124559-root.json
* 12:39 hnowlan: imported cassandra_2.2.6-wmf1 to buster-wikimedia
* 12:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
* 12:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 30%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14783 and previous config saved to /var/cache/conftool/dbconfig/20210311-123056-root.json
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
* 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
* 12:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
* 12:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
* 12:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
* 12:16 Lucas_WMDE: EU backport&config window done
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 10%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14782 and previous config saved to /var/cache/conftool/dbconfig/20210311-121552-root.json
* 12:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds 581768,739279,774383,852302 # [[phab:T270249|T270249]], finished in 1.124s
* 12:12 Lucas_WMDE: finished in 1.124s real time
* 12:12 Lucas_WMDE: start of lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds 581768,739279,774383,852302
* 12:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/LabsServices.php: Config: [[gerrit:670566{{!}}Update comment for irc.beta.wmflabs.org (T277081)]] (comment-only beta-only change) (duration: 01m 13s)
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
* 12:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:670180{{!}}Fix obsolete comments on wgCheckUserLogLogins (T253802)]] (duration: 01m 08s)
* 12:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14781 and previous config saved to /var/cache/conftool/dbconfig/20210311-120554-marostegui.json
* 12:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
* 11:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
* 11:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
* 11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
* 11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
* 11:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
* 11:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
* 11:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
* 11:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
* 11:37 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
* 11:35 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 11:34 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
* 11:31 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 11:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14778 and previous config saved to /var/cache/conftool/dbconfig/20210311-112747-root.json
* 11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
* 11:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
* 11:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
* 11:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 60%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14777 and previous config saved to /var/cache/conftool/dbconfig/20210311-111243-root.json
* 11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
* 11:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
* 10:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 30%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14776 and previous config saved to /var/cache/conftool/dbconfig/20210311-105740-root.json
* 10:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 10:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 10:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14775 and previous config saved to /var/cache/conftool/dbconfig/20210311-104236-root.json
* 10:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 10:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 10:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 10:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14774 and previous config saved to /var/cache/conftool/dbconfig/20210311-101714-marostegui.json
* 10:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 10:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2149 to dbctl, depooled, [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14773 and previous config saved to /var/cache/conftool/dbconfig/20210311-101604-marostegui.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14772 and previous config saved to /var/cache/conftool/dbconfig/20210311-101008-root.json
* 10:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P14771 and previous config saved to /var/cache/conftool/dbconfig/20210311-100705-marostegui.json
* 10:00 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 60%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14770 and previous config saved to /var/cache/conftool/dbconfig/20210311-095504-root.json
* 09:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
* 09:45 marostegui: Deploy schema change on s5 codfw master, lag will appear - [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 09:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 30%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14769 and previous config saved to /var/cache/conftool/dbconfig/20210311-094000-root.json
* 09:35 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
* 09:31 hashar: Restarting CI Jenkins
* 09:29 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14768 and previous config saved to /var/cache/conftool/dbconfig/20210311-092457-root.json
* 09:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
* 09:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
* 09:19 effie: upgrade memcached on mc1020, mc2020
* 09:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
* 09:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
* 09:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P14767 and previous config saved to /var/cache/conftool/dbconfig/20210311-090342-marostegui.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14766 and previous config saved to /var/cache/conftool/dbconfig/20210311-090312-root.json
* 09:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
* 08:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
* 08:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1032.eqiad.wmnet
* 08:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1032.eqiad.wmnet
* 08:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 60%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14765 and previous config saved to /var/cache/conftool/dbconfig/20210311-084809-root.json
* 08:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
* 08:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
* 08:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 30%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14764 and previous config saved to /var/cache/conftool/dbconfig/20210311-083305-root.json
* 08:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109', diff saved to https://phabricator.wikimedia.org/P14762 and previous config saved to /var/cache/conftool/dbconfig/20210311-082546-marostegui.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2074', diff saved to https://phabricator.wikimedia.org/P14761 and previous config saved to /var/cache/conftool/dbconfig/20210311-082528-marostegui.json
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P14760 and previous config saved to /var/cache/conftool/dbconfig/20210311-082445-marostegui.json
* 08:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 10%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14759 and previous config saved to /var/cache/conftool/dbconfig/20210311-081801-root.json
* 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2108 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14758 and previous config saved to /var/cache/conftool/dbconfig/20210311-081010-marostegui.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2148 to s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14757 and previous config saved to /var/cache/conftool/dbconfig/20210311-080944-marostegui.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P14756 and previous config saved to /var/cache/conftool/dbconfig/20210311-074352-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14755 and previous config saved to /var/cache/conftool/dbconfig/20210311-073741-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 60%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14754 and previous config saved to /var/cache/conftool/dbconfig/20210311-072237-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 30%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14753 and previous config saved to /var/cache/conftool/dbconfig/20210311-070734-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 10%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14752 and previous config saved to /var/cache/conftool/dbconfig/20210311-065230-root.json
* 06:48 marostegui: Stop mysql on db2108 to clone db2148 [[phab:T275633|T275633]]
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14750 and previous config saved to /var/cache/conftool/dbconfig/20210311-064821-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P14749 and previous config saved to /var/cache/conftool/dbconfig/20210311-063814-marostegui.json
* 06:36 marostegui: Drop testreduce from m5 - [[phab:T276787|T276787]]
* 05:34 thcipriani: restarted apache2 on gerrit1001
* 00:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2219.codfw.wmnet
* 00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2218.codfw.wmnet
* 00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2217.codfw.wmnet
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2216.codfw.wmnet
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2215.codfw.wmnet


== 2021-03-10 ==
== 2022-09-23 ==
* 23:49 mholloway-shell@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/EventLogging: EventLogging: Stream always in sample if the user is in debugMode ([[phab:T276515|T276515]]) (duration: 01m 23s)
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 23:41 dwisehaupt: disabled silverpop daily run in process-control until utf8mb4 conversion completes on frdev1001
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 23:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: REIMAGE
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 23:10 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: REIMAGE
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 23:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry1002.eqiad.wmnet
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 23:01 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry1002.eqiad.wmnet
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:55 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2001-2002].codfw.wmnet
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:51 andrewbogott: updating puppet compiler facts to catch up with a new custom fact
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:44 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2001-2002].codfw.wmnet
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry1001.eqiad.wmnet
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 22:32 brennen@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.34 (duration: 01m 30s)
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 22:30 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.34
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 22:27 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry1001.eqiad.wmnet
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 22:26 brennen: train status: 1.36.0-wmf.34 ([[phab:T274938|T274938]]): [[phab:T277094|T277094]] believed resolved, promoting to group1.
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 22:25 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: [[gerrit:670535{{!}}Fix client error logging (T277094)]] (duration: 01m 09s)
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 21:53 mutante: ferm/iptables docker NAT rules applied by puppet on releases servers after breaking out rules into their own profile class ([[phab:T276869|T276869]])
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 21:51 dwisehaupt: upgraded mariadb and keeping replication stopped on frdb1002 to start the utf8mb4 table alters under a root screen session
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 21:43 brennen: train status: 1.36.0-wmf.34 ([[phab:T274938|T274938]]): client errors may still be missing for group0; continuing to hold for [[phab:T277094|T277094]] until we know what's broken.
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 21:40 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: [[gerrit:670533{{!}}Revert "Error in shouldLog logic drops most errors" (T277094)]] (duration: 01m 08s)
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 21:38 dwisehaupt: stopping mysql replication on frdev1001 and starting utf8mb4 table alters under a root screen session
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 21:38 dwisehaupt: stopping mysql replication on frdb1003 and starting utf8mb4 table alters under a root screen session
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 21:30 brennen: train status: 1.36.0-wmf.34 ([[phab:T274938|T274938]]): logstash client error board was set up incorrectly; reverting earlier patch for [[phab:T277094|T277094]] and will proceed to group1.
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 21:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cdc47f3e35e815081f787def2d51f3fd337ecf6c}}: jawiki: Growth features: Add help panel links ([[phab:T276830|T276830]]) (duration: 01m 08s)
* 21:16 eileen: civicrm revision changed from {{Gerrit|b13e70d968}} to {{Gerrit|550be50105}}, config revision is {{Gerrit|970b10b0b3}}
* 21:13 cdanis@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 21:00 cdanis@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:57 cdanis@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:56 Urbanecm: Fixing wrong sync message: urbanecm@deploy1002 Synchronized dblists/growthexperiments.dblist {{Gerrit|f72c3d6c4fcbda692c5bf8c37a38667c3ba12d80}}: jawiki: Enable Growth features in stealth mode ([[phab:T276830|T276830]]) (duration: 01m 08s)
* 20:56 Urbanecm: Fixing wrong sync message: urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f72c3d6c4fcbda692c5bf8c37a38667c3ba12d80}}: jawiki: Enable Growth features in stealth mode ([[phab:T276830|T276830]]) (duration: 01m 07s)
* 20:54 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|92ae985df5411de7ff983a778aebde0e10f6253e}}: thwiki: Make Growth features available to newcomers ([[phab:T274646|T274646]]) (duration: 01m 08s)
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|92ae985df5411de7ff983a778aebde0e10f6253e}}: thwiki: Make Growth features available to newcomers ([[phab:T274646|T274646]]) (duration: 01m 07s)
* 20:50 cdanis@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|92ae985df5411de7ff983a778aebde0e10f6253e}}: thwiki: Make Growth features available to newcomers ([[phab:T274646|T274646]]) (duration: 01m 08s)
* 20:41 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: [[gerrit:670529{{!}}Error in shouldLog logic drops most errors (T277094)]] (duration: 01m 14s)
* 20:36 cdanis@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 19:58 brennen: train status: 1.36.0-wmf.34 ([[phab:T274938|T274938]]): currently blocked at group0 as client error logging is broken (UBN ticket incoming), will hold for patch.
* 19:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a130e9f2eab6dec12aec4380efdfd6bde1767aeb}}: Enable Growth features on eowiki in stealth mode ([[phab:T276123|T276123]]) (duration: 01m 08s)
* 19:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: REIMAGE
* 19:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: REIMAGE
* 19:32 ryankemper: [[phab:T266470|T266470]] `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo enable-puppet "revoking old cert and generating new one with new alt_names - [[phab:T266470|T266470]] - root"'` && `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo run-puppet-agent'`
* 19:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|84271f616081e28e48676a2dd498bd904d5c0b76}}: Enable DiscussionTools beta features on frwiktionary ([[phab:T276189|T276189]]) (duration: 01m 09s)
* 19:28 ryankemper: [[phab:T266470|T266470]] `ryankemper@wdqs1004:~$ sudo enable-puppet "revoking old cert and generating new one with new alt_names - [[phab:T266470|T266470]] - root"` && `sudo run-puppet-agent`
* 19:27 ryankemper: [[phab:T266470|T266470]] `/srv/private` commit SHA for this change is `45852086679616bccb5bba3dd6396082b0f25a3d`
* 19:26 ryankemper: [[phab:T266470|T266470]] `sudo chown -Rv gitpuppet:gitpuppet /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/` && `sudo chown -v  gitpuppet:gitpuppet /srv/private/modules/secret/secrets/ssl/wdqs.discovery.wmnet.key`
* 19:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5093618d5069dd287a4f33c1d49b5e5c8a05a13c}}: Enable DiscussionTools beta feature for newtopictool on most wikis ([[phab:T275827|T275827]]) (duration: 01m 08s)
* 19:23 ryankemper: [[phab:T266470|T266470]] Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/670562 (copies over new pubkey)
* 19:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4824679d79d462459eba6b77a5af787817f186d2}}: Disable DiscussionTools Reply Tool A/B test ([[phab:T276967|T276967]]) (duration: 01m 07s)
* 19:22 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/DiscussionTools/includes/Hooks/HookUtils.php: {{Gerrit|9cb48f08f452a124868e1bf9d700a45c1d7255f4}}: Allow users to continue using reply tool after disabling A/B test ([[phab:T276967|T276967]]) (duration: 01m 07s)
* 19:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/DiscussionTools/includes/Hooks/HookUtils.php: {{Gerrit|4193ff71df421f2fe2ed3e1f2fa1c54334e722e2}}: Allow users to continue using reply tool after disabling A/B test ([[phab:T276967|T276967]]) (duration: 01m 09s)
* 19:18 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: {{Gerrit|e998086f7cf7839d2c9aa917776509b3198c3142}}: searchSatisfaction: Allow for async initialisation ([[phab:T274869|T274869]]) (duration: 01m 08s)
* 19:18 ryankemper: [[phab:T266470|T266470]] `sudo cergen -c 'wdqs.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d`
* 19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: REIMAGE
* 19:16 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: {{Gerrit|d9bad12cdb02e13517cecd1775162fde88af48eb}}: searchSatisfaction: Allow for async initialisation ([[phab:T274869|T274869]]) (duration: 01m 08s)
* 19:16 ryankemper: [[phab:T266470|T266470]] `sudo rm -fv certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12 certificates/wdqs.discovery.wmnet/truststore.jks` (full paths not provided to fit the IRC line)
* 19:15 ryankemper: [[phab:T266470|T266470]] `sudo puppet cert clean wdqs.discovery.wmnet`
* 19:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: REIMAGE
* 19:14 ryankemper: [[phab:T266470|T266470]] on `ryankemper@cumin1001`: `sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - [[phab:T266470|T266470]]"'`
* 19:14 ryankemper: [[phab:T266470|T266470]] Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation
* 19:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fe99c312b3ce635342cbd690c34e2610184b74b0}}: Remove unused config for InukaPageView ([[phab:T265921|T265921]]) (duration: 01m 26s)
* 18:56 dwisehaupt: all fundraising servers are now running buster - [[phab:T254198|T254198]]
* 18:37 mforns@deploy1002: Finished deploy [analytics/refinery@7fbc3c7] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656] (duration: 04m 12s)
* 18:33 mforns@deploy1002: Started deploy [analytics/refinery@7fbc3c7] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656]
* 18:33 mforns@deploy1002: Finished deploy [analytics/refinery@7fbc3c7] (thin): Regular analytics weekly train THIN [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656] (duration: 00m 07s)
* 18:33 mforns@deploy1002: Started deploy [analytics/refinery@7fbc3c7] (thin): Regular analytics weekly train THIN [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656]
* 18:32 mforns@deploy1002: Finished deploy [analytics/refinery@7fbc3c7]: Regular analytics weekly train [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656] (duration: 14m 30s)
* 18:18 mforns@deploy1002: Started deploy [analytics/refinery@7fbc3c7]: Regular analytics weekly train [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656]
* 17:48 mutante: new Wikimedia project language "trv" added - Seediq is an Atayalic language spoken in the mountains of Northern Taiwan by the Seediq and Taroko people.
* 17:45 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: REIMAGE
* 17:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: REIMAGE
* 17:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: REIMAGE
* 17:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: REIMAGE
* 16:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1030.eqiad.wmnet
* 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: REIMAGE
* 16:50 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1030.eqiad.wmnet
* 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: REIMAGE
* 16:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: REIMAGE
* 16:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: REIMAGE
* 16:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: REIMAGE
* 16:18 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: REIMAGE
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repool db1127 after schema change', diff saved to https://phabricator.wikimedia.org/P14744 and previous config saved to /var/cache/conftool/dbconfig/20210310-153324-root.json
* 15:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sodium.wikimedia.org
* 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 60%: Repool db1127 after schema change', diff saved to https://phabricator.wikimedia.org/P14743 and previous config saved to /var/cache/conftool/dbconfig/20210310-151820-root.json
* 15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sodium.wikimedia.org
* 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 30%: Repool db1127 after schema change', diff saved to https://phabricator.wikimedia.org/P14742 and previous config saved to /var/cache/conftool/dbconfig/20210310-150316-root.json
* 14:53 klausman@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: cluster=ml_serve,service=kubemaster
* 14:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Repool db1127 after schema change', diff saved to https://phabricator.wikimedia.org/P14741 and previous config saved to /var/cache/conftool/dbconfig/20210310-144813-root.json
* 14:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 14:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P14740 and previous config saved to /var/cache/conftool/dbconfig/20210310-143547-marostegui.json
* 14:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14739 and previous config saved to /var/cache/conftool/dbconfig/20210310-142316-root.json
* 14:19 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:19 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:19 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 14:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14738 and previous config saved to /var/cache/conftool/dbconfig/20210310-140812-root.json
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 14:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
* 13:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
* 13:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14736 and previous config saved to /var/cache/conftool/dbconfig/20210310-135309-root.json
* 13:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 13:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 13:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 13:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 13:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 13:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 13:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
* 13:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1029.eqiad.wmnet
* 13:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 12:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 12:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 12:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 12:52 ariel@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 12:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 12:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 12:47 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1029.eqiad.wmnet
* 12:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|623ed48472e63c8f1c5965289163d7ef80ab4412}}: nowiki: Enable Growth features in stealth mode ([[phab:T276816|T276816]]) (duration: 01m 07s)
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P14734 and previous config saved to /var/cache/conftool/dbconfig/20210310-124140-marostegui.json
* 12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 12:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14733 and previous config saved to /var/cache/conftool/dbconfig/20210310-123654-root.json
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 12:34 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.34/languages: [[gerrit:670350{{!}}Add shy name (same as shy-latn)]] ([[phab:T259360|T259360]]) (duration: 01m 10s)
* 12:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 12:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 12:32 ariel@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 12:31 ariel@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 12:31 ariel@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 12:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 12:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 12:22 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.33/languages: [[gerrit:670351{{!}}Add shy name (same as shy-latn)]] ([[phab:T259360|T259360]]) (duration: 01m 10s)
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14732 and previous config saved to /var/cache/conftool/dbconfig/20210310-122150-root.json
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 12:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 12:12 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:612918{{!}}Update several Wikidata-related configs]] (duration: 01m 32s)
* 12:09 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 12:07 klausman@cumin1001: START - Cookbook sre.dns.netbox
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14731 and previous config saved to /var/cache/conftool/dbconfig/20210310-120647-root.json
* 11:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1024.eqiad.wmnet
* 11:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
* 11:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
* 11:34 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1013.eqiad.wmnet
* 11:29 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1013.eqiad.wmnet
* 11:27 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1024.eqiad.wmnet
* 11:25 kormat@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: schema change [[phab:T267767|T267767]]', diff saved to https://phabricator.wikimedia.org/P14730 and previous config saved to /var/cache/conftool/dbconfig/20210310-112553-kormat.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P14729 and previous config saved to /var/cache/conftool/dbconfig/20210310-112427-marostegui.json
* 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14728 and previous config saved to /var/cache/conftool/dbconfig/20210310-111903-root.json
* 11:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 11:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
* 11:10 kormat@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: schema change [[phab:T267767|T267767]]', diff saved to https://phabricator.wikimedia.org/P14727 and previous config saved to /var/cache/conftool/dbconfig/20210310-111049-kormat.json
* 11:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
* 11:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
* 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1028.eqiad.wmnet
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14726 and previous config saved to /var/cache/conftool/dbconfig/20210310-110359-root.json
* 11:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
* 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1028.eqiad.wmnet
* 10:55 kormat@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: schema change [[phab:T267767|T267767]]', diff saved to https://phabricator.wikimedia.org/P14725 and previous config saved to /var/cache/conftool/dbconfig/20210310-105545-kormat.json
* 10:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1023.eqiad.wmnet
* 10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14724 and previous config saved to /var/cache/conftool/dbconfig/20210310-104856-root.json
* 10:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
* 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
* 10:40 kormat@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: schema change [[phab:T267767|T267767]]', diff saved to https://phabricator.wikimedia.org/P14723 and previous config saved to /var/cache/conftool/dbconfig/20210310-104042-kormat.json
* 10:40 effie: upgrade memcached on mc2019, mc1019
* 10:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
* 10:38 kormat@cumin1001: dbctl commit (dc=all): 'db1168 depooling: schema change [[phab:T267767|T267767]]', diff saved to https://phabricator.wikimedia.org/P14722 and previous config saved to /var/cache/conftool/dbconfig/20210310-103836-kormat.json
* 10:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: schema change [[phab:T267767|T267767]]
* 10:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1168.eqiad.wmnet with reason: schema change [[phab:T267767|T267767]]
* 10:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
* 10:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
* 10:29 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1023.eqiad.wmnet
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P14721 and previous config saved to /var/cache/conftool/dbconfig/20210310-101922-marostegui.json
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
* 10:12 marostegui: Drop testreduce_vd from m5 master - [[phab:T276787|T276787]]
* 10:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
* 10:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
* 09:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
* 09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
* 09:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
* 09:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2031.codfw.wmnet
* 09:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2031.codfw.wmnet
* 09:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
* 09:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
* 09:25 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: REIMAGE
* 09:23 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: REIMAGE
* 09:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
* 09:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
* 08:39 marostegui: Upgrade mysql and kernel on db2132
* 08:25 marostegui: Upgrade mysql and kernel on db2078
* 08:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thorium.eqiad.wmnet
* 08:20 moritzm: pruning obsolete kernels from ganeti hosts in eqiad/codfw
* 08:17 moritzm: powercycling thorium, stuck on reboot
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14719 and previous config saved to /var/cache/conftool/dbconfig/20210310-081627-root.json
* 08:11 marostegui: Check tables on db1150:3315 - [[phab:T276742|T276742]]
* 08:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host thorium.eqiad.wmnet
* 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics-tool1001.eqiad.wmnet
* 08:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host analytics-tool1001.eqiad.wmnet
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14718 and previous config saved to /var/cache/conftool/dbconfig/20210310-080123-root.json
* 07:52 marostegui: Deploy schema change on s7 codfw (lag will appear) [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14717 and previous config saved to /var/cache/conftool/dbconfig/20210310-074618-root.json
* 07:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
* 07:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P14716 and previous config saved to /var/cache/conftool/dbconfig/20210310-072642-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P14715 and previous config saved to /var/cache/conftool/dbconfig/20210310-072508-marostegui.json
* 07:07 elukey: sudo apt-get remove linux-image-4.9.0-9-amd64 on sodium to free space for /boot
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2145', diff saved to https://phabricator.wikimedia.org/P14714 and previous config saved to /var/cache/conftool/dbconfig/20210310-070642-marostegui.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P14713 and previous config saved to /var/cache/conftool/dbconfig/20210310-070312-marostegui.json
* 07:01 elukey: remove the oldest kernel on ganeti nodes to free space for /boot
* 07:00 marostegui: Depool clouddb1016
* 06:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1111.eqiad.wmnet with reason: REIMAGE
* 06:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1111.eqiad.wmnet with reason: REIMAGE
* 06:17 elukey: reimage an-worker1111 to buster
* 05:27 ryankemper: [[phab:T266470|T266470]] Rollout of updated certificate complete. We're now ready to implement envoy for `wdqs-test` which will allow `wdqs1009` to be reachable via port 443 and thereby allow us to go live with `query-preview.wikidata.org` when the time comes
* 05:26 ryankemper: [[phab:T266470|T266470]] `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo enable-puppet "revoking old cert and generating new one with new alt_names - [[phab:T266470|T266470]] - root"'` and `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo run-puppet-agent'`
* 05:24 ryankemper: [[phab:T266470|T266470]] Test queries passing on `wdqs1004`, and `https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&var-cluster_name=wdqs&from=now-1h&to=now` looks as expected. Proceeding to rest of fleet
* 05:20 ryankemper: [[phab:T266470|T266470]] Enabled puppet on single public wdqs host to verify certificate update is without issue: `ryankemper@wdqs1004:~$ sudo enable-puppet "revoking old cert and generating new one with new alt_names - [[phab:T266470|T266470]] - root"` followed by `ryankemper@wdqs1004:~$ sudo run-puppet-agent`
* 05:15 ryankemper: [[phab:T266470|T266470]] [`/srv/private`] All changes committed to private git repo, commit SHA `ec1d6cfae8c72e4f807b343cdb9f25c27817d98d`
* 05:13 ryankemper: [[phab:T266470|T266470]] [`/srv/private`] `chown gitpuppet:gitpuppet` on all modified files (they were owned by root, probably because I used sudo; a git commit hook might have caught that, but explicitly chowning just to be safe)
* 05:06 ryankemper: [[phab:T266470|T266470]] New `wdqs.discovery.wmnet.crt` added to public `operations/puppet` repo: https://gerrit.wikimedia.org/r/c/operations/puppet/+/670337/
* 04:58 ryankemper: [[phab:T266470|T266470]] The above two actions mean that we're ready to generate the new certificate files. Proceeding: `sudo cergen -c 'wdqs.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d` on `ryankemper@puppetmaster1001:/srv/private`
* 04:57 ryankemper: [[phab:T266470|T266470]] `sudo rm -fv certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12 certificates/wdqs.discovery.wmnet/truststore.jks` (full paths not provided to fit the IRC line)
* 04:56 ryankemper: [[phab:T266470|T266470]] In the `/srv/private` repo, `/srv/private/modules/secret/secrets/certificates/certificate.manifests.d/wdqs.certs.yaml` has been edited to add the relevant `alt_names`
* 04:55 ryankemper: [[phab:T266470|T266470]] Certificate revoked: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean wdqs.discovery.wmnet`
* 04:53 ryankemper: [[phab:T266470|T266470]] `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - [[phab:T266470|T266470]]"'`
* 04:52 ryankemper: [[phab:T266470|T266470]] Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation
* 01:08 krinkle@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/NavigationTiming/modules/ext.navigationTiming.js: [[phab:T276826|T276826]] {{Gerrit|Ibd9ddf14d64}} (duration: 01m 14s)
* 00:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE
* 00:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE


== 2021-03-09 ==
== 2022-09-22 ==
* 23:59 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 22:04 mutante: phab1001 - manually running phab public task dump script after making changes to redirect stdout
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:42 elukey: reimaged an-worker1091 to buster
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 bstorm: depooled labsdb1009 [[phab:T276980|T276980]]
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 bstorm: downtimed labsdb1009 so it doesn't keep paging [[phab:T276980|T276980]]
* 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
* 20:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:09 brennen: train status: 1.36.0-wmf.34 ([[phab:T274938|T274938]]) on group0 at 20:06:32 UTC; logs initially quiet.
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.34
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:05 brennen@deploy1002: Pruned MediaWiki: 1.36.0-wmf.31 (duration: 03m 34s)
* 20:55 brennen: end of utc late backport & config window
* 19:04 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:59 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 18:54 brennen@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.34 (duration: 47m 25s)
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 18:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 18:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:47 dcausse: re-pool wdqs1004
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 18:37 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:36 brennen@deploy1002: backport aborted: (duration: 02m 16s)
* 18:35 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:34 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:26 elukey: reimage an-worker1087 to buster
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:13 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:12 brennen@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.34
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:10 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:05 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 18:03 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:02 marxarelli: deleting shut down memc* deployment-prep instances to free up quota for replacement db instances ([[phab:T276968|T276968]])
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 18:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:50 papaul: rebooting db2073 for firmware upgrade
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3119d7a703a38b328fa634db64b2929d54829884}}: sqwiki: Fix deployment of Growth features (duration: 01m 00s)
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 16:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:38 dancy@deploy1002: Started scap: testing
* 16:41 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 16:40 elukey: reimage analytics1077 to buster
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 16:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1027.eqiad.wmnet
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 16:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 16:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 16:31 brennen: 1.36.0-wmf.34 was branched at {{Gerrit|e175899921535f83e168145cbe942489475607db}} for [[phab:T274938|T274938]]
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 16:27 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1027.eqiad.wmnet
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14708 and previous config saved to /var/cache/conftool/dbconfig/20210309-162116-root.json
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 80%: 10', diff saved to https://phabricator.wikimedia.org/P14707 and previous config saved to /var/cache/conftool/dbconfig/20210309-160613-root.json
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:56 moritzm: imported prometheus-ircd-exporter 0.2 to apt.wikimedia.org [[phab:T224579|T224579]]
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14706 and previous config saved to /var/cache/conftool/dbconfig/20210309-155109-root.json
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1072.eqiad.wmnet with reason: REIMAGE
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1072.eqiad.wmnet with reason: REIMAGE
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14705 and previous config saved to /var/cache/conftool/dbconfig/20210309-153715-root.json
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 40%: 10', diff saved to https://phabricator.wikimedia.org/P14704 and previous config saved to /var/cache/conftool/dbconfig/20210309-153605-root.json
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1008.eqiad.wmnet
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1008.eqiad.wmnet
* 16:39 dancy@deploy1002: Sync cancelled.
* 15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1007.eqiad.wmnet
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:28 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare KaiOS / Inuka event streams - [[phab:T267344|T267344]] [[phab:T267345|T267345]] [[phab:T267346|T267346]] (duration: 00m 58s)
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 60%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14703 and previous config saved to /var/cache/conftool/dbconfig/20210309-152212-root.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14702 and previous config saved to /var/cache/conftool/dbconfig/20210309-152102-root.json
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:20 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Bump session_tick sampling rate to 10% (duration: 00m 58s)
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:18 elukey: reimage analytics1072 (hadoop hdfs journal node) to buster
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1007.eqiad.wmnet
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1006.eqiad.wmnet
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1006.eqiad.wmnet
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 30%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14701 and previous config saved to /var/cache/conftool/dbconfig/20210309-150708-root.json
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 20%: 10', diff saved to https://phabricator.wikimedia.org/P14700 and previous config saved to /var/cache/conftool/dbconfig/20210309-150558-root.json
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 15:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1005.eqiad.wmnet
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1005.eqiad.wmnet
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1089.eqiad.wmnet with reason: REIMAGE
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1090.eqiad.wmnet with reason: REIMAGE
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 14:53 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1089.eqiad.wmnet with reason: REIMAGE
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14699 and previous config saved to /var/cache/conftool/dbconfig/20210309-145205-root.json
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1090.eqiad.wmnet with reason: REIMAGE
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2008.codfw.wmnet
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2008.codfw.wmnet
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P14698 and previous config saved to /var/cache/conftool/dbconfig/20210309-143453-marostegui.json
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:32 volker-e@deploy1002: Finished deploy [design/style-guide@deee49c]: Deploy design/style-guide: {{Gerrit|deee49c}} index: Add links to our design process and work guides (#446) (duration: 00m 06s)
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:32 volker-e@deploy1002: Started deploy [design/style-guide@deee49c]: Deploy design/style-guide: {{Gerrit|deee49c}} index: Add links to our design process and work guides (#446)
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 14:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2007.codfw.wmnet
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14697 and previous config saved to /var/cache/conftool/dbconfig/20210309-143033-root.json
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:29 elukey: drain + reimage an-worker1090/89 to Buster
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
* 14:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2007.codfw.wmnet
* 14:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE
* 14:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
* 14:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2006.codfw.wmnet
* 14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2006.codfw.wmnet
* 14:17 jakob@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14696 and previous config saved to /var/cache/conftool/dbconfig/20210309-141529-root.json
* 14:14 jakob@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2005.codfw.wmnet
* 14:12 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 14:10 moritzm: installing intel-microcode updates on stretch
* 14:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2005.codfw.wmnet
* 14:08 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 14:07 jgleeson: updated smashpig from {{Gerrit|5a69abd40f}} to {{Gerrit|58b070db1a}}
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14694 and previous config saved to /var/cache/conftool/dbconfig/20210309-140025-root.json
* 13:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1004.eqiad.wmnet
* 13:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1102.eqiad.wmnet with reason: REIMAGE
* 13:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1080.eqiad.wmnet with reason: REIMAGE
* 13:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1102.eqiad.wmnet with reason: REIMAGE
* 13:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1080.eqiad.wmnet with reason: REIMAGE
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14693 and previous config saved to /var/cache/conftool/dbconfig/20210309-134522-root.json
* 13:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1004.eqiad.wmnet
* 13:34 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: HW issue
* 13:34 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: HW issue
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14692 and previous config saved to /var/cache/conftool/dbconfig/20210309-133124-root.json
* 13:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1003.eqiad.wmnet
* 13:27 elukey: reimage an-worker1102 and an-worker1080 (hdfs journal node) to Buster
* 13:21 jgleeson: updated payments-wiki from {{Gerrit|65dbf0ed9d}} to {{Gerrit|0e7800027a}}
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P14691 and previous config saved to /var/cache/conftool/dbconfig/20210309-131652-marostegui.json
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14690 and previous config saved to /var/cache/conftool/dbconfig/20210309-131620-root.json
* 13:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1003.eqiad.wmnet
* 13:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1103.eqiad.wmnet with reason: REIMAGE
* 13:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1103.eqiad.wmnet with reason: REIMAGE
* 13:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
* 13:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14689 and previous config saved to /var/cache/conftool/dbconfig/20210309-130116-root.json
* 12:59 elukey: drain + reimage an-worker1103 to Buster
* 12:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
* 12:57 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
* 12:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1403.eqiad.wmnet
* 12:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1402.eqiad.wmnet
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P14688 and previous config saved to /var/cache/conftool/dbconfig/20210309-125007-marostegui.json
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14687 and previous config saved to /var/cache/conftool/dbconfig/20210309-124931-root.json
* 12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mw1403.eqiad.wmnet
* 12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mw1402.eqiad.wmnet
* 12:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14686 and previous config saved to /var/cache/conftool/dbconfig/20210309-123427-root.json
* 12:33 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1038.eqiad.wmnet
* 12:31 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
* 12:30 hnowlan: regenerating interfaces and reimaging aqs101[1-5]
* 12:29 marostegui: Upgrade db2084 kernel
* 12:26 marostegui: Upgrade db2094 kernel
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14685 and previous config saved to /var/cache/conftool/dbconfig/20210309-121924-root.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1166 entirely', diff saved to https://phabricator.wikimedia.org/P14684 and previous config saved to /var/cache/conftool/dbconfig/20210309-121913-marostegui.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14683 and previous config saved to /var/cache/conftool/dbconfig/20210309-121849-root.json
* 12:16 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/: {{Gerrit|dbd6f0cb299bcfb6648b351e1476100fe669cc58}}: Make help panel fallback to help desk if no mentor is available ([[phab:T275908|T275908]]; [[phab:T273782|T273782]]) (duration: 01m 01s)
* 12:13 marostegui: Upgrade db2080 kernel
* 12:06 marostegui: Upgrade db2077 kernel
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 for schema change', diff saved to https://phabricator.wikimedia.org/P14682 and previous config saved to /var/cache/conftool/dbconfig/20210309-120326-marostegui.json
* 12:00 marostegui: Upgrade db2076 kernel
* 11:56 effie: restart envoy on mw1276
* 11:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
* 11:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
* 11:52 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1307.eqiad.wmnet
* 11:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2004.codfw.wmnet
* 11:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
* 11:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mw1307.eqiad.wmnet
* 11:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
* 11:29 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:29 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc1001.eqiad.wmnet
* 11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host doc1001.eqiad.wmnet
* 11:25 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2004.codfw.wmnet
* 11:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1001.eqiad.wmnet
* 11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host webperf1001.eqiad.wmnet
* 11:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1002.eqiad.wmnet
* 11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host webperf1002.eqiad.wmnet
* 11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2002.codfw.wmnet
* 11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host webperf2002.codfw.wmnet
* 11:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2001.codfw.wmnet
* 11:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host webperf2001.codfw.wmnet
* 11:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1037.eqiad.wmnet
* 10:56 moritzm: installing mariadb-10.1 updates for stretch (distro version with libs/tools only, not wmf-mariadb)
* 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1037.eqiad.wmnet
* 10:53 dcausse: started to import lexemes on wdqs1009 ([[phab:T276784|T276784]])
* 10:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2003.codfw.wmnet
* 10:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:45 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
* 10:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2020-2027].codfw.wmnet
* 10:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2003.codfw.wmnet
* 10:31 moritzm: upgrading perf on stretch hosts
* 10:30 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
* 10:23 moritzm: installing gdisk security updates
* 10:15 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
* 10:14 moritzm: installing libbsd security updates
* 10:07 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[2020-2027].codfw.wmnet
* 10:00 moritzm: installing busybox security updates
* 09:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1001.eqiad.wmnet
* 09:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetboard1001.eqiad.wmnet
* 09:50 marostegui: Reboot db2073 for kernel upgrade (stretch)
* 09:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2001.codfw.wmnet
* 09:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetboard2001.codfw.wmnet
* 09:44 marostegui: Reboot db2072 for kernel upgrade (stretch)
* 09:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1112.eqiad.wmnet with reason: REIMAGE
* 09:40 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1112.eqiad.wmnet with reason: REIMAGE
* 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1076.eqiad.wmnet with reason: REIMAGE
* 09:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1076.eqiad.wmnet with reason: REIMAGE
* 09:14 elukey: drain + reimage analytics1076 and an-worker1112 to Buster
* 09:01 moritzm: installing Linux 4.9.258 updates on Stretch hosts
* 08:59 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2017-2019].codfw.wmnet
* 08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1093.eqiad.wmnet with reason: REIMAGE
* 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1092.eqiad.wmnet with reason: REIMAGE
* 08:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1093.eqiad.wmnet with reason: REIMAGE
* 08:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1092.eqiad.wmnet with reason: REIMAGE
* 08:46 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[2017-2019].codfw.wmnet
* 08:46 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:46 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be2016.codfw.wmnet
* 08:12 marostegui: Stop mysql on clouddb1015:3314, 3316
* 07:59 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be2016.codfw.wmnet
* 07:50 dcausse: restarted blazegraph on wdqs1004 and depooled it to catch up on lag
* 07:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1095.eqiad.wmnet with reason: REIMAGE
* 07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1094.eqiad.wmnet with reason: REIMAGE
* 07:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1095.eqiad.wmnet with reason: REIMAGE
* 07:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1094.eqiad.wmnet with reason: REIMAGE
* 07:24 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 07:03 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:01 elukey: drain + reimage an-worker109[4,5] to Buster
* 06:58 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 06:30 _joe_: restarting gerrit on gerrit1001, using 48 GB of heap
* 06:19 marostegui: Deploy schema change on s6 codfw (there will be lag on codfw) [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 05:37 marostegui: Stop mysql on clouddb1014:3312, 3317 to transfer its data to clouddb1021
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for table check [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14675 and previous config saved to /var/cache/conftool/dbconfig/20210309-051646-marostegui.json
* 00:58 Krinkle: krinkle@mwmaint1002 Ran invalidateUserSessions.php for one user
* 00:13 urbanecm@deploy1002: Synchronized wmf-config/config/incubatorwiki.yaml: {{Gerrit|0d260eda2d62ae053310ee978201b1a031522d59}}: Enable modern Vector on incubator ([[phab:T275479|T275479]]; 2/2) (duration: 00m 57s)
* 00:11 urbanecm@deploy1002: Synchronized dblists/desktop-improvements.dblist: {{Gerrit|0d260eda2d62ae053310ee978201b1a031522d59}}: Enable modern Vector on incubator ([[phab:T275479|T275479]]; 1/2) (duration: 01m 01s)
* 00:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce82e0cd6015b362f9acc8e90d300cf88738cf98}}: Logo updates ([[phab:T273085|T273085]]) (duration: 00m 58s)
* 00:08 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|ce82e0cd6015b362f9acc8e90d300cf88738cf98}}: Logo updates ([[phab:T273085|T273085]]) (duration: 00m 58s)


== 2021-03-08 ==
== 2022-09-21 ==
* 22:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1005.eqiad.wmnet with reason: REIMAGE
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:34 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1005.eqiad.wmnet with reason: REIMAGE
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:42 mholloway-shell@deploy1002: Synchronized wmf-config/CommonSettings.php: WikimediaEvents: Create data QA group/right on testwiki ([[phab:T276515|T276515]]) (duration: 00m 57s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:18 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate Editing schemas to Event Platform on all wikis - [[phab:T267343|T267343]], [[phab:T267353|T267353]] (duration: 00m 58s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:04 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate Editing schemas to Event Platform on testwiki, take 2 - [[phab:T267343|T267343]], [[phab:T267353|T267353]] (duration: 00m 58s)
* 20:46 tgr_: UTC late deploys done
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1227e2ad8d14e5f0a10a1050e0fadbe0d3c3e238}}: idwiki: Growth features: Add mentorlist ([[phab:T259024|T259024]]) (duration: 00m 58s)
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:44 legoktm: legoktm@registry1004:~$ sudo systemctl reset-failed # to fix icinga warning
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 20:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: REIMAGE
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: REIMAGE
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ce7b4602d2b109adfb86bef6795a4d07a1208b9}}: Set wgGEHelpPanelAskMentor to true by default ([[phab:T275908|T275908]]) (duration: 01m 07s)
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 bblack: miscweb[12]002 - re-enabled puppet and deployed new cert
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 20:23 bblack: miscweb[12]002 - disabling puppet to remake cergen cert...
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:55 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate Editing schemas to Event Platform on testwiki - [[phab:T267343|T267343]], [[phab:T267353|T267353]] (duration: 00m 57s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:47 dduvall@deploy1002: Synchronized php-1.36.0-wmf.33/maintenance/: maintenance: {{Gerrit|aa6f291}}: {{Gerrit|4893ddb}}: {{Gerrit|fa97162}}: {{Gerrit|380c448}}: DB_NONE offline maintenance improvements (duration: 00m 58s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:37 dduvall@deploy1002: Synchronized wmf-config/: wmf-config/env.php,CommonSettings.php: {{Gerrit|f70049b}}: {{Gerrit|e53dc3a}}: {{Gerrit|f9b9ea1}}: WMF_DATACENTER, WMF_MAINTENANCE_OFFLINE handling (duration: 01m 00s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:37 bblack: cp-text: banning varnish-fe for req.http.host == ( 7 wikis from [[phab:T274784|T274784]] )
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 19:21 urbanecm@deploy1002: Synchronized wmf-config/config/: {{Gerrit|1c46d0b}}: {{Gerrit|1aad60b}}: vector: Expand Desktop Improvements pilot wiki group ([[phab:T273090|T273090]]) (duration: 00m 58s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:20 urbanecm@deploy1002: Synchronized dblists/desktop-improvements.dblist: {{Gerrit|1c46d0b}}: {{Gerrit|1aad60b}}: vector: Expand Desktop Improvements pilot wiki group ([[phab:T273090|T273090]]) (duration: 00m 57s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:14 bblack: cp-text: disabling puppet ahead of [[phab:T274784|T274784]] changes - https://gerrit.wikimedia.org/r/c/operations/puppet/+/669840
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e1cb98890fd4ad0ed25670de2fff6db6e59d7132}}: Enable flood flag on hrwiki ([[phab:T276560|T276560]]) (duration: 00m 58s)
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a85580030027ca5b879688ed5d76123454164001}}: Fix sqwiki help panel links description ([[phab:T275550|T275550]]) (duration: 00m 58s)
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 18:47 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dfd95883ed15c532e6345d1dfacfc274b87fcd80}}: hiwiki: Add missing help panel link descriptions ([[phab:T276450|T276450]]) (duration: 00m 58s)
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 18:37 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1116.eqiad.wmnet with reason: REIMAGE
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1116.eqiad.wmnet with reason: REIMAGE
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 18:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1115.eqiad.wmnet with reason: REIMAGE
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1115.eqiad.wmnet with reason: REIMAGE
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:29 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 18:11 elukey: drain + reimage an-worker11[15,16] to Buster
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 17:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1114.eqiad.wmnet with reason: REIMAGE
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 17:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1113.eqiad.wmnet with reason: REIMAGE
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 17:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1114.eqiad.wmnet with reason: REIMAGE
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 17:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1113.eqiad.wmnet with reason: REIMAGE
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:12 elukey: drain + reimage an-worker11[13,14] to Buster
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1110.eqiad.wmnet with reason: REIMAGE
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1109.eqiad.wmnet with reason: REIMAGE
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1110.eqiad.wmnet with reason: REIMAGE
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 16:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1109.eqiad.wmnet with reason: REIMAGE
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:17 elukey: drain + reimage an-worker1109/1110 to Buster
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:55 marostegui: Restart db1115 (tendril host)
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14669 and previous config saved to /var/cache/conftool/dbconfig/20210308-154710-root.json
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14666 and previous config saved to /var/cache/conftool/dbconfig/20210308-153207-root.json
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 15:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1108.eqiad.wmnet with reason: REIMAGE
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14665 and previous config saved to /var/cache/conftool/dbconfig/20210308-151703-root.json
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1108.eqiad.wmnet with reason: REIMAGE
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1107.eqiad.wmnet with reason: REIMAGE
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 15:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1107.eqiad.wmnet with reason: REIMAGE
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 15:07 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate PrefUpdate to EventGate on all wikis - [[phab:T267348|T267348]] (duration: 00m 59s)
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:02 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove wgEventLoggingSchemas overrides for Growth and WMDE Tech wishes schemas - [[phab:T267333|T267333]], etc. (duration: 00m 59s)
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14664 and previous config saved to /var/cache/conftool/dbconfig/20210308-150159-root.json
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:54 elukey: drain + reimage an-worker110[7,8] to Buster
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:51 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 14:51 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:48 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 14:48 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1106.eqiad.wmnet with reason: REIMAGE
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1105.eqiad.wmnet with reason: REIMAGE
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1106.eqiad.wmnet with reason: REIMAGE
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 14:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1105.eqiad.wmnet with reason: REIMAGE
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:51 elukey: drain + reimage an-worker110[4,5] to Buster
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14663 and previous config saved to /var/cache/conftool/dbconfig/20210308-130712-root.json
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14662 and previous config saved to /var/cache/conftool/dbconfig/20210308-125208-root.json
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14661 and previous config saved to /var/cache/conftool/dbconfig/20210308-123704-root.json
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14660 and previous config saved to /var/cache/conftool/dbconfig/20210308-122201-root.json
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 12:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/includes/Mentorship/MentorHooks.php: {{Gerrit|48d6c55c91b42445900ccdf06b78703c1c5233a6}}: MentorHooks: Make mentor assignment follow same rules as HomepageHooks ([[phab:T276720|T276720]]) (duration: 00m 58s)
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1088.eqiad.wmnet with reason: REIMAGE
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1088.eqiad.wmnet with reason: REIMAGE
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:41 elukey: drain + reimage an-worker1104/1089 to Debian Buster
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1084.eqiad.wmnet with reason: REIMAGE
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1083.eqiad.wmnet with reason: REIMAGE
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1084.eqiad.wmnet with reason: REIMAGE
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1083.eqiad.wmnet with reason: REIMAGE
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:01 marostegui: Repool clouddb1013:3311, clouddb1013:3313
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:55 _joe_: uploading new versions of docker images: php7.<nowiki>{</nowiki>2,3<nowiki>}</nowiki>-<nowiki>{</nowiki>cli,fpm<nowiki>}</nowiki>, httpd, httpd-fcgi, mediawiki-httpd, memcached [[phab:T276097|T276097]] [[phab:T265327|T265327]]
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:34 _joe_: manually removed the old graphoid IP from scb server's interfaces (long-standing bug in wikimedia-lvs-realserver when removing the last managed IP)
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:19 elukey: drain + reimage an-worker108[3,4] to Buster
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 09:17 _joe_: regenerating puppet certs for scb200<nowiki>{</nowiki>1,2<nowiki>}</nowiki>
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1082.eqiad.wmnet with reason: REIMAGE
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1081.eqiad.wmnet with reason: REIMAGE
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:53 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1082.eqiad.wmnet with reason: REIMAGE
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1081.eqiad.wmnet with reason: REIMAGE
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:21 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 08:20 elukey: drain + reimage an-worker108[1,2] to Buster
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1075.eqiad.wmnet with reason: REIMAGE
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 07:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1075.eqiad.wmnet with reason: REIMAGE
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1074.eqiad.wmnet with reason: REIMAGE
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1074.eqiad.wmnet with reason: REIMAGE
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:32 marostegui: Depool clouddb1013:3311, clouddb1013:3313 - [[phab:T269211|T269211]]
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul
* 07:23 elukey: drain + reimage analytics107[4,5] to Buster
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14657 and previous config saved to /var/cache/conftool/dbconfig/20210308-071443-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14656 and previous config saved to /var/cache/conftool/dbconfig/20210308-065939-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2116 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14655 and previous config saved to /var/cache/conftool/dbconfig/20210308-065300-marostegui.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2092 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14654 and previous config saved to /var/cache/conftool/dbconfig/20210308-065220-marostegui.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2146 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14653 and previous config saved to /var/cache/conftool/dbconfig/20210308-064953-marostegui.json
* 06:44 marostegui: Set innodb_change_buffering = none on all parsercache hosts [[phab:T263443|T263443]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14652 and previous config saved to /var/cache/conftool/dbconfig/20210308-064436-root.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14651 and previous config saved to /var/cache/conftool/dbconfig/20210308-063700-marostegui.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14650 and previous config saved to /var/cache/conftool/dbconfig/20210308-062932-root.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14649 and previous config saved to /var/cache/conftool/dbconfig/20210308-062350-marostegui.json
* 06:21 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)


== 2021-03-07 ==
== 2022-09-20 ==
* 08:01 elukey: "megacli -LDSetProp -ForcedWB -Immediate -Lall -aAll" on analytics1066 - BBU looks fine, but the raid controller was using WriteThrough
* 20:19 cjming: end of UTC late backport window
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 18:50 jynus: restart db2100:s7 to apply new config
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 16:42 dancy@deploy1002: Sync cancelled.
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 16:09 awight@deploy1002: backport aborted:  (duration: 00m 33s)
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: {{Gerrit|1ac09d4709c645558f644a885fadc49c05cc04b9}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 39s)
* 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: {{Gerrit|1a27e05a7ca53a063d5f9e284d6a09546ac8691c}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 52s)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
* 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0b55db6f80df5f4c89f969332a6b31077a7172c4}}: Enable Tech Wishes survey on dewiki ([[phab:T316676|T316676]]) (duration: 04m 12s)
* 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
* 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
* 08:35 hashar: Restarted CI Jenkins for plugin update
* 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832993{{!}}testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias (T317289)]] (duration: 03m 46s)
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:10 kart_: Updated cxserver to 2022-09-15-113346-production ([[phab:T317289|T317289]], [[phab:T315209|T315209]])
* 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]] (duration: 36m 08s)
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-03-05 ==
== 2022-09-19 ==
* 23:16 legoktm: imported pygments 2.8.0+dfsg-1 to apt.wm.o buster-wikimedia component/pygments ([[phab:T276298|T276298]])
* 22:59 ebernhardson: [[phab:T317200|T317200]] start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
* 21:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:21 maryum: Deployed security patch for [[phab:T302479|T302479]]
* 21:32 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
* 21:01 legoktm: updated udplog to 1.9 on mwlog1002.eqiad.wmnet and mwlog2002.codfw.wmnet
* 21:15 sbassett: Deployed security patch for [[phab:T312820|T312820]]
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deploy1001.eqiad.wmnet
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts deploy1001.eqiad.wmnet
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2002.codfw.wmnet
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2001.codfw.wmnet
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry2004.codfw.wmnet
* 20:59 cjming: end of UTC late backport window
* 20:04 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry2004.codfw.wmnet
* 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: [[gerrit:833031{{!}}Add token_count subfield to outgoing_link (T317546)]] (duration: 03m 51s)
* 20:04 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2004.codfw.wmnet
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:02 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2004.codfw.wmnet
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:30 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2004.codfw.wmnet
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:14 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2004.codfw.wmnet
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:04 mutante: phab1001 - running public_task_dump.py (from cron job) manually
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:50 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry2004.eqiad.wmnet
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:45 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry2004.eqiad.wmnet
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:45 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1021.eqiad.wmnet with reason: REIMAGE
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:43 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1021.eqiad.wmnet with reason: REIMAGE
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:23 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820459{{!}}Wikifunctions: Drop two config items moved to docker]] (duration: 03m 38s)
* 18:18 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 16:58 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:54 effie: depool mw1276 and pool back
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:53 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:48 razzi: edit https://netbox.wikimedia.org/dcim/devices/2078/ device name from labsdb1012 to clouddb1021
* 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:829877{{!}}ExtensionDistributor: Add REL1_39 (T313925)]] (duration: 03m 38s)
* 16:36 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1036.eqiad.wmnet
* 20:12 cjming@deploy1002: Finished scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] (duration: 06m 31s)
* 16:30 razzi: delete non-mgmt interfaces for labsdb1012 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:28 razzi: rename https://netbox.wikimedia.org/ipam/ip-addresses/734/ DNS name from labsdb1012.mgmt.eqiad.wmnet to clouddb1021.mgmt.eqiad.wmnet
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:22 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1036.eqiad.wmnet
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:17 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1012.eqiad.wmnet
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1086.eqiad.wmnet with reason: REIMAGE
* 20:06 cjming@deploy1002: cjming and arlolra: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1086.eqiad.wmnet with reason: REIMAGE
* 20:06 cjming@deploy1002: Started scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]]
* 16:07 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1012.eqiad.wmnet
* 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 15:56 razzi: stop mariadb on labsdb1012 to reimage and rename to clouddb1021: [[phab:T269211|T269211]]
* 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: REIMAGE
* 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 15:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: REIMAGE
* 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 15:07 elukey: drain + reimage analytics1073 and an-worker1086 to Debian Buster
* 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
* 14:24 cmjohnson@