You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(Urbanecm: Evening SWAT done)
imported>Stashbot
(sukhe: disable puppet on dns4003 till we resolve the puppet failures)
Line 1: Line 1:
== 2019-08-01 ==
== 2022-10-05 ==
* 23:32 Urbanecm: Evening SWAT done
* 00:05 sukhe: disable puppet on dns4003 till we resolve the puppet failures
* 23:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|819073a}}: Add `autopatrolled` group to az wikisource ([[phab:T229371|T229371]]) (duration: 00m 49s)
* 23:29 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: {{Gerrit|8aca0eb}}: Remove the "autoreview" user group from ru.wikipedia ([[phab:T229596|T229596]]) (duration: 00m 47s)
* 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|cf01272}}: Add importing to english wikiquote ([[phab:T228607|T228607]]) (duration: 00m 48s)
* 23:10 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: [[phab:T229614|T229614]]: Pass proper types to eventlogging to resolve eventlogging errors in wmf.16 (duration: 00m 47s)
* 22:52 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@5ebf93e]: Update mobileapps to {{Gerrit|2ee48ab}} (duration: 04m 34s)
* 22:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@5ebf93e]: Update mobileapps to {{Gerrit|2ee48ab}}
* 22:17 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/extension.json: [[phab:T229614|T229614]]: Update eventlogging schema version to resolve eventlogging errors in wmf.16 (duration: 00m 47s)
* 22:13 mutante: scandium apt-get autoremove
* 22:13 mutante: scandium apt-get remove --purge wikimedia-lvs-realserver ([[phab:T228069|T228069]])
* 21:48 mutante: scandium - apt-get remove --purge hhvm* ([[phab:T228069|T228069]])
* 21:23 brennen@deploy1001: Synchronized php: group1 and group2 to 1.34.0-wmf.16 (duration: 00m 46s)
* 21:22 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 and group2 to 1.34.0-wmf.16
* 20:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/Revision/RevisionRenderer.php: [[phab:T229589|T229589]] - {{Gerrit|3f1b32e4db3698b8}} (duration: 00m 50s)
* 20:47 mutante: scandium - turning into an mw appserver
* 20:46 mutante: puppetmaster: create mcrouter certs for scandium.eqiad.wmnet needed to make it an appserver (https://wikitech.wikimedia.org/wiki/Mcrouter#Generate_certs_for_a_new_host) ([[phab:T228069|T228069]])
* 20:29 bblack: restart pybal on lvs1014
* 19:57 bblack: lvs1016 - restart pybal for slight LVS config change for cloudelastic - [[phab:T224324|T224324]]
* 19:40 brennen@deploy1001: Synchronized php: Revert group1 and group2 back to 1.34.0-wmf.15 (duration: 00m 53s)
* 19:39 twentyafterfour: finished phabricator database dump
* 19:34 bblack: lvs1014 - puppetize and restart pybal for cloudelastic LVS - [[phab:T224324|T224324]]
* 19:31 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 and group2 to 1.34.0-wmf.15
* 19:20 brennen: rolling back to wfm.15 on group1 and group2 while we investigate [[phab:T229575|T229575]]
* 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.16
* 18:52 mutante: scandium (parsoid testing) - added mw application server roles - puppet work / maintenance
* 18:47 mutante: stat1004 - starting nagios-nrpe-server which got killed again - jbd2/md0-8 invoked oom-killer
* 18:32 bblack@puppetmaster1001: conftool action : set/pooled=yes; selector: name=^cloudelastic.*
* 18:30 bblack: lvs1016: puppet re-enabled, pybal restarted, cloudelastic deploy - [[phab:T224324|T224324]]
* 18:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|469c42d}}: Switch testwiki to read sessions from kask, with fallback to redis ([[phab:T222099|T222099]]) (duration: 00m 55s)
* 17:42 bblack: disable puppet on lvs1014 + lvs1016 for cloudelastic LVS merge - [[phab:T224324|T224324]]
* 17:36 twentyafterfour: running db dump on phab1003 (in tmux).  command: sudo ./bin/storage dump --output /srv/dumps/phabricator_db_20190801.sql.gz --compress
* 16:05 XioNoX: power down msw1-codfw
* 15:47 XioNoX: start codfw mgmt work - [[phab:T228112|T228112]]
* 15:40 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s)
* 15:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16
* 15:16 mholloway-shell@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Wikibase: Do not warn about entity that was not found in WikiPageEntityRevisionLookup ([[phab:T229482|T229482]]) (duration: 01m 14s)
* 15:13 mholloway-shell@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: Do not warn about entity that was not found in WikiPageEntityRevisionLookup ([[phab:T229482|T229482]]) (duration: 01m 20s)
* 14:51 herron: performing rolling restarts of eqiad logstash cluster for security updates
* 14:38 cdanis@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iaaa1238}} comment-only no-op change (dbctl to 100% of production!) (duration: 00m 55s)
* 14:22 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iaaa1238}} dbctl to 100% of production! (duration: 00m 54s)
* 12:38 jbond42: add cp1008 to canary hosts https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/puppetmaster/frontend.yaml#L22
* 12:18 marostegui: Rename math table on db1089 (enwiki) - [[phab:T196055|T196055]]
* 11:42 Urbanecm: EU SWAT done
* 11:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c51baa3}}: Add files.geocollections.info to the wgCopyUploadsDomains whitelist for commonswiki ([[phab:T229547|T229547]]) (duration: 00m 55s)
* 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1e4458e}}: Add nlm.nih.gov to the wgCopyUploadsDomains whitelist for commonswiki ([[phab:T229470|T229470]]) (duration: 00m 53s)
* 11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c164132}}: Revert "Revert "Switch property terms migration to WRITE_NEW on production wikidata"" ([[phab:T225053|T225053]]) (duration: 00m 55s)
* 11:19 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/ExternalGuidance/: SWAT: {{Gerrit|9402c36}}: Provide the messages in the target language of translation ([[phab:T228019|T228019]]) (duration: 00m 56s)
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: {{Gerrit|7db98f3}}: flaggedrevs.php: Remove useless wgAddGroups/wgRemoveGroups declarations (duration: 00m 55s)
* 11:05 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: {{Gerrit|aa82657}}: flaggedrevs.php: Allow wikis to remove ability to promote to/demote from autoreview/editor ([[phab:T229346|T229346]]) (duration: 00m 54s)
* 10:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2058 from config [[phab:T229543|T229543]] (duration: 00m 57s)
* 10:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2058 from config [[phab:T229543|T229543]] (duration: 00m 55s)
* 10:12 jbond42: rolling upgrade for patch
* 10:10 _joe_: repooling mw1348 after reimaging as pure-php7
* 07:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2126 into s2 [[phab:T228969|T228969]] (duration: 00m 55s)
* 07:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2126 into s2 [[phab:T228969|T228969]] (duration: 00m 54s)
* 07:35 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8844', previous config saved to /var/cache/conftool/dbconfig/20190801-073459-marostegui.json
* 07:29 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1348.eqiad.wmnet
* 07:27 _joe_: removing mw1348 from rotation - reimaging for [[phab:T228976|T228976]]
* 07:10 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8843', previous config saved to /var/cache/conftool/dbconfig/20190801-071022-marostegui.json
* 07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1112 (duration: 00m 54s)
* 06:59 elukey: install python3-docopt manually on lithium to test check_anycast_healthchecker
* 06:51 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1270.eqiad.wmnet
* 06:42 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1270.eqiad.wmnet
* 06:42 _joe_: depooling mw1270 while migrating it to pure-php7
* 06:28 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1348.eqiad.wmnet
* 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1348.eqiad.wmnet
* 06:18 _joe_: depooling mw1348 while moving it to no hhvm support.
* 00:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/resources/Resources.php: {{Gerrit|acfff6751f3b8f7650}} (duration: 00m 54s)
* 00:32 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/specials/SpecialJavaScriptTest.php: {{Gerrit|acfff6751f3b8f7650}} (duration: 00m 54s)
* 00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/resourceloader/ResourceLoader.php: {{Gerrit|acfff6751f3b8f7650}} (duration: 00m 55s)
* 00:28 krinkle@deploy1001: sync-file aborted: composer.json composer.lock dblists debug.json docroot errorpages fc-list fonts images langlist langlist-labs multiversion php php-1.34.0-wmf.13 php-1.34.0-wmf.14 php-1.34.0-wmf.15 php-1.34.0-wmf.16 phpcs.xml phpunit.xml portals private README requirements.txt robots.txt rpc scap setup.py src static test-requirements.txt tests tox.ini typos vendor w wikiversions.json wikiversions-labs.js


== 2019-07-31 ==
== 2022-10-04 ==
* 23:34 eileen: civicrm revision changed from {{Gerrit|218328b29d}} to {{Gerrit|857dcc9461}}, config revision is {{Gerrit|84b785d41c}}
* 23:09 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@db795ec]: Update mobileapps to {{Gerrit|b8c4166}} (duration: 04m 21s)
* 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:17 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@db795ec]: Update mobileapps to {{Gerrit|b8c4166}}
* 21:28 cjming: end of UTC late backport window
* 23:14 Urbanecm: Evening SWAT done
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:12 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: Add kask session storage configuration. Use only on testwiki, ({{Gerrit|ede989e}}, {{Gerrit|862df8d}}, [[phab:T222099|T222099]]) (duration: 00m 56s)
* 21:25 cjming@deploy1002: Finished scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] (duration: 05m 06s)
* 21:56 ejegg: updated fundraising python tools from {{Gerrit|2a56e5e283}} to {{Gerrit|493a38f9e0}}
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:32 XioNoX: set cr1-eqiad's netflow target port to 2100 (nfacctd)
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:58 brennen@deploy1001: Synchronized php: Revert group1 back to 1.34.0-wmf.15 (duration: 00m 53s)
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:55 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 back to 1.34.0-wmf.15
* 21:21 cjming@deploy1002: cjming and cjming: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:48 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s)
* 21:20 cjming@deploy1002: Started scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]]
* 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:37 brennen@deploy1001: Synchronized php-1.34.0-wmf.16/skins/MinervaNeue/includes/MinervaHooks.php: [[gerrit:526754{{!}}Limit Recent Changes disable-table mode to Minerva skin]] [[phab:T228280|T228280]] (duration: 00m 56s)
* 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 mdholloway: mobileapps deploy failed, investigating
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to {{Gerrit|5eb9068}} (duration: 01m 39s)
* 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to {{Gerrit|5eb9068}}
* 21:07 cjming@deploy1002: Finished scap: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]] (duration: 05m 40s)
* 20:01 mbsantos@deploy1001: Finished deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to {{Gerrit|529c493}} ([[phab:T227124|T227124]]) (duration: 01m 43s)
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:59 mbsantos@deploy1001: Started deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to {{Gerrit|529c493}} ([[phab:T227124|T227124]])
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:55 ejegg: updated payments-wiki from {{Gerrit|70b432d309}} to {{Gerrit|9533f70fab}}
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:49 mutante: phab1003 - manually running project_changes.sh to create mail to phabricator-reports@lists ([[phab:T228575|T228575]])
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:46 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I45b705c8}} disable dbctl on half of canary hosts (duration: 00m 57s)
* 21:01 cjming@deploy1002: cjming and mdsshakil: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 17:21 volans@deploy1001: Synchronized wmf-config/db-codfw.php: depool db2058, I/O error, [[phab:T229449|T229449]] (duration: 00m 54s)
* 21:01 cjming@deploy1002: Started scap: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]]
* 17:15 volans@cumin1001: dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8841', previous config saved to /var/cache/conftool/dbconfig/20190731-171536-volans.json
* 20:59 cjming@deploy1002: Finished scap: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]] (duration: 06m 35s)
* 16:52 Urbanecm: Morning SWAT done
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:526691{{!}}Enable MobileWebUIActionsTracking schema with 50% sampling rate]] ([[phab:T220016|T220016]]) (duration: 00m 58s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:37 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/WikimediaEvents/: SWAT: [[:gerrit:526688{{!}}Improved MobileUIActions tracking schema]] ([[phab:T220016|T220016]]) (duration: 00m 54s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:26 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/GrowthExperiments/: SWAT: [[:gerrit:526610{{!}}Only set relevant title on mobile skin]] ([[phab:T229263|T229263]], [[phab:T225659|T225659]]) (duration: 00m 51s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: SWAT: [[:gerrit:526612{{!}}Only set relevant title on mobile skin]] ([[phab:T229263|T229263]], [[phab:T225659|T225659]]) (duration: 00m 56s)
* 20:53 cjming@deploy1002: cjming and trainbranchbot: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 16:14 bblack: deploying VCL for H/2 coalesce 421 responses - [[phab:T207340|T207340]]
* 20:52 cjming@deploy1002: Started scap: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]]
* 16:12 marostegui: Poweroff pc2010 for on-site maintenance  [[phab:T227552|T227552]]
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:52 mforns@deploy1001: Finished deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to {{Gerrit|eb2d9b005b26f6dddab2b59f1ba591f1758ec99f}} (duration: 13m 09s)
* 20:49 cjming@deploy1002: Sync cancelled.
* 15:45 bstorm_: restarting nfs service on labstore1004
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:39 mforns@deploy1001: Started deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to {{Gerrit|eb2d9b005b26f6dddab2b59f1ba591f1758ec99f}}
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:24 thcipriani: restarting jenkins for update
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:22 ema: cp-ats: upgrade fifo-log-demux to 0.4 and restart atsmtail@backend.service [[phab:T229414|T229414]]
* 20:42 cjming@deploy1002: cjming and aishik: Backport for [[gerrit:838207{{!}}Add wordmark and tagline for Bengali Wikibooks (T319320)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 15:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.16
* 20:41 cjming@deploy1002: Started scap: Backport for [[gerrit:838207{{!}}Add wordmark and tagline for Bengali Wikibooks (T319320)]]
* 15:15 ema: upload fifo-log-demux 0.4 to stretch-wikimedia [[phab:T229414|T229414]]
* 20:39 cjming@deploy1002: Finished scap: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]] (duration: 14m 29s)
* 15:03 XioNoX: power down re1:cr1-codfw (backup) - [[phab:T226422|T226422]]
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:57 godog: ms-be2018 disablepd 1I:1:1 - [[phab:T225630|T225630]]
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:47 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8838', previous config saved to /var/cache/conftool/dbconfig/20190731-144731-marostegui.json
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1112 (duration: 00m 46s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1078 after upgrade and alter (duration: 00m 47s)
* 20:25 cjming@deploy1002: cjming and d3r1ck01: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:28 herron: beginning rolling reboots of codfw logstash hosts for security updates
* 20:25 cjming@deploy1002: Started scap: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]]
* 14:28 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8837', previous config saved to /var/cache/conftool/dbconfig/20190731-142814-marostegui.json
* 19:54 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS buster
* 14:18 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I02d66736}} expand dbctl to 25% of the fleet (duration: 00m 46s)
* 18:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 14:04 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 18:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 14:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1078 after upgrade and alter (duration: 00m 46s)
* 18:34 mutante: gerrit - deploying puppet refactoring change
* 14:01 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8836', previous config saved to /var/cache/conftool/dbconfig/20190731-140124-marostegui.json
* 18:34 tzatziki: removing 1 file for legal compliance
* 13:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1078 after upgrade and alter (duration: 00m 46s)
* 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
* 13:51 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8835', previous config saved to /var/cache/conftool/dbconfig/20190731-135129-marostegui.json
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:49 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 18:24 tzatziki: removing 1 file for legal compliance
* 13:46 ema: cp4021: test fifo-log-demux 0.4 [[phab:T229414|T229414]]
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 herron: beginning rolling restarts of codfw kafka-main brokers for security updates
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 jbond42: rolling update of exim
* 18:21 moritzm: installing gdk-pixbuf security updates
* 13:31 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 18:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 13:27 elukey: roll restart of zookeeper on conf100[4-6] and conf200[1-3] for openjdk upgrades
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 for alter and upgrade (duration: 00m 47s)
* 18:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 marostegui: Upgrade db1078
* 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8834', previous config saved to /var/cache/conftool/dbconfig/20190731-131900-marostegui.json
* 17:59 ejegg: turned fundraising scheduled jobs back on
* 13:15 marostegui: Drop abuse_filter_log.afl_log_id in s3 eqiad - [[phab:T226851|T226851]]
* 17:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:12 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 17:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]] (duration: 06m 58s)
* 13:05 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 17:55 moritzm: installing libsndfile security updates
* 12:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 17:50 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 12:53 marostegui: Drop abuse_filter_log.afl_log_id from s3 codfw with replication (this will cause lag in s3 codfw) - [[phab:T226851|T226851]]
* 17:50 urbanecm@deploy1002: Started scap: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]]
* 12:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 17:49 ejegg: turned off fundraising scheduled jobs for civi deploy
* 12:22 Amir1: EU SWAT is done
* 17:28 tzatziki: removing 4 files for legal compliance
* 12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:526657{{!}}Revert: Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 47s)
* 17:04 mutante: gerrit - deployed 832345 - scap and daemon users became decoupled ([[phab:T317412|T317412]])
* 12:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:519212{{!}}Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 47s)
* 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:05 ladsgroup@deploy1001: sync-file aborted: SWAT: [[gerrit:519212{{!}}Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 03s)
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:56 jbond42: enable puppet fleet wide https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645 deployed
* 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:52 kartik@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/ExternalGuidance: SWAT: [[gerrit{{!}}526637{{!}}Provide the messages in the target language of translation (T228019)]] (duration: 00m 46s)
* 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:41 jbond42: disable puppet to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* {{safesubst:SAL entry|1=11:40 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:526646{{!}}Fix typo in name of config (T225055) (duration: 00m 47s)}}
* 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:526197{{!}}Decrease idwiki MT threshold for publishing (T228971)]] (duration: 00m 48s)
* 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:16 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable other statements on Commons (duration: 00m 48s)
* 16:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 16:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 10:05 jbond42: rolling back https://gerrit.wikimedia.org/r/q/c9f876e9990fb171f27616515e7d125824d7a6ac
* 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 16:25 brennen@deploy1002: Pruned MediaWiki: 1.40.0-wmf.2 (duration: 02m 02s)
* 09:49 _joe_: pruning orphaned images on contint1001
* 16:24 brennen@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]] (duration: 28m 55s)
* 08:37 elukey: restart Yarn Resource Managers on an-master100[12] to pick up the new openjdk version
* 16:21 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns4003.wikimedia.org with OS bullseye
* 08:06 _joe_: running puppet (and restarting mtail) on all eqiad appservers
* 16:03 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 08:05 elukey: restart hadoop Namenodes on an-master100[12] to pick up new heap settings and new openjdk
* 16:00 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 07:40 marostegui: Drop abuse_filter_log.afl_log_id in s1 eqiad - [[phab:T226851|T226851]]
* 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:36 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8833', previous config saved to /var/cache/conftool/dbconfig/20190731-073608-marostegui.json
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2125 into s2 [[phab:T228969|T228969]] (duration: 00m 47s)
* 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2125 into s2 [[phab:T228969|T228969]] (duration: 00m 49s)
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:29 elukey: restart-hhvm on mw1290
* 15:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS buster
* 07:25 marostegui: Add db2125 to tendril and zarcillo [[phab:T228969|T228969]]
* 15:54 brennen@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 05:44 marostegui: Drop abuse_filter_log.afl_log_id from s1 codfw with replication (this will cause lag in s1 codfw) - [[phab:T226851|T226851]]
* 15:53 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 05:39 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify that db2128 is the new sanitarium master (duration: 00m 47s)
* 15:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 05:00 marostegui: Compress s6 on labsdb1010 - [[phab:T222978|T222978]]
* 15:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 04:00 tstarling@deploy1001: Synchronized php-1.34.0-wmf.16/tests/phpunit/includes/parser/ParserOutputTest.php: [[phab:T229366|T229366]] (duration: 00m 46s)
* 15:51 brennen: restarting `/usr/bin/scap stage-train --yes auto` after failed staging ([[phab:T314193|T314193]]), cc: ^demon
* 03:59 tstarling@deploy1001: Synchronized php-1.34.0-wmf.16/includes/parser/ParserOutput.php: [[phab:T229366|T229366]] (duration: 00m 47s)
* 15:48 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 02:24 TimStarling: on mwmaint1002 reverted previous change using scap pull
* 15:47 sukhe: disable Puppet on A:cp and A:eqiad for [[phab:T309651|T309651]]
* 01:08 TimStarling: on mwmaint1002, editing wikiversions.json locally to move wikimania2006wiki to .16, to investigate [[phab:T229366|T229366]]
* 15:42 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 00:24 eileen: tools revision changed from {{Gerrit|4910f1507c}} to {{Gerrit|2a56e5e283}}
* 15:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
* 00:04 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/CentralNotice/: [[phab:T227711|T227711]] among others (duration: 00m 47s)
* 15:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
* 00:01 catrope@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/CentralNotice/: [[phab:T227711|T227711]] among others (duration: 00m 48s)
* 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS buster
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
* 15:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
* 15:10 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2002.codfw.wmnet with OS buster
* 15:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 15:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 15:06 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:02 moritzm: installing snakeyaml security updates
* 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:55 papaul: maintenance complete on msw1-codfw
* 14:51 sukhe: disable Puppet on A:cp and A:esams for [[phab:T309651|T309651]]
* 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
* 14:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
* 14:40 moritzm: installing maven-shared-utils security updates
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2002.codfw.wmnet with OS buster
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
* 14:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
* 14:30 papaul: on going maintenance on msw1-codfw
* 14:29 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 14:27 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 14:22 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 14:14 XioNoX: netbox - Move VRRP IPs to FHRP group feature - [[phab:T311218|T311218]]
* 14:13 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 14:12 filippo@cumin1001: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 14:12 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/tests/phpunit/: Backport: [[gerrit:838094{{!}}Revert "Introduce LanguageVariantConverter" (T319282)]] (2/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 52s)
* 14:12 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/includes/: Backport: [[gerrit:838094{{!}}Revert "Introduce LanguageVariantConverter" (T319282)]] (1/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 43s)
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/extensions/Kartographer/modules/dialog: Backport: [[gerrit:838097{{!}}Log basic nearby and fullscreen events (T315972, T318678)]] (no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 42s)
* 14:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:55 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 13:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 13:49 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35347 and previous config saved to /var/cache/conftool/dbconfig/20221004-134947-root.json
* 13:49 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 13:48 sukhe: disable Puppet on A:cp and A:eqsin for [[phab:T309651|T309651]]
* 13:47 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:42 awight: EU backport window finished.
* 13:40 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 13:38 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:36 awight@deploy1002: Finished scap: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]] (duration: 06m 49s)
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
* 13:35 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "filippo test - filippo@cumin1001"
* 13:34 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "filippo test - filippo@cumin1001"
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35346 and previous config saved to /var/cache/conftool/dbconfig/20221004-133442-root.json
* 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
* 13:31 jbond: re-enable puppet post deploy a puppetmaster change 838144
* 13:30 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
* 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
* 13:30 awight@deploy1002: awight and awight: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:29 awight@deploy1002: Started scap: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]]
* 13:28 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
* 13:27 awight@deploy1002: Finished scap: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]] (duration: 05m 16s)
* 13:24 jbond: disable puppet to deploy a puppetmaster change 838144
* 13:22 awight@deploy1002: awight and stang: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:21 awight@deploy1002: Started scap: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]]
* 13:21 awight@deploy1002: Finished scap: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]] (duration: 12m 48s)
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35345 and previous config saved to /var/cache/conftool/dbconfig/20221004-131937-root.json
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 awight@deploy1002: awight and stang: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:08 awight@deploy1002: Started scap: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]]
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35343 and previous config saved to /var/cache/conftool/dbconfig/20221004-130432-root.json
* 12:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35342 and previous config saved to /var/cache/conftool/dbconfig/20221004-124927-root.json
* 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35341 and previous config saved to /var/cache/conftool/dbconfig/20221004-123422-root.json
* 12:31 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]] (duration: 00m 58s)
* 12:30 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]]
* 12:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 12:26 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]] (duration: 00m 14s)
* 12:26 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]]
* 12:21 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35340 and previous config saved to /var/cache/conftool/dbconfig/20221004-121917-root.json
* 12:14 volans: uploaded python3-gjson_0.1.0 to apt.wikimedia.org bullseye-wikimedia
* 12:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 12:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host sessionstore2001.codfw.wmnet with OS buster
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35339 and previous config saved to /var/cache/conftool/dbconfig/20221004-120413-root.json
* 11:55 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 11:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
* 11:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
* 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 11:22 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 11:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 11:05 jayme: published calico 3.23.3 debian packages in bullseye component/calico323 as well as corresponding docker images - [[phab:T307943|T307943]]
* 11:04 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
* 10:55 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:54 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS buster
* 10:53 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 10:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 135158
* 10:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 135158
* 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9119
* 10:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9119
* 10:41 moritzm: installing expat security updates
* 09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.maps.roll-restart (exit_code=1) rolling restart_daemons on A:maps-codfw
* 09:47 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 09:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 09:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 09:42 jayme: deployed istio-ingressgateway with additional envoy native metrics to wikikube codfw and eqiad
* 09:40 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
* 09:37 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-codfw
* 09:36 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
* 09:36 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
* 09:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 20 hosts
* 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 20 hosts
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35338 and previous config saved to /var/cache/conftool/dbconfig/20221004-093530-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35337 and previous config saved to /var/cache/conftool/dbconfig/20221004-092025-root.json
* 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35336 and previous config saved to /var/cache/conftool/dbconfig/20221004-090520-root.json
* 08:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: php7.2 removal
* 08:55 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: php7.2 removal
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35335 and previous config saved to /var/cache/conftool/dbconfig/20221004-085015-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35334 and previous config saved to /var/cache/conftool/dbconfig/20221004-083511-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35333 and previous config saved to /var/cache/conftool/dbconfig/20221004-082005-root.json
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35332 and previous config saved to /var/cache/conftool/dbconfig/20221004-080500-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P35331 and previous config saved to /var/cache/conftool/dbconfig/20221004-080338-root.json
* 07:52 moritzm: installing libdatetime-timezone-perl updates (catching up with latest timezone changes)
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35330 and previous config saved to /var/cache/conftool/dbconfig/20221004-074955-root.json
* 07:36 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 07:36 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35329 and previous config saved to /var/cache/conftool/dbconfig/20221004-072158-root.json
* 07:16 elukey: restart kafka on kafka-logging1001 to pick up its new PKI TLS cert
* 07:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35328 and previous config saved to /var/cache/conftool/dbconfig/20221004-070653-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35327 and previous config saved to /var/cache/conftool/dbconfig/20221004-065148-root.json
* 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35326 and previous config saved to /var/cache/conftool/dbconfig/20221004-063643-root.json
* 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 25885
* 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 25885
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35325 and previous config saved to /var/cache/conftool/dbconfig/20221004-062138-root.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35324 and previous config saved to /var/cache/conftool/dbconfig/20221004-060633-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35323 and previous config saved to /var/cache/conftool/dbconfig/20221004-055128-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35322 and previous config saved to /var/cache/conftool/dbconfig/20221004-053623-root.json
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2019-07-30 ==
== 2022-10-03 ==
* 23:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Enable MobileWebUIActionsTracking schema with 50% sampling rate" ([[phab:T220016|T220016]]) (duration: 00m 47s)
* 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Specify CentralAuth and OAuth session storage separately from per-wiki session storage ([[phab:T227097|T227097]], [[phab:T227696|T227696]]) (duration: 00m 47s)
* 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
* 23:06 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MobileWebUIActionsTracking schema with 50% sampling rate ([[phab:T220016|T220016]]) (duration: 00m 48s)
* 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 22:26 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 3) - [[phab:T226331|T226331]] (duration: 00m 09s)
* 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 22:26 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 3) - [[phab:T226331|T226331]]
* 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
* 22:23 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 2) - [[phab:T226331|T226331]] (duration: 00m 10s)
* 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
* 22:23 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 2) - [[phab:T226331|T226331]]
* 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work
* 22:19 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - [[phab:T226331|T226331]] (duration: 00m 47s)
* 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
* 22:18 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - [[phab:T226331|T226331]]
* 19:22 ryankemper: [Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '<nowiki>{</nowiki>"transient":<nowiki>{</nowiki>"cluster.routing.allocation.exclude":<nowiki>{</nowiki>"_host": "","_name": "elastic1066-production-search-psi-eqiad"}'`); will restart elasticsearch-psi after shards drain}}
* 22:18 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - [[phab:T226331|T226331]] (duration: 00m 20s)
* 19:15 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 22:18 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - [[phab:T226331|T226331]]
* 18:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 22:15 eileen: tools revision changed from {{Gerrit|8a464c4f0d}} to {{Gerrit|4910f1507c}} (reverted pgmysql switch)
* 18:41 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 22:13 ppchelko@deploy1001: Finished deploy [changeprop/deploy@76b6639]: Report 400 errors by default. [[phab:T229277|T229277]] (duration: 01m 29s)
* 18:34 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 22:11 ppchelko@deploy1001: Started deploy [changeprop/deploy@76b6639]: Report 400 errors by default. [[phab:T229277|T229277]]
* 18:30 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. [[phab:T229060|T229060]], take 2, feeds timed out (duration: 01m 03s)
* 18:30 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
* 22:00 ppchelko@deploy1001: Started deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. [[phab:T229060|T229060]], take 2, feeds timed out
* 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:00 ppchelko@deploy1001: Finished deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. [[phab:T229060|T229060]] (duration: 18m 40s)
* 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:42 ppchelko@deploy1001: Started deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. [[phab:T229060|T229060]]
* 18:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 19:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.34.0-wmf.15
* 18:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 19:19 mutante: restbase2017 - sudo systemctl start cassandra-b after it had failed for unknown reason
* 18:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 19:19 XioNoX: repool ulsfo
* 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 19:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.16
* 17:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 18:49 XioNoX: rollback vrrp priority changes on cr4-ulsfo
* 17:41 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003
* 18:48 XioNoX: rollback bump cr4-ulsfo<->cr1-codfw ospf metric
* 17:41 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4003
* 18:39 XioNoX: restart cr4-ulsfo
* 17:40 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:38 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 17:37 robh@cumin2002: START - Cookbook sre.dns.netbox
* 18:38 XioNoX: bump cr4-ulsfo<->cr1-codfw ospf metric
* 17:29 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
* 18:26 XioNoX: failover VRRP master to cr3-ulsfo
* 17:29 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 837727: remove dns4001 for anycast neighbors."
* 18:25 XioNoX: activate transit BGP groups on cr3-ulsfo
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4001.wikimedia.org
* 18:25 XioNoX: rollback - bump cr3-ulsfo<->cr2-eqord ospf metric
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:15 XioNoX: restart cr3-ulsfo
* 17:08 robh@cumin2002: START - Cookbook sre.dns.netbox
* 18:15 brennen@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.16 and rebuild l10n cache (duration: 18m 23s)
* 17:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4001.wikimedia.org
* 18:14 XioNoX: bump cr3-ulsfo<->cr2-eqord ospf metric
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:07 XioNoX: deactivate transit BGP groups on cr3-ulsfo
* 16:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:06 XioNoX: failover VRRP master to cr4-ulsfo
* 16:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:56 brennen@deploy1001: Started scap: testwiki to php-1.34.0-wmf.16 and rebuild l10n cache
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:55 brennen@deploy1001: Pruned MediaWiki: 1.34.0-wmf.11 (duration: 07m 40s)
* 16:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
* 17:53 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@af8b471]: Update mobileapps to {{Gerrit|ec865a7}} (duration: 05m 45s)
* 16:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
* 17:47 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@af8b471]: Update mobileapps to {{Gerrit|ec865a7}}
* 16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:20 XioNoX: depool ulsfo for routers upgrades - [[phab:T227886|T227886]]
* 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:15 godog: use wezen.codfw.wmnet instead of syslog.codfw.wmnet for production hosts
* 16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:00 thcipriani: gerrit restart incoming -- gc time increasing causing timeouts
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:46 XioNoX: adding port 9105 to term prometheus in filter labs-in4 - [[phab:T225296|T225296]]
* 16:24 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] (duration: 04m 16s)
* 16:41 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Icf57a2ab}} enable dbctl on all mw canaries (duration: 00m 47s)
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:37 brennen: cutting 1.34-wmf.16
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:33 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 16:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:22 godog: bounce rsyslog on centrallog1001 - [[phab:T199406|T199406]]
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:41 elukey@cumin1001: END (FAIL) - Cookbook sre.kafka.roll-restart-brokers (exit_code=99)
* 16:20 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 15:28 legoktm@deploy1001: Finished scap: Rebuild l10n cache for SecureLinkFixer message (duration: 18m 51s)
* 16:20 urbanecm@deploy1002: Started scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]]
* 15:21 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 16:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cae49b85d2d780e34b553789d56d76bac4a62c48}}: throttle: Add throttle rule for 2022-10-06 ([[phab:T319212|T319212]]) (duration: 04m 21s)
* 15:18 jijiki: Disable puppet on  mw1347 and mw2136, depool and pool back -  [[phab:T219150|T219150]]
* 16:14 sukhe: disable Puppet on cp hosts in codfw: rolling out [[phab:T309651|T309651]]
* 15:13 elukey: remove snakebite from buster-wikimedia (not needed anymore)
* 15:15 sukhe: disable Puppet on cp hosts in ulsfo: rolling out [[phab:T309651|T309651]]
* 15:09 legoktm@deploy1001: Started scap: Rebuild l10n cache for SecureLinkFixer message
* 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35320 and previous config saved to /var/cache/conftool/dbconfig/20221003-151438-root.json
* 15:06 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer everywhere ([[phab:T200751|T200751]]) (duration: 00m 47s)
* 15:06 papaul: maintenance complete on mr1-esams
* 14:48 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I17c55428}} dbctl canary on mwdebug*, mw1261, mw1276 (duration: 00m 47s)
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35319 and previous config saved to /var/cache/conftool/dbconfig/20221003-145933-root.json
* 14:36 cdanis@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ie98a8d9e}} dbctl canary on mwdebug1001 (duration: 00m 47s)
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35318 and previous config saved to /var/cache/conftool/dbconfig/20221003-144428-root.json
* 14:34 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Ie98a8d9e}} dbctl canary on mwdebug1001 (duration: 00m 47s)
* 14:35 sukhe: upgrade A:cp and A:drmrs to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 14:33 cdanis@deploy1001: Synchronized docroot/noc/db.php: {{Gerrit|Ie98a8d9e}} dbctl canary on mwdebug1001 (duration: 00m 48s)
* 14:31 papaul: on going maintenance on mr1-esams
* 14:14 fsero: refreshing calico policy from code in eqiad
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35317 and previous config saved to /var/cache/conftool/dbconfig/20221003-142923-root.json
* 14:13 fsero: refreshing calico policy from code in codfw
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35316 and previous config saved to /var/cache/conftool/dbconfig/20221003-141417-root.json
* 13:38 marostegui: Move db2094:3315 from db2066 to db2128 - [[phab:T228258|T228258]]
* 14:08 sukhe: upgrade cp4026, cp4032 to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 13:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35315 and previous config saved to /var/cache/conftool/dbconfig/20221003-135912-root.json
* 13:13 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 13:57 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm2_amd64.changes: [[phab:T309651|T309651]]
* 12:36 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8824', previous config saved to /var/cache/conftool/dbconfig/20190730-123630-marostegui.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35314 and previous config saved to /var/cache/conftool/dbconfig/20221003-134407-root.json
* 12:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35313 and previous config saved to /var/cache/conftool/dbconfig/20221003-134024-root.json
* 12:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35312 and previous config saved to /var/cache/conftool/dbconfig/20221003-132902-root.json
* 12:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35311 and previous config saved to /var/cache/conftool/dbconfig/20221003-132519-root.json
* 12:13 jbond42: while testing some changes on the puppet master a bad config caused a small blip in catalouge compilation
* 13:18 vgutierrez: enforcing origin-form{{!}}asterisk-form for request-target on varnish (could trigger spikes of HTTP 400 errors) - [[phab:T318676|T318676]]
* 12:09 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35310 and previous config saved to /var/cache/conftool/dbconfig/20221003-131014-root.json
* 11:34 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35308 and previous config saved to /var/cache/conftool/dbconfig/20221003-125509-root.json
* 11:31 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35307 and previous config saved to /var/cache/conftool/dbconfig/20221003-124004-root.json
* 11:30 jijiki: Depool  mw1348 and pool back
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35306 and previous config saved to /var/cache/conftool/dbconfig/20221003-122459-root.json
* 11:28 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35305 and previous config saved to /var/cache/conftool/dbconfig/20221003-120954-root.json
* 09:49 elukey: upload python-snakebite to buster-wikimedia (rebuilt for buster from source)
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P35303 and previous config saved to /var/cache/conftool/dbconfig/20221003-120208-root.json
* 09:31 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 09:27 elukey: add thirdparty/cloudera to buster-wikimedia and import packages to it (pull from the jessie component)
* 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 08:17 marostegui: Stop MySQL on db2038 [[phab:T227565|T227565]]
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 08:10 marostegui: Remove db2038 from tendril and zarcillo [[phab:T227565|T227565]]
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 08:04 akosiaris: delete orespoolcounter{1,2}00{1,2} [[phab:T227640|T227640]]
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35302 and previous config saved to /var/cache/conftool/dbconfig/20221003-115449-root.json
* 08:04 akosiaris: revoke and deactivate orespoolcounter{1,2}00{1,2} [[phab:T227640|T227640]]
* 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 07:30 godog: bounce hhvm on mw1221
* 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 05:36 marostegui: Disable puppet on cumin2001 to investigate a backups issue
* 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 05:25 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/jobqueue/jobs/AssembleUploadChunksJob.php: [[phab:T228929|T228929]] (duration: 00m 46s)
* 11:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 05:24 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/api/ApiUpload.php: [[phab:T228929|T228929]] (duration: 00m 47s)
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS buster
* 05:23 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/upload/UploadBase.php: [[phab:T228929|T228929]] (duration: 00m 48s)
* 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s8 ready only [[phab:T227062|T227062]] (duration: 00m 24s)
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s8 master eqiad from db1071 to db1104  [[phab:T227062|T227062]] (duration: 00m 24s)
* 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 05:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s8 on read-only [[phab:T227062|T227062]] (duration: 00m 26s)
* 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 05:00 marostegui: Starting s8 failover from db1071 to db1104 - [[phab:T227062|T227062]]
* 10:52 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS buster
* 04:48 eileen: civicrm revision changed from {{Gerrit|1d57aca19c}} to {{Gerrit|218328b29d}}, config revision is {{Gerrit|3f960c48f6}}
* 10:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 04:15 marostegui: Start pre-steps for s8 primary master failover - [[phab:T227062|T227062]]
* 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 02:37 eileen: civicrm revision changed from {{Gerrit|121feb5d53}} to {{Gerrit|1d57aca19c}}, config revision is {{Gerrit|3f960c48f6}}
* 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 10:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS buster
* 10:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 10:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 10:39 hnowlan: starting cassandra on reimaged sessionstore1002
* 10:37 _joe_: remove stale druid.svc.eqiad.wmnet certificate from the puppetmaster CA; it was expired anyways
* 10:32 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 10:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 10:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 10:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS buster
* 10:00 hnowlan: c-foreach-nt drain on sessionstore1002
* 10:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 10:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35300 and previous config saved to /var/cache/conftool/dbconfig/20221003-092519-root.json
* 09:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31133
* 09:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31133
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62044
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62044
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35299 and previous config saved to /var/cache/conftool/dbconfig/20221003-091014-root.json
* 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P35297 and previous config saved to /var/cache/conftool/dbconfig/20221003-085840-root.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35296 and previous config saved to /var/cache/conftool/dbconfig/20221003-085509-root.json
* 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12975
* 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12975
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35295 and previous config saved to /var/cache/conftool/dbconfig/20221003-085007-root.json
* 08:40 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5001.eqsin.wmnet
* 08:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35294 and previous config saved to /var/cache/conftool/dbconfig/20221003-084004-root.json
* 08:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3303
* 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35293 and previous config saved to /var/cache/conftool/dbconfig/20221003-083729-root.json
* 08:36 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
* 08:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35292 and previous config saved to /var/cache/conftool/dbconfig/20221003-083502-root.json
* 08:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5001.eqsin.wmnet
* 08:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15557
* 08:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15557
* 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12975
* 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12975
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35291 and previous config saved to /var/cache/conftool/dbconfig/20221003-082459-root.json
* 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30781
* 08:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30781
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35290 and previous config saved to /var/cache/conftool/dbconfig/20221003-082224-root.json
* 08:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39386
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35289 and previous config saved to /var/cache/conftool/dbconfig/20221003-081955-root.json
* 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39386
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35288 and previous config saved to /var/cache/conftool/dbconfig/20221003-080954-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35287 and previous config saved to /var/cache/conftool/dbconfig/20221003-080719-root.json
* 08:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35286 and previous config saved to /var/cache/conftool/dbconfig/20221003-080556-root.json
* 08:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35285 and previous config saved to /var/cache/conftool/dbconfig/20221003-080451-root.json
* 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178', diff saved to https://phabricator.wikimedia.org/P35284 and previous config saved to /var/cache/conftool/dbconfig/20221003-075643-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35283 and previous config saved to /var/cache/conftool/dbconfig/20221003-075449-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35282 and previous config saved to /var/cache/conftool/dbconfig/20221003-075214-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35281 and previous config saved to /var/cache/conftool/dbconfig/20221003-075051-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35280 and previous config saved to /var/cache/conftool/dbconfig/20221003-074946-root.json
* 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16637
* 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16637
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35279 and previous config saved to /var/cache/conftool/dbconfig/20221003-073944-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35278 and previous config saved to /var/cache/conftool/dbconfig/20221003-073709-root.json
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 XioNoX: cr2-drmrs# set chassis fpc 0 sampling-instance pmacct
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35277 and previous config saved to /var/cache/conftool/dbconfig/20221003-073627-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200', diff saved to https://phabricator.wikimedia.org/P35276 and previous config saved to /var/cache/conftool/dbconfig/20221003-073556-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35275 and previous config saved to /var/cache/conftool/dbconfig/20221003-073546-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35274 and previous config saved to /var/cache/conftool/dbconfig/20221003-073441-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35273 and previous config saved to /var/cache/conftool/dbconfig/20221003-072741-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35272 and previous config saved to /var/cache/conftool/dbconfig/20221003-072204-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35271 and previous config saved to /var/cache/conftool/dbconfig/20221003-072122-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35270 and previous config saved to /var/cache/conftool/dbconfig/20221003-072041-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35269 and previous config saved to /var/cache/conftool/dbconfig/20221003-071936-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35268 and previous config saved to /var/cache/conftool/dbconfig/20221003-071236-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35267 and previous config saved to /var/cache/conftool/dbconfig/20221003-070659-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35266 and previous config saved to /var/cache/conftool/dbconfig/20221003-070617-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35265 and previous config saved to /var/cache/conftool/dbconfig/20221003-070536-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35264 and previous config saved to /var/cache/conftool/dbconfig/20221003-070431-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P35263 and previous config saved to /var/cache/conftool/dbconfig/20221003-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35262 and previous config saved to /var/cache/conftool/dbconfig/20221003-065731-root.json
* 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6128
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35261 and previous config saved to /var/cache/conftool/dbconfig/20221003-065154-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35260 and previous config saved to /var/cache/conftool/dbconfig/20221003-065112-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35259 and previous config saved to /var/cache/conftool/dbconfig/20221003-065031-root.json
* 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6128
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P35258 and previous config saved to /var/cache/conftool/dbconfig/20221003-064638-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35257 and previous config saved to /var/cache/conftool/dbconfig/20221003-064226-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35256 and previous config saved to /var/cache/conftool/dbconfig/20221003-063607-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35255 and previous config saved to /var/cache/conftool/dbconfig/20221003-063527-root.json
* 06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11039
* 06:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11039
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35254 and previous config saved to /var/cache/conftool/dbconfig/20221003-062721-root.json
* 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5400
* 06:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5400
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35253 and previous config saved to /var/cache/conftool/dbconfig/20221003-062102-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35252 and previous config saved to /var/cache/conftool/dbconfig/20221003-062022-root.json
* 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
* 06:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35251 and previous config saved to /var/cache/conftool/dbconfig/20221003-061216-root.json
* 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35250 and previous config saved to /var/cache/conftool/dbconfig/20221003-060557-root.json
* 06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35249 and previous config saved to /var/cache/conftool/dbconfig/20221003-055711-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P35248 and previous config saved to /var/cache/conftool/dbconfig/20221003-055401-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35247 and previous config saved to /var/cache/conftool/dbconfig/20221003-055052-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P35246 and previous config saved to /var/cache/conftool/dbconfig/20221003-054245-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35245 and previous config saved to /var/cache/conftool/dbconfig/20221003-054206-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P35244 and previous config saved to /var/cache/conftool/dbconfig/20221003-052927-root.json


== 2019-07-29 ==
== 2022-10-02 ==
* 23:37 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in ams
* 08:13 elukey: `apt-get clean` on an-airflow1001 to free some space on the root partition
* 23:36 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: Make welcome and discovery tours fully mutually exclusive ([[phab:T229044|T229044]]) (duration: 00m 48s)
* 23:26 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in ulsfo
* 23:22 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in Dallas
* 22:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/MessageCache.php: [[phab:T208897|T208897]] - {{Gerrit|fa817b088e43975}} (duration: 00m 47s)
* 22:32 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter/: [[phab:T214674|T214674]] - {{Gerrit|bfcaf0c26d6}} (duration: 00m 48s)
* 22:28 XioNoX: roll out anycast DNS and syslog to all network devices - [[phab:T228190|T228190]]
* 22:16 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter/: [[phab:T214674|T214674]] - {{Gerrit|940955ea3844721a0}} (duration: 00m 48s)
* 22:05 XioNoX: replace ulsfo network devices' DNS target with 10.3.0.1
* 22:00 Krinkle: krinkle@deploy1001: Dirty git status on extensions/AbusesFilter and extensions/CheckUser in php-1.34.0-wmf.15
* 21:43 XioNoX: replace ulsfo network devices' syslog target with syslog.anycast.wmnet
* 19:22 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c3ffbee]: Weekly deploy (duration: 11m 42s)
* 19:10 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c3ffbee]: Weekly deploy
* 18:23 Urbanecm: Morning SWAT done
* 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:526196{{!}}Rename Image-reviewer to image-reviewer on fawiki]] ([[phab:T216406|T216406]]) (duration: 00m 47s)
* 18:19 Urbanecm: Run mwscript migrateUserGroup.php --wiki=fawiki Image-reviewer image-reviewer ([[phab:T216406|T216406]])
* 18:18 XioNoX: switch traffic to the GTT link between Ashburn and Amsterdam (set GTT metric to 820 vs. 1820 before) - [[phab:T228827|T228827]]
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:430627{{!}}Add several rights to eliminators in fawiki]] ([[phab:T176553|T176553]], 2/2) (duration: 00m 47s)
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: [[:gerrit:430627{{!}}Add several rights to eliminators in fawiki]] ([[phab:T176553|T176553]], 1/2) (duration: 00m 47s)
* 18:04 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter: SWAT: [[:gerrit:525598{{!}}Initialize user-defined variables during shortcircuit]] ([[phab:T214674|T214674]]) (duration: 00m 49s)
* 17:37 ejegg: updated payments-wiki config to {{Gerrit|a7dacbf8e9}}
* 17:08 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia python3-anycast-healthchecker
* 17:05 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia python3-json-logger
* 17:05 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia anycast-healthchecker
* 16:47 godog: add anycast syslog to wezen/centrallog1001
* 16:19 elukey: manually stopped the sre.kafka.roll-restart-brokers cookbook after 4 brokers restarts since the sleep interval (10mins) is too tight.
* 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.kafka.roll-restart-brokers (exit_code=97)
* 15:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Retry - Produce resource_change stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 46s)
* 15:34 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 15:30 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce resource_change stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 47s)
* 14:35 papaul: shutting down pc2010 for maintenance
* 13:57 cdanis@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8816', previous config saved to /var/cache/conftool/dbconfig/20190729-135730-cdanis.json
* 13:30 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 13:28 marostegui: Stop MySQL on pc2010 - [[phab:T227552|T227552]]
* 13:23 arturo: [[phab:T228870|T228870]] reboot cloudvirt1007.eqiad.wmnet for kernel updates
* 13:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:23 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 13:09 arturo: [[phab:T228870|T228870]] reboot cloudvirt1006.eqiad.wmnet for kernel updates
* 13:09 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:09 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 13:01 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 12:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2128 into s5 api [[phab:T221533|T221533]] (duration: 00m 47s)
* 12:45 marostegui: Provision db2128 into s5 codfw - [[phab:T228969|T228969]]
* 12:44 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2128 into s5 api [[phab:T221533|T221533]] (duration: 00m 47s)
* 12:39 arturo: [[phab:T228870|T228870]] reboot cloudvirt1005.eqiad.wmnet for kernel updates
* 12:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 12:20 arturo: [[phab:T228870|T228870]] reboot cloudvirt1004.eqiad.wmnet for kernel updates
* 12:20 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:20 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 arturo: [[phab:T228870|T228870]] reboot cloudvirt1003.eqiad.wmnet for kernel updates
* 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:57 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:36 arturo: icinga downtime toolschecker for 6h
* 11:31 arturo: [[phab:T228870|T228870]] reboot cloudvirt1002.eqiad.wmnet for kernel updates
* 11:31 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:14 arturo: [[phab:T228870|T228870]] reboot cloudvirt1001.eqiad.wmnet for kernel updates
* 11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:13 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:11 dcausse: EU SWAT done
* 11:10 dcausse@deploy1001: Synchronized wmf-config/SearchSettingsForWikidata.php: [cirrus] Use correct factory declaration for EntityFullTextQueryBuilder (duration: 00m 47s)
* 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:526125{{!}} Bumping portals to master (T128546)]] (duration: 00m 47s)
* 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:526125{{!}} Bumping portals to master (T128546)]] (duration: 00m 47s)
* 09:49 marostegui: Add db2128 to tendril and zarcillo - [[phab:T228969|T228969]]
* 09:24 elukey@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99)
* 09:22 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 09:21 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 08:55 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 08:51 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 08:47 elukey: set mcrouter async behavior for codfw replication to all mw app/api servers (changes will be picked up when puppet runs on the hosts) - [[phab:T225642|T225642]]
* 08:35 godog: temp stop puppet on cp hosts to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/525259
* 08:32 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)
* 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 08:16 marostegui: Drop abuse_filter_log.afl_log_id in s7 eqiad - [[phab:T226851|T226851]]
* 07:49 dcausse: elastic@eqiad force recovery of failed shards (eswiki stuck)
* 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2038 from config [[phab:T221533|T221533]] (duration: 00m 46s)
* 07:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2038 from config [[phab:T221533|T221533]] (duration: 00m 50s)
* 07:18 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 06:45 akosiaris: poweroff orespoolcounter{1,2}00{1,2} for removal [[phab:T227640|T227640]]
* 06:37 _joe_: restarted php7.2 on mwdebug1002, low opcache
* 06:36 _joe_: restarted coherence report on netmon1002, it failed earlier this morning
* 06:31 _joe_: restarting nrpe on restbase-dev1006 [[phab:T224260|T224260]]
* 06:30 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 in preparation for Tuesday 30th failover in s8 (duration: 00m 54s)
* 05:18 marostegui: Drop Drop abuse_filter_log.afl_log_id from s7 codfw with replication (this will cause lag in s7 codfw) - [[phab:T226851|T226851]]
* 05:05 marostegui: Remove db1072 from tendril and zarcillo [[phab:T228956|T228956]]


== 2019-07-28 ==
== 2022-10-01 ==
* 15:13 arturo: disable 1m load average check in icinga for labstore1007 for 24h
* 13:24 fab@deploy1002: Finished deploy [airflow-dags/research@44a1158]: (no justification provided) (duration: 00m 08s)
* 13:24 fab@deploy1002: Started deploy [airflow-dags/research@44a1158]: (no justification provided)
* 13:12 fab@deploy1002: Finished deploy [airflow-dags/research@d6b3e82]: (no justification provided) (duration: 03m 35s)
* 13:08 fab@deploy1002: Started deploy [airflow-dags/research@d6b3e82]: (no justification provided)


== 2019-07-27 ==
== 2022-09-30 ==
* 17:39 bd808: Updated profile & images for @wikimediatech twitter account
* 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:49 godog: bounce rsyslog on wezen / centrallog1001
* 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:43 elukey: powercycle mw1300 - no ssh, serial com2 stuck with no root loging available
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
* 00:35 mutante: restbase-dev1006 - starting nagios-nrpe-server
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
* 00:33 mutante: wikitech-static - fix /etc/letsencrypt/renewal/wikitech-static.wikimedia.org.conf - remove webroot_map and and line for status.wm.org that caused errors when doing a renewal dry-run. now dry run finishes succesfully and we are using "webroot" authenticator and not "apache" anymore. This should have resolved what this ticket was about. No more Apache kills/restarts on renewal. ([[phab:T214640|T214640]])
* 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
* 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
* 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
* 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 13:51 moritzm: installing puppetdb-test2001 [[phab:T318931|T318931]]
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and previous config saved to /var/cache/conftool/dbconfig/20220930-123123-root.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35228 and previous config saved to /var/cache/conftool/dbconfig/20220930-121618-root.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35227 and previous config saved to /var/cache/conftool/dbconfig/20220930-120113-root.json
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetdb-test2001.codfw.wmnet
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35226 and previous config saved to /var/cache/conftool/dbconfig/20220930-114605-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35225 and previous config saved to /var/cache/conftool/dbconfig/20220930-113101-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P35224 and previous config saved to /var/cache/conftool/dbconfig/20220930-112307-root.json
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetdb-test2001.codfw.wmnet on all recursors
* 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache puppetdb-test2001.codfw.wmnet on all recursors
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb-test2001.codfw.wmnet
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35223 and previous config saved to /var/cache/conftool/dbconfig/20220930-104004-ladsgroup.json
* 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35222 and previous config saved to /var/cache/conftool/dbconfig/20220930-103943-ladsgroup.json
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35221 and previous config saved to /var/cache/conftool/dbconfig/20220930-102436-ladsgroup.json
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35220 and previous config saved to /var/cache/conftool/dbconfig/20220930-100930-ladsgroup.json
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35219 and previous config saved to /var/cache/conftool/dbconfig/20220930-095423-ladsgroup.json
* 09:42 moritzm: installing Linux 5.10.140 updates on Bullseye hosts (released via 11.5 point release), just rollout of the package, no reboots involved
* 07:37 XioNoX: add RPKI ROAs for 185.71.138.0/24 and 2001:67c:930::/48
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
* 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52320
* 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 52320
* 07:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
* 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35218 and previous config saved to /var/cache/conftool/dbconfig/20220930-070454-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35217 and previous config saved to /var/cache/conftool/dbconfig/20220930-065844-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35216 and previous config saved to /var/cache/conftool/dbconfig/20220930-064949-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35215 and previous config saved to /var/cache/conftool/dbconfig/20220930-064339-root.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35214 and previous config saved to /var/cache/conftool/dbconfig/20220930-063444-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35213 and previous config saved to /var/cache/conftool/dbconfig/20220930-062834-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35212 and previous config saved to /var/cache/conftool/dbconfig/20220930-061939-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35211 and previous config saved to /var/cache/conftool/dbconfig/20220930-061329-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35210 and previous config saved to /var/cache/conftool/dbconfig/20220930-060434-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35209 and previous config saved to /var/cache/conftool/dbconfig/20220930-055824-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35208 and previous config saved to /var/cache/conftool/dbconfig/20220930-054929-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35207 and previous config saved to /var/cache/conftool/dbconfig/20220930-054319-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35206 and previous config saved to /var/cache/conftool/dbconfig/20220930-053424-root.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35204 and previous config saved to /var/cache/conftool/dbconfig/20220930-052814-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35203 and previous config saved to /var/cache/conftool/dbconfig/20220930-051919-root.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35202 and previous config saved to /var/cache/conftool/dbconfig/20220930-051309-root.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P35201 and previous config saved to /var/cache/conftool/dbconfig/20220930-051206-root.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P35200 and previous config saved to /var/cache/conftool/dbconfig/20220930-050533-root.json
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35199 and previous config saved to /var/cache/conftool/dbconfig/20220930-041937-ladsgroup.json
* 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35198 and previous config saved to /var/cache/conftool/dbconfig/20220930-041916-ladsgroup.json
* 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35197 and previous config saved to /var/cache/conftool/dbconfig/20220930-040409-ladsgroup.json
* 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35196 and previous config saved to /var/cache/conftool/dbconfig/20220930-034903-ladsgroup.json
* 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35195 and previous config saved to /var/cache/conftool/dbconfig/20220930-033356-ladsgroup.json
* 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye


== 2019-07-26 ==
== 2022-09-29 ==
* 23:51 mutante: restbase-dev1006 - manually booting into PXE to debug boot issue / start Debian installer ([[phab:T224260|T224260]])
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
* 23:27 mutante: restbase-dev1006 - does not boot - hangs at "attempting to boot from C:" - entering "Legacy BIOS One Time Boot Menu" ([[phab:T224260|T224260]])
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
* 21:52 mutante: restbase-dev1006 - power reset via mgmt
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
* 20:48 mutante: restbase-dev1006 - rebooting from busybox shell where it was idling since a failed reimage attempt
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
* 20:22 foks: reset password for Sharons36
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
* 18:43 XioNoX: remove lvs100[1-6] switch config from asw2-d-eqiad - [[phab:T224223|T224223]]
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 18:33 mutante: deploy2001 - delgroup gerrit-root (follow-up to https://gerrit.wikimedia.org/r/c/operations/puppet/+/525444)
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 18:32 mutante: deploy1001 - delgroup gerrit-root (follow-up to https://gerrit.wikimedia.org/r/c/operations/puppet/+/525444)
* 21:43 sukhe: alert1001: restart icinga
* 18:20 XioNoX: remove lvs100[1-6] switch config from asw2-c-eqiad - [[phab:T224223|T224223]]
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 XioNoX: remove lvs100[1-6] switch config from asw2-b-eqiad - [[phab:T224223|T224223]]
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:01 XioNoX: remove lvs100[1-6] switch config from asw2-a-eqiad - [[phab:T224223|T224223]]
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:37 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 21:26 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
* 16:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow/includes/Search/Iterators/TopicIterator.php: [[phab:T229114|T229114]] make orderUUID public, as it is needed by other classes for Dumps (duration: 00m 47s)
* 21:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
* 15:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|de0822497919b1b}} (duration: 00m 48s)
* 21:18 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:02 Krinkle: krinkle@deploy1001: php-1.34.0-wmf.15 is still dirty on extensions/CheckUser
* 21:18 ejegg: payments-wiki upgraded from {{Gerrit|839d6dde}} to {{Gerrit|aeee9676}}
* 14:23 ema: re-enable puppet on cache nodes [[phab:T229091|T229091]]
* 21:14 robh@cumin2002: START - Cookbook sre.dns.netbox
* 14:10 ema: disable puppet on cache nodes [[phab:T229091|T229091]]
* 21:14 brennen: end of utc late backport and config window
* 13:41 fsero: sudo -i reprepro --ignore=wrongdistribution include stretch-wikimedia /home/fsero/envoyproxy_1.11.0~wmf1_amd64.changes
* 21:14 brennen@deploy1002: Finished scap: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]] (duration: 08m 22s)
* 13:41 jeh: updated labstore100[67].wikimedia.org performance scaling_governor [[phab:T225713|T225713]]
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:07 jeh: rebooting labstore1006.wikimedia.org for updates [[phab:T224228|T224228]]
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:00 Urbanecm: Change user email assigned to SUL user Stansfield ([[phab:T229004|T229004]])
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:45 jeh: rebooting labsdb1012.eqiad.wmnet for updates [[phab:T224228|T224228]]
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2123 into s5 vslow [[phab:T221533|T221533]] (duration: 00m 50s)
* 21:06 brennen@deploy1002: brennen and ebernhardson: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2123 into s5 [[phab:T228969|T228969]] (duration: 00m 47s)
* 21:05 brennen@deploy1002: Started scap: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]]
* 09:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2123 into s5 [[phab:T228969|T228969]] (duration: 00m 48s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:42 marostegui: Add db2123 to tendril and zarcillo - [[phab:T228969|T228969]]
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1096 (duration: 00m 47s)
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 47s)
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 46s)
* 20:59 ryankemper: [[phab:T313431|T313431]] Repooled `elastic[2073-2074,2080-2081,2083,2086].codfw.wmnet`. Codfw's all on 5 masters now and cluster is back to green.
* 05:40 marostegui: Stop MySQL on db1072 to get it ready for decommission - [[phab:T228956|T228956]]
* 20:58 brennen@deploy1002: Sync cancelled.
* 05:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1096 (duration: 00m 48s)
* 20:58 brennen@deploy1002: brennen and trainbranchbot: Backport for [[gerrit:836928{{!}}Revert "cirrus: Don't configure cloud clusters for private wikis"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 05:05 marostegui: Stop MySQL on db1096 for upgrade
* 20:58 ryankemper: [[phab:T313431|T313431]] Updated cross-cluster seed conf with new masters; should resolve the settings check alerts
* 05:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096 (duration: 00m 49s)
* 20:58 brennen@deploy1002: Started scap: Backport for [[gerrit:836928{{!}}Revert "cirrus: Don't configure cloud clusters for private wikis"]]
* 00:53 ejegg: re-enabled dedupe_civicrm_contacts and major_gifts_addresses fundraising jobs
* 20:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4027.ulsfo.wmnet
* 00:51 ejegg: re-enabled donations queue consumer
* 20:57 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:15 ejegg: disabled donations queue consumer
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:52 brennen@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.3" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.gcoIZ0BTKW"' returned non-zero exit status 255. (duration: 00m 00s)
* 20:52 brennen@deploy1002: Started scap: Backport for [[gerrit:836886{{!}}cirrus: Don't configure cloud clusters for private wikis]]
* 20:49 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 brennen@deploy1002: Sync cancelled.
* 20:45 brennen@deploy1002: brennen and trainbranchbot: Backport for [[gerrit:836922{{!}}Revert "Add Nepalese Wikipedia tagline"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:45 brennen@deploy1002: Started scap: Backport for [[gerrit:836922{{!}}Revert "Add Nepalese Wikipedia tagline"]]
* 20:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
* 20:42 brennen@deploy1002: Sync cancelled.
* 20:41 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:836880{{!}}Add Nepalese Wikipedia tagline (T318737)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:41 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic2080` to pick up new master-eligible status
* 20:41 brennen@deploy1002: Started scap: Backport for [[gerrit:836880{{!}}Add Nepalese Wikipedia tagline (T318737)]]
* 20:38 brennen@deploy1002: Finished scap: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]] (duration: 08m 03s)
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:35 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
* 20:35 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cp4027.ulsfo.wmnet
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
* 20:30 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:30 brennen@deploy1002: Started scap: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]]
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]] (duration: 08m 05s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:19 hoo: Ran foreachwikiindblist wikidataclient-test extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php
* 20:17 ejegg: payments-wiki upgraded from {{Gerrit|0456850e}} to {{Gerrit|839d6dde}} (with cache prefix altered for moved classes)
* 20:17 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic2086` to pick up new master-eligible status
* 20:17 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:17 brennen@deploy1002: Started scap: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]]
* 20:04 ejegg: payments-wiki rolled back from {{Gerrit|839d6dde}} to {{Gerrit|0456850e}}
* 19:56 ejegg: payments-wiki upgraded from {{Gerrit|0456850e}} to {{Gerrit|839d6dde}}
* 19:55 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic208[1,3]` to pick up new master-eligible status
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
* 19:33 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic207[3,4]` to pick up new master-eligible status
* 19:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 19:29 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 19:09 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4021.ulsfo.wmnet
* 19:09 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:04 robh@cumin2002: START - Cookbook sre.dns.netbox
* 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4021.ulsfo.wmnet
* 18:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:06 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35188 and previous config saved to /var/cache/conftool/dbconfig/20220929-162812-ladsgroup.json
* 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35187 and previous config saved to /var/cache/conftool/dbconfig/20220929-162750-ladsgroup.json
* 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35186 and previous config saved to /var/cache/conftool/dbconfig/20220929-161244-ladsgroup.json
* 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35185 and previous config saved to /var/cache/conftool/dbconfig/20220929-155737-ladsgroup.json
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:836858{{!}}Configure `mul` Wikibase language code on Beta wikis]] (beta-only, prod noop) (duration: 03m 41s)
* 15:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35184 and previous config saved to /var/cache/conftool/dbconfig/20220929-154231-ladsgroup.json
* 15:35 dancy@deploy1002: Installation of scap version "4.25.0" completed for 561 hosts
* 15:35 dancy@deploy1002: Installing scap version "4.25.0" for 561 hosts
* 14:30 moritzm: installing glib2.0 security updates
* 14:29 moritzm: uploaded glib2.0 2.50.3-2+deb9u3+wmf1  to apt.wikimedia.org/stretch-wikimedia
* 14:17 moritzm: rolling restart of apache2 in mw/eqiad to pick up Expat security updates
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11164
* 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11164
* 13:54 claime: Enabled puppet for C:memcache hosts following merge [[gerrit:835585{{!}}C:memcached Fix memcached bootstrap]]
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 32934
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35179 and previous config saved to /var/cache/conftool/dbconfig/20220929-134844-root.json
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:46 claime: Disabling puppet for C:memcache hosts to merge [[gerrit:835585{{!}}C:memcached Fix memcached bootstrap]]
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 13:41 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]] (duration: 06m 13s)
* 13:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8966
* 13:39 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8966
* 13:35 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and hoo: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:34 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]]
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35178 and previous config saved to /var/cache/conftool/dbconfig/20220929-133339-root.json
* 13:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]] (duration: 05m 36s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kemayo: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:27 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]]
* 13:26 moritzm: restartting Apache on lists
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]] (duration: 05m 23s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35176 and previous config saved to /var/cache/conftool/dbconfig/20220929-131834-root.json
* 13:15 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:15 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]]
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35175 and previous config saved to /var/cache/conftool/dbconfig/20220929-131507-root.json
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:11 moritzm: rolling restart of apache2 in mw/codfw to pick up Expat security updates
* 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835291{{!}}votewiki: Change wgLanguageCode to zh for Sep 2022 admins election (T318147)]] (duration: 03m 40s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35174 and previous config saved to /var/cache/conftool/dbconfig/20220929-130329-root.json
* 13:01 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 04m 04s)
* 13:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35173 and previous config saved to /var/cache/conftool/dbconfig/20220929-130003-root.json
* 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:57 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35172 and previous config saved to /var/cache/conftool/dbconfig/20220929-124824-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35171 and previous config saved to /var/cache/conftool/dbconfig/20220929-124458-root.json
* 12:44 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]] (duration: 09m 05s)
* 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:35 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 12:34 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]]
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35169 and previous config saved to /var/cache/conftool/dbconfig/20220929-123319-root.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35168 and previous config saved to /var/cache/conftool/dbconfig/20220929-122953-root.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35167 and previous config saved to /var/cache/conftool/dbconfig/20220929-121814-root.json
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35166 and previous config saved to /var/cache/conftool/dbconfig/20220929-121448-root.json
* 12:10 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 12:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3292
* 12:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3292
* 12:04 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35165 and previous config saved to /var/cache/conftool/dbconfig/20220929-120309-root.json
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35164 and previous config saved to /var/cache/conftool/dbconfig/20220929-115943-root.json
* 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
* 11:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P35163 and previous config saved to /var/cache/conftool/dbconfig/20220929-115612-root.json
* 11:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 209453
* 11:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 209453
* 11:51 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15695
* 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15695
* 11:45 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 42
* 11:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
* 11:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3856
* 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35162 and previous config saved to /var/cache/conftool/dbconfig/20220929-114438-root.json
* 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35161 and previous config saved to /var/cache/conftool/dbconfig/20220929-114431-ladsgroup.json
* 11:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3856
* 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42
* 11:41 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 11:40 ladsgroup@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
* 11:39 ladsgroup@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 11:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62955
* 11:38 ladsgroup@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:38 ladsgroup@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 11:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62955
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35160 and previous config saved to /var/cache/conftool/dbconfig/20220929-112933-root.json
* 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35159 and previous config saved to /var/cache/conftool/dbconfig/20220929-112925-ladsgroup.json
* 11:16 XioNoX: re-pool cr2-eqord - [[phab:T295690|T295690]]
* 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35158 and previous config saved to /var/cache/conftool/dbconfig/20220929-111418-ladsgroup.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35157 and previous config saved to /var/cache/conftool/dbconfig/20220929-111217-root.json
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 codfw primary [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35156 and previous config saved to /var/cache/conftool/dbconfig/20220929-111127-root.json
* 11:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - [[phab:T318892|T318892]]
* 11:06 XioNoX: restart cr2-eqord for upgrade - [[phab:T295690|T295690]]
* 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
* 11:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
* 11:01 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35155 and previous config saved to /var/cache/conftool/dbconfig/20220929-105912-ladsgroup.json
* 10:53 XioNoX: drain cr2-eqord - [[phab:T295690|T295690]]
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35154 and previous config saved to /var/cache/conftool/dbconfig/20220929-105206-root.json
* 10:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s8 [[phab:T318892|T318892]]
* 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s8 [[phab:T318892|T318892]]
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318892|T318892]]
* 10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
* 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318892|T318892]]
* 10:50 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
* 10:40 XioNoX: repool cr2-eqiad - [[phab:T295690|T295690]]
* 10:36 moritzm: installing poppler security updates
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35153 and previous config saved to /var/cache/conftool/dbconfig/20220929-100849-ladsgroup.json
* 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35152 and previous config saved to /var/cache/conftool/dbconfig/20220929-100828-ladsgroup.json
* 10:07 XioNoX: second (and longest) cr2-eqiad RE switchover - [[phab:T295690|T295690]]
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35150 and previous config saved to /var/cache/conftool/dbconfig/20220929-095321-ladsgroup.json
* 09:45 moritzm: restarting superset to pick up expat security update
* 09:43 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 09:42 XioNoX: first cr2-eqiad RE switchover - [[phab:T295690|T295690]]
* 09:41 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 09:38 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35149 and previous config saved to /var/cache/conftool/dbconfig/20220929-093815-ladsgroup.json
* 09:36 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 09:34 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 09:33 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:33 XioNoX: drain cr2-eqiad - [[phab:T295690|T295690]]
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
* 09:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
* 09:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:26 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2098.codfw.wmnet with OS bullseye
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35148 and previous config saved to /var/cache/conftool/dbconfig/20220929-092308-ladsgroup.json
* 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:16 XioNoX: repool cr1-eqiad - [[phab:T295690|T295690]]
* 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.3"
* 09:07 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
* 09:04 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
* 08:52 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2098.codfw.wmnet with OS bullseye
* 08:43 XioNoX: second cr1-eqiad RE switchover - [[phab:T295690|T295690]]
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35146 and previous config saved to /var/cache/conftool/dbconfig/20220929-082757-root.json
* 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:15 XioNoX: first cr1-eqiad RE switchover (for NVM firmware) - [[phab:T295690|T295690]]
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35145 and previous config saved to /var/cache/conftool/dbconfig/20220929-081252-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35144 and previous config saved to /var/cache/conftool/dbconfig/20220929-080340-root.json
* 07:57 XioNoX: drain traffic away from cr1-eqiad - [[phab:T295690|T295690]]
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35143 and previous config saved to /var/cache/conftool/dbconfig/20220929-075747-root.json
* 07:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
* 07:49 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35142 and previous config saved to /var/cache/conftool/dbconfig/20220929-074835-root.json
* 07:45 moritzm: installing expat security updates
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35141 and previous config saved to /var/cache/conftool/dbconfig/20220929-074242-root.json
* 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 18106
* 07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 18106
* 07:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38040
* 07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38040
* 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35280
* 07:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35280
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35140 and previous config saved to /var/cache/conftool/dbconfig/20220929-073330-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35139 and previous config saved to /var/cache/conftool/dbconfig/20220929-072745-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35138 and previous config saved to /var/cache/conftool/dbconfig/20220929-072737-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35137 and previous config saved to /var/cache/conftool/dbconfig/20220929-071825-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35136 and previous config saved to /var/cache/conftool/dbconfig/20220929-071240-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35135 and previous config saved to /var/cache/conftool/dbconfig/20220929-071232-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35134 and previous config saved to /var/cache/conftool/dbconfig/20220929-070320-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35133 and previous config saved to /var/cache/conftool/dbconfig/20220929-065736-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35132 and previous config saved to /var/cache/conftool/dbconfig/20220929-065727-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35131 and previous config saved to /var/cache/conftool/dbconfig/20220929-064815-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35130 and previous config saved to /var/cache/conftool/dbconfig/20220929-064231-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35129 and previous config saved to /var/cache/conftool/dbconfig/20220929-064222-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P35128 and previous config saved to /var/cache/conftool/dbconfig/20220929-063508-root.json
* 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35127 and previous config saved to /var/cache/conftool/dbconfig/20220929-063310-root.json
* 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35126 and previous config saved to /var/cache/conftool/dbconfig/20220929-062726-root.json
* 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35125 and previous config saved to /var/cache/conftool/dbconfig/20220929-061805-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35124 and previous config saved to /var/cache/conftool/dbconfig/20220929-061221-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35123 and previous config saved to /var/cache/conftool/dbconfig/20220929-060532-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary and set section read-write [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35122 and previous config saved to /var/cache/conftool/dbconfig/20220929-060425-root.json
* 06:03 marostegui: Starting s7 codfw failover from db2121 to db2118 - [[phab:T318888|T318888]]
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35121 and previous config saved to /var/cache/conftool/dbconfig/20220929-055716-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2118 from API [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35120 and previous config saved to /var/cache/conftool/dbconfig/20220929-054542-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35119 and previous config saved to /var/cache/conftool/dbconfig/20220929-054509-root.json
* 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318888|T318888]]
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318888|T318888]]
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35118 and previous config saved to /var/cache/conftool/dbconfig/20220929-054211-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 from API [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35117 and previous config saved to /var/cache/conftool/dbconfig/20220929-053951-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35116 and previous config saved to /var/cache/conftool/dbconfig/20220929-053407-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary and set section read-write [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35115 and previous config saved to /var/cache/conftool/dbconfig/20220929-053302-root.json
* 05:32 marostegui: Starting s4 codfw failover from db2110 to db2140 - [[phab:T318886|T318886]]
* 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35114 and previous config saved to /var/cache/conftool/dbconfig/20220929-052805-ladsgroup.json
* 05:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 05:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35113 and previous config saved to /var/cache/conftool/dbconfig/20220929-052743-ladsgroup.json
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35112 and previous config saved to /var/cache/conftool/dbconfig/20220929-051237-ladsgroup.json
* 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T318886|T318886]]
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35111 and previous config saved to /var/cache/conftool/dbconfig/20220929-051114-root.json
* 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T318886|T318886]]
* 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35110 and previous config saved to /var/cache/conftool/dbconfig/20220929-045730-ladsgroup.json
* 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35109 and previous config saved to /var/cache/conftool/dbconfig/20220929-044224-ladsgroup.json
* 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35108 and previous config saved to /var/cache/conftool/dbconfig/20220929-035724-ladsgroup.json
* 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35107 and previous config saved to /var/cache/conftool/dbconfig/20220929-035647-ladsgroup.json
* 03:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35106 and previous config saved to /var/cache/conftool/dbconfig/20220929-034140-ladsgroup.json
* 03:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
* 03:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35105 and previous config saved to /var/cache/conftool/dbconfig/20220929-032634-ladsgroup.json
* 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35104 and previous config saved to /var/cache/conftool/dbconfig/20220929-031127-ladsgroup.json
* 02:29 ejegg: updated fundraising CiviCRM from {{Gerrit|f3461a44}} to {{Gerrit|5e1738a1}}
* 02:20 ejegg: updated fundraising python tools from {{Gerrit|dd494413}} to {{Gerrit|14d60435}}
* 01:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2037.codfw.wmnet with OS buster
* 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
* 00:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage


== 2019-07-25 ==
== 2022-09-28 ==
* 23:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/extension.json: Fix over-eager GrowthExperiments popups ([[phab:T229045|T229045]]) (duration: 00m 50s)
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
* 23:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[:gerrit:523214{{!}}Revert "Delete Image-reviewer group from commonswiki for good"]] ([[phab:T228098|T228098]]) (duration: 00m 47s)
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
* 23:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings
* 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
* 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
* 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 22:20 ejegg: updated fundraising CiviCRM from {{Gerrit|d31c19a0}} to {{Gerrit|f3461a44}}
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
* 21:21 ladsgroup@cumin1001


== 2019-07-24 ==
== 2022-09-27 ==
* 23:46 nuria@deploy1001: Finished deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2 (duration: 13m 34s)
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 23:43 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow: Fix JS error when saving Flow board descriptions ([[phab:T228818|T228818]]) (duration: 01m 01s)
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 23:42 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Fix JS error when saving Flow board descriptions ([[phab:T228818|T228818]]) (duration: 01m 03s)
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 23:39 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable homepage for 50% of new users on arwiki ([[phab:T228120|T228120]]) (duration: 00m 58s)
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 23:32 nuria@deploy1001: Started deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 23:30 nuria@deploy1001: Finished deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues) (duration: 18m 10s)
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 23:22 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage on arwiki ([[phab:T228120|T228120]]) (duration: 00m 55s)
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Correct typo in arwiki help panel config ([[phab:T228820|T228820]]) (duration: 00m 57s)
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 23:12 nuria@deploy1001: Started deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues)
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 22:41 thcipriani@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 22:36 thcipriani@: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 22:28 thcipriani@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 21:22 mutante: <+icinga-wm> RECOVERY - Device not healthy -SMART- on restbase-dev1006 is OK: All metrics within thresholds. ([[phab:T224260|T224260]])
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:18 cscott@deploy1001: Finished deploy [parsoid/deploy@abd05ab]: Updating Parsoid to {{Gerrit|df1af404}} ([[phab:T227216|T227216]], [[phab:T226523|T226523]], [[phab:T226451|T226451]]) (duration: 18m 35s)
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 21:16 nuria@deploy1001: Finished deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95 (duration: 03m 54s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:12 nuria@deploy1001: Started deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:03 ppchelko@deploy1001: Finished deploy [restbase/deploy@7911f65]: Store PCS endpoints [[phab:T222384|T222384]] (duration: 18m 18s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:00 cscott@deploy1001: Started deploy [parsoid/deploy@abd05ab]: Updating Parsoid to {{Gerrit|df1af404}} ([[phab:T227216|T227216]], [[phab:T226523|T226523]], [[phab:T226451|T226451]])
* 21:12 TheresNoTime: closing UTC late backport window
* 20:45 ppchelko@deploy1001: Started deploy [restbase/deploy@7911f65]: Store PCS endpoints [[phab:T222384|T222384]]
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 20:39 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to {{Gerrit|1751a2e}} (duration: 04m 20s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints [[phab:T222384|T222384]] (duration: 01m 34s)
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints [[phab:T222384|T222384]]
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:35 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to {{Gerrit|1751a2e}}
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:12 jeh: redirecting dumps.wikimedia.org back to labstore1007.wikimedia.org [[phab:T224228|T224228]]
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 19:43 ejegg: updated fundraising CiviCRM from {{Gerrit|875ab97742}} to {{Gerrit|121feb5d53}}
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 19:08 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer on group0 wikis - [[phab:T200751|T200751]] (duration: 00m 55s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:33 cmjohnson1: moving cloudvirt107 to 10G rack [[phab:T228691|T228691]]
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 18:19 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/localisation/LocalisationCache.php: {{Gerrit|31d99eb381bc}} (duration: 00m 54s)
* 20:59 TheresNoTime: extending UTC late backport window
* 18:15 ejegg: updated payments-wiki from {{Gerrit|a28ad541ed}} to {{Gerrit|70b432d309}}
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:13 urandom: creating new restbase keyspaces -- [[phab:T228804|T228804]]
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:12 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34.0-wmf.15
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:14 XioNoX: rollback failover master VIP of ae2.1202 inet6 away from cr1-eqiad - [[phab:T226782|T226782]]
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 17:10 XioNoX: Add mr1-codfw<->cr1/2-codfw vlan/link config on asw-a-codfw - [[phab:T228112|T228112]]
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 16:44 jijiki: Rolling puppet-enable and apache reload of jobrunners in codfw
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:12 bblack: re-pooling recdns on dns1001 via confctl - [[phab:T226782|T226782]]
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:11 bblack: lvs1014 - restore puppet and resolv.conf contents, restart pybal
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 16:10 bblack: dns1001 - restart recursor and re-enable puppet - [[phab:T226782|T226782]]
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:07 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:59 bblack: dns1001 - puppet disable, stop recursor service to kill anycast advert - [[phab:T226782|T226782]]
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 15:59 bblack: lvs1014 - puppet disable, remove dns1001 from resolv.conf, restart pybal - [[phab:T226782|T226782]]
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 15:58 XioNoX: failover master VIP of ae2.1202 inet6 away from cr1-eqiad - [[phab:T226782|T226782]]
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 15:56 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:56 bblack: depooling recdns on dns1001 via confctl - [[phab:T226782|T226782]]
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:56 bblack: depooling recdns on dns1001 via confctl
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 15:47 jijiki: Rolling puppet-enable and apache reload of jobrunners in eqiad
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:44 jeh: rebooting labstore1007.wikimedia.org for updates [[phab:T224228|T224228]]
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 15:42 jijiki: Disable puppet on jobrunners for 525306
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 15:11 herron: resume ingesting [message] =~ /^SlowTimer/ logs on logstash1007 (as a canary)
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 15:02 XioNoX: re-enable vc link between asw2-a6 and asw2-a7 - [[phab:T228823|T228823]]
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:58 jeh: unmounting dumps NFS clients from labstore1007.wikimedia.org [[phab:T224228|T224228]]
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:54 XioNoX: cleared vc ports stats on asw2-a-eqiad - [[phab:T228823|T228823]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:43 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - [[phab:T226851|T226851]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:40 marostegui: Drop abuse_filter_log.afl_log_id in s5 codfw (lag will appear on codfw) - [[phab:T226851|T226851]]
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:31 tarrow@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:49 robh: rebooting cloudvirt1015 into OS, memory error confirmed.  new memory replacement dispatch entered via [[phab:T220853|T220853]]
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 13:31 marostegui: Drop abuse_filter_log.afl_log_id in s2 eqiad - [[phab:T226851|T226851]]
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:25 robh: rebooting cloudvirt1015 into memtest for dell support repair via [[phab:T220853|T220853]]
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:06 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.15 (duration: 00m 54s)
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.15
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:19 marostegui: Stop haproxy on dbproxy1004 and dbproxy1009 (m4 - eventlogging) - [[phab:T228768|T228768]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:23 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: [[gerrit:525254{{!}}Disable FileImporter source wiki edits (T228851)]] (duration: 00m 54s)
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 11:12 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:514672{{!}}Remove Content Translation event logging config]] (part 2/2) (duration: 00m 54s)
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 11:10 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: [[gerrit:514672{{!}}Remove Content Translation event logging config]] (part 1/2) (duration: 00m 59s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:04 marostegui: Drop abuse_filter_log.afl_log_id from labswiki (wikitech) and labtestwiki - [[phab:T226851|T226851]]
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1082 (duration: 00m 55s)
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 into API after upgrade (duration: 00m 55s)
* 20:15 samtar@deploy1002: Started scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]]
* 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1082 after upgrade (duration: 00m 54s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:40 marostegui: Stop MySQL on db1082 for upgrade
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for upgrade (duration: 00m 57s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:35 marostegui: Drop abuse_filter_log.afl_log_id in s2 codfw (lag will appear on codfw) - [[phab:T226851|T226851]]
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:58 marostegui: Drop abuse_filter_log.afl_log_id  from wikidata in eqiad - [[phab:T226851|T226851]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] (duration: 05m 46s)
* 07:21 marostegui: Stop MySQL on db1117:3322 to check dbproxy1013 notifications - [[phab:T202367|T202367]]
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:10 marostegui: Deploy grants for dbproxy1013 in m2 - [[phab:T202367|T202367]]
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:00 marostegui: Stop puppet on dbprov2001 to generate s5 mysqldump manually
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:52 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:51 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 20:04 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 04:50 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 53s)
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]]
* 04:49 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 04:45 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
* 04:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 04:40 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: (no justification provided) (duration: 00m 56s)
* 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 03:41 tstarling@deploy1001: Synchronized w/fatal-error.php: Adding post-send exception test for [[phab:T228462|T228462]] (duration: 00m 54s)
* 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 03:39 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adding DeferredUpdates log channel ([[phab:T228462|T228462]]) (duration: 00m 56s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:01 eileen: payments-wiki revision changed from {{Gerrit|224c6b2d7b}} to {{Gerrit|a28ad541ed}}, config revision is {{Gerrit|8dcb77cf22}}
* 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 18:02 brennen: 1.40.0-wmf.3 ([[phab:T314192|T314192]]) no current blockers, promoting to group0
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
* 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
* 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
* 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
* 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
* 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
* 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]], 710183 rows done
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
* 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
* 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]]
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 taavi@deploy1002: Finished scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] (duration: 06m 59s)
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 taavi@deploy1002: taavi and migr: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:59 taavi@deploy1002: Started scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]]
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
* 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 jbond: upload new wmf-laptop_0.5.4 package
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
* 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update [[phab:T311686|T311686]]
* 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
* 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
* 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
* 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
* 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
* 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - [[phab:T310745|T310745]]
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
* 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - [[phab:T310745|T310745]]
* 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
* 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
* 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
* 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
* 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
* 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
* 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 [[phab:T318128|T318128]]
* 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 36m 01s)
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2019-07-23 ==
== 2022-09-26 ==
* 23:44 eileen: civicrm revision changed from {{Gerrit|88e9f24893}} to {{Gerrit|875ab97742}}, config revision is {{Gerrit|4006d3bdc5}}
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 23:43 shdubsh: reverting logstash mitigations and re-enable puppet
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 23:42 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/diff/DifferenceEngine.php: [[phab:T228766|T228766]] Don't double wrap rollback links (duration: 00m 56s)
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 23:31 mutante: mw1267 - rm -rf /srv/mediawiki/php-1.33.0-wmf.23 ; rm -rf /srv/mediawiki/php-1.32.0-wmf.3 ; scap pull
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 22:36 mutante: rolling out scap 3.11.1-1 on mw-eqiad servers
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 22:14 mutante: continuing rollout of new scap version 3.11.1-1, starting with kafka-all followed by other cumin-alias groups ([[phab:T228328|T228328]])
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 22:06 herron: puppet temporarily disabled on eqiad/codfw logstash collectors while catching up with backlog. see /etc/logstash/conf.d/01-filter_temp_drops.conf
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 21:52 herron: logstash - temporarily dropping logs matching [message] =~ /^SlowTimer/ due to UTF-8 parsing errors that are stopping the logstash processing pipeline. will re-enable after logstash has caught up with the backlog
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 20:59 shdubsh: temporarily disable input-kafka-rsyslog-shipper and drop memcached logs on logstash nodes
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 20:08 paravoid: asw2-a-eqiad: request virtual-chassis vc-port set interface member 6 vcp-255/1/0 disable
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 19:58 eileen: process-control config revision is {{Gerrit|4006d3bdc5}} - disabled  drush fill donor totals job
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 19:49 mutante: mwdebug1002 - restarting hhvm - mw1312 - restarted apache
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 19:44 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 19:40 mutante: restarting hhvm on mw1312
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:28 cdanis: depool all appservers in eqiad A7 cdanis@cumin1001.eqiad.wmnet ~ 🍵 sudo cumin 'mw12[67-83]*' 'depool'
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:11 bblack: repool lvs1013 - [[phab:T227143|T227143]]
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:10 bblack: repool cp1077 + cp1078 - [[phab:T227143|T227143]]
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 elukey: depool mw1261 for investigation
* 20:31 TheresNoTime: closing UTC late backport window
* 19:06 herron: restarting logstash on logstash100[789]
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 18:53 robh: mw1271 had power loss event due to pdu swap via [[phab:T227143|T227143]]
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:45 mutante: rolling out scap 3.11.1-1 on all mw codfw servers ([[phab:T228328|T228328]])
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:43 mutante: rolling out scap 3.11.1-1 on mw canary servers ([[phab:T228328|T228328]])
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:13 robh: started depooling servers in a7-eqiad for pdu work via [[phab:T227143|T227143]]
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:11 cdanis: depool mw1267
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 18:10 cdanis: cdanis@mw1267.eqiad.wmnet /srv/mediawiki ☕ scap pull
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 18:09 cdanis: cdanis@mw1267.eqiad.wmnet ~ ☕ sudo apt install python-concurrent.futures
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 18:08 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] Make XmlDumpwriter resilient to blob store corruption (duration: 00m 54s)
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 18:07 James_F: Belay that, error on mw1267.
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:06 James_F: Sync error on mw1314.eqiad.wmnet, No module named concurrent.futures
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] Make XmlDumpwriter resilient to blob store corruption (duration: 00m 57s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:05 bblack: lvs1013 - disable puppet and stop pybal - [[phab:T227143|T227143]]
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:04 bblack: depool cp1077 + cp1088 - [[phab:T227143|T227143]]
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 18:03 cdanis@deploy1001: Synchronized docroot/noc/db.php: {{Gerrit|8def4af1d}} noc db.php: include readonly status & group loads (duration: 00m 55s)
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 17:52 moritzm: installing Java security updates on kafka/main and Logstash servers
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 17:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema [[phab:T226522|T226522]], step 2 (duration: 01m 37s)
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 17:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema [[phab:T226522|T226522]], step 2
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 17:00 ppchelko@deploy1001: Finished deploy [changeprop/deploy@894f735]: Switch internal events to the new schema [[phab:T226522|T226522]] (duration: 01m 30s)
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 16:58 ppchelko@deploy1001: Started deploy [changeprop/deploy@894f735]: Switch internal events to the new schema [[phab:T226522|T226522]]
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 16:22 godog: pool prometheus1003 - [[phab:T227139|T227139]]
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 15:46 robh: side b of a5-eqiad swapping pdu via [[phab:T227141|T227141]]
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 15:14 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 15:08 _joe_: uninstalling php-pear, php-mail, php-mail-mime from mw1267 [[phab:T195364|T195364]]
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]], attempt 2 (duration: 13m 08s)
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 14:39 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]], attempt 2
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 14:14 robh: a3-eqiad pdu swap taking place now via [[phab:T227139|T227139]]
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 13:47 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 13:45 godog: depool restbase1016 restbase1019 restbase1011 restbase1010 prometheus1003 ahead of PDU work - [[phab:T227139|T227139]]
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 13:45 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 13:44 moritzm: installing Java security updates on furud/flerovium
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 13:43 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 13:27 jeh: dumps switching active vps to labstore1006 [[phab:T224228|T224228]]
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 13:17 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 13:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 13:06 marostegui: Drop abuse_filter_log.afl_log_id from s8 codfw (lag will happen on codfw s8) - [[phab:T226851|T226851]]
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 12:33 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (duration: 29m 46s)
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 12:04 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 12:02 akosiaris: drain kubernetes1001. [[phab:T227139|T227139]]
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 12:01 akosiaris: empty ganeti1007 from running instances. [[phab:T227139|T227139]]
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 11:59 akosiaris: enable disable poolcounter1003, switchover codfw poolcounters [[phab:T224572|T224572]]
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 11:58 tarrow: EU SWAT finished
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 11:58 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 46s)
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 11:56 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:525065{{!}}T214902 Fix missing /termbox in SSRTermboxServerUrl]] (duration: 00m 44s)
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 11:54 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.10 (duration: 07m 55s)
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 11:43 jijiki: restart php-fpm on mwdebug*
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 11:25 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:525062{{!}}T214902 Enable termbox on testwikidatawiki]] (duration: 01m 37s)
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 11:08 jijiki: enable puppet on jobrunners
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 10:17 marostegui: Drop abuse_filter_log.afl_log_id from db1096:3316, db1139:3316 and dbstore1005:3316 [[phab:T226851|T226851]]
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 10:02 moritzm: installing Java security updates on notebook/stat hosts
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 09:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 09:53 marostegui: Drop abuse_filter_log.afl_log_id from s6 codfw with replication (this will cause lag in s6 codfw) - [[phab:T226851|T226851]]
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 09:51 akosiaris: enable poolcounter1005, disablepoolcounter1001 [[phab:T224572|T224572]]
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 09:51 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool into API db1100 after upgrade (duration: 00m 46s)
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 09:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool into API db1100 after upgrade (duration: 00m 47s)
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 09:09 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 09:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1100 after upgrade (duration: 00m 46s)
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 08:34 marostegui: Upgrade db1100
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 for upgrade (duration: 00m 53s)
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:08 marostegui: Stop MySQL on db2044 to test dbproxy2002 notifications - [[phab:T202367|T202367]]
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 07:31 marostegui: Deploy grants for dbproxy2002 on m2 - [[phab:T202367|T202367]]
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 04:52 eileen: civicrm revision changed from {{Gerrit|d951b07ce3}} to {{Gerrit|88e9f24893}}, config revision is {{Gerrit|f7b7622e27}}
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 04:43 marostegui: Failover m1 from dbproxy1001 to dbproxy1006 [[phab:T227139|T227139]]
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 00:06 Urbanecm: slwiki updateCollection.php completed ([[phab:T208984|T208984]])
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
* 13:56 moritzm: installing mako security updates
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 12:25 moritzm: installing unzip security updates
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and  shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2019-07-22 ==
== 2022-09-25 ==
* 23:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 524952 Increase hewiki rollback limit for patrollers to 50/60 (duration: 00m 48s)
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 23:54 Urbanecm: Run mwscript importImages.php --wiki=commonswiki --user=Meisam /home/urbanecm/T223052
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 23:42 Urbanecm: All updateCollation.php runs completed, except the one for slwiki ([[phab:T208984|T208984]])
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 23:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add flood group to ptwiki ([[phab:T228521|T228521]]) (duration: 00m 47s)
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwiktionary --previous-collation=uppercase ([[phab:T208984|T208984]])
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiversity --previous-collation=uppercase ([[phab:T208984|T208984]])
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 23:37 Urbanecm: Run mwscript updateCollation.php --wiki=slwikisource --previous-collation=uppercase ([[phab:T208984|T208984]])
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 23:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix comment in IS.php (noop, [[phab:T227000|T227000]]) (duration: 00m 46s)
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiquote --previous-collation=uppercase ([[phab:T208984|T208984]])
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikibooks --previous-collation=uppercase ([[phab:T208984|T208984]])
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 23:33 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: [[:gerrit:524704{{!}}Fix "Remove "עמוד" namespace from wgFlaggedRevsNamespaces for hewikisource"]] ([[phab:T227000|T227000]]) (duration: 00m 47s)
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 23:29 Urbanecm: Run mwscript updateCollation.php --wiki=slwiki --previous-collation=uppercase ([[phab:T208984|T208984]])
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgCategoryCollation to uca-sl-u-kn on Slovene projects (sl) ([[phab:T208984|T208984]]) (duration: 00m 47s)
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 22:11 mutante: dropped zero.wikiMedia.org from DNS ([[phab:T187716|T187716]])
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 21:50 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for [[phab:T227416|T227416]] (duration: 00m 46s)
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 21:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate [[phab:T211248|T211248]] (duration: 13m 01s)
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 21:35 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Temporary make account creation limits more restrictive" (duration: 00m 47s)
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 21:27 eileen: civicrm revision is {{Gerrit|d951b07ce3}}, config revision is {{Gerrit|f7b7622e27}}
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 21:25 ppchelko@deploy1001: Started deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate [[phab:T211248|T211248]]
* 21:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]] (duration: 16m 14s)
* 21:21 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:20 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:19 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:17 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:05 eileen: civicrm revision changed from {{Gerrit|f932e56cd2}} to {{Gerrit|d951b07ce3}}, config revision is {{Gerrit|f7b7622e27}}
* 21:04 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]]
* 20:04 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0be6045]: Weekly deploy (duration: 18m 42s)
* 19:46 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0be6045]: Weekly deploy
* 19:09 ppchelko@deploy1001: Finished deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate [[phab:T211248|T211248]] (duration: 01m 31s)
* 19:07 ppchelko@deploy1001: Started deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate [[phab:T211248|T211248]]
* 18:59 elukey: repool scb1001 after pdu maintenance
* 18:59 herron: repooling kafka1001 [[phab:T227140|T227140]]
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable help panel for 50% of new users on arwiki ([[phab:T226729|T226729]]) (duration: 00m 47s)
* 18:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Trying the last sync again, because it's appearing inconsistently (duration: 00m 47s)
* 18:15 thcipriani: restarting gerrit due to [[phab:T224448|T224448]]
* 18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments help panel on arwiki ([[phab:T226729|T226729]]) (duration: 00m 48s)
* 18:00 elukey: arm keyholder on netmon1002 after power loss
* 17:35 elukey: depool scb1001 for PDU work [[phab:T227140|T227140]]
* 17:22 herron: depooling kafka1001 for PDU work [[phab:T227140|T227140]]
* 17:17 nuria@deploy1001: Finished deploy [analytics/refinery@d889893]: deploying refinery jar bump forwebrequest/load jobs (duration: 14m 51s)
* 17:02 nuria@deploy1001: Started deploy [analytics/refinery@d889893]: deploying refinery jar bump forwebrequest/load jobs
* 17:02 jijiki: enable puppet on all jobrunners
* 16:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T87899|T87899]] Use wfLoadExtension for Collection rather than deprecated entry point (duration: 00m 47s)
* 16:48 jforrester@deploy1001: Synchronized wmf-config/extension-list: Load Collection i18n via extension.json directly (duration: 00m 47s)
* 16:36 jeh: redirecting dumps.wikimedia.org  dns to labstore1006 [[phab:T224228|T224228]]
* 15:49 jijiki: Rolling depool and pool of mw1293, mw1294, mw1295, mw1296, mw1299 - [[phab:T219148|T219148]]
* 15:38 marostegui: Stop mysql and power off pc2010 for on-site maintenance - [[phab:T227552|T227552]]
* 15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Wikibase/lib/WikibaseLib.php: [[phab:T227814|T227814]] Wikibase: Define $wgMessagesDirs in WikibaseLib PHP entry point (duration: 00m 48s)
* 15:27 jijiki: Depool mw1300 and pool back
* 15:24 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228614|T228614]] XmlDumpWriter: don't load revision text content unless requested to (duration: 00m 48s)
* 15:17 jijiki: Disable puppet on jobrunners to enable php7_only
* 14:55 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 14:53 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 14:44 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:38 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:30 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:30 ottomata: deploying refactored eventgate chart using eventgate-wikimedia image to  eventgate-* services -  [[phab:T226668|T226668]]
* 14:28 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
* 13:12 kart_: Updated cxserver to 2019-07-17-074415-production ([[phab:T227553|T227553]], [[phab:T216812|T216812]])
* 13:07 kartik@deploy1001: scap-helm cxserver finished
* 13:07 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
* 13:07 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 13:02 kartik@deploy1001: scap-helm cxserver finished
* 13:02 kartik@deploy1001: scap-helm cxserver cluster codfw completed
* 13:02 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 13:00 kartik@deploy1001: scap-helm cxserver finished
* 13:00 kartik@deploy1001: scap-helm cxserver cluster staging completed
* 12:59 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 12:58 marostegui: Stop MySQL on db1117:3321 to test dbproxy1014 (replacement for dbproxy1006) on m1 - [[phab:T202367|T202367]]
* 12:22 moritzm: installing debian-archive-keyring Stretch update (SUA 164)
* 11:20 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:524685{{!}}Enable wgNamespacesWithSubpages on main NS for kowikiversity (T228481)]] (duration: 00m 54s)
* 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: [[gerrit:523661{{!}}Enable FileImporter source wiki edit and delete, (remove labs customizations) (T225617, T226532)]] (duration: 00m 54s)
* 11:13 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:523661{{!}}Enable FileImporter source wiki edit and delete (T225617, T226532)]] (duration: 00m 56s)
* 10:55 jijiki: Enable puppet on jobrunners
* 10:27 jijiki: Depool and pool mw1300
* 10:23 jijiki: Disable puppet on jobrunners for 524336 - [[phab:T219148|T219148]]
* 10:21 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 10:20 fsero: deploy coredns in staging [[phab:T226516|T226516]]
* 09:47 elukey: failover + restart of Hadoop HDFS namenode on an-master1001 to apply GC settings - [[phab:T228620|T228620]]
* 09:40 marostegui: Deploy grants on m1 to allow connections from dbproxy1014 - [[phab:T202367|T202367]]
* 09:32 elukey: restart hadoop hdfs namenode on an-master1002 to apply new GC settings - [[phab:T228620|T228620]]
* 08:33 marostegui: Rename table enwiki.math on db2116  [[phab:T196055|T196055]]
* 07:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1134 after schema change [[phab:T226851|T226851]] (duration: 00m 51s)
* 07:54 elukey: sudo -i depool on elastic1046 - broken disk (srv partition not available) - [[phab:T228606|T228606]]
* 07:40 elukey: systemctl reset-failed restbase on restbase1007->15 (decommed nodes)
* 07:27 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db1134 [[phab:T226851|T226851]]
* 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1134 for schema change [[phab:T226851|T226851]] (duration: 00m 56s)
* 07:17 moritzm: installing openjdk-11 security updates
* 06:47 marostegui: Stop MySQL on db2062 to test dbproxy2001 notification [[phab:T202367|T202367]]
* 06:23 elukey: restart hadoop-hdfs-namenode on an-master1002 to verify if out-of-the-ordinary GC activity
* 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1104 from s8 API (duration: 00m 55s)
* 05:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1109 into API (duration: 00m 58s)
* 05:24 marostegui: Compress more tables on labsdb1009 - [[phab:T222978|T222978]]
* 04:48 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/extension.json: fixing UBN [[phab:T228465|T228465]] (duration: 00m 54s)
* 04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/maintenance/loadExitNodes.php: fixing UBN [[phab:T228465|T228465]] (duration: 00m 54s)
* 04:44 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/includes/TorExitNodes.php: fixing UBN [[phab:T228465|T228465]] (duration: 00m 56s)
* 04:17 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: fix UBN bug [[phab:T227772|T227772]] (duration: 00m 56s)


== 2019-07-21 ==
== 2022-09-23 ==
* 01:06 Urbanecm: Deployed patch for [[phab:T228574|T228574]]
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance


== 2019-07-19 ==
== 2022-09-22 ==
* 22:36 mutante: phab2001 - switching apache to php-fpm and worker instead of mpm-prefork (to match phab1001) ([[phab:T190568|T190568]] [[phab:T137928|T137928]] [[phab:T190572|T190572]])
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 21:57 eileen: update process control process-control config revision is {{Gerrit|c913a5f261}}
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 21:34 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:25 eileen: civicrm revision changed from {{Gerrit|21d3c5a3fc}} to {{Gerrit|f932e56cd2}}, config revision is {{Gerrit|9f7eba2193}}
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:35 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:35 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:34 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
* 19:07 eevans@: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 eevans@: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:53 cdanis@deploy1001: Synchronized docroot/noc/db.php: noc: db.php: support ?dc=codfw, and cleanups (duration: 00m 56s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:44 XioNoX: change netflow target port to 2055 in eqiad
* 20:55 brennen: end of utc late backport & config window
* 16:17 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:55 moritzm: rebooting mw2164 for a test
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 15:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 15:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 15:40 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:27 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 15:26 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 20:36 brennen@deploy1002: backport aborted:  (duration: 02m 16s)
* 15:22 fsero: deploy coredns in staging [[phab:T226516|T226516]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:03 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Collection/Collection.php: {{Gerrit|90eed0fad}} / [[phab:T87899|T87899]] (duration: 00m 54s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:35 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/Collection/Collection.php: {{Gerrit|66ce154d7d734209c76a62cf}} / [[phab:T87899|T87899]] (duration: 00m 56s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:29 ariel@deploy1001: Finished deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps (duration: 00m 03s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:29 ariel@deploy1001: Started deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:28 Krinkle: krinkle@deploy1001: Untracked file found in php-1.34-wmf.13
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:28 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34-wmf.13 and php-1.34-wmf.14
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:30 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 13:04 moritzm: installing bzip2 security updates on jessie
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 12:28 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 10:56 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:55 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:53 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:53 fsero: deploying calico from helmfile in staging [[phab:T227775|T227775]]
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:35 jijiki: enable puppet on jobrunners
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 10:26 jijiki: disable puppet on jobrunners for 523908
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 08:37 ariel@deploy1001: Finished deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default (duration: 00m 04s)
* 18:38 dancy@deploy1002: Started scap: testing
* 08:37 ariel@deploy1001: Started deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 08:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 08:24 gehel: repooling wdqs2004 - [[phab:T228122|T228122]]
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 08:22 gehel: repooling wdqs2003 - [[phab:T228122|T228122]]
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 08:20 vgutierrez: restart pybal on lvs2003
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 08:16 vgutierrez: restart pybal on lvs2006
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1109 into API (duration: 00m 54s)
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 07:57 moritzm: installing idp1001 [[phab:T228403|T228403]]
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 07:38 moritzm: rebooting tungsten for kernel update
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:03 elukey: restart php-fpm on mw1330 - op-cache hit ratio low
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:02 jynus: reloading dbproxy1004/9
* 16:39 dancy@deploy1002: Sync cancelled.
* 07:01 elukey: depool wdqs2004 from all services (waiting for maintenance)
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 06:32 legoktm@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: [[phab:T225199|T225199]] (duration: 00m 55s)
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 06:30 legoktm@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: [[phab:T225199|T225199]] (duration: 00m 55s)
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:15 elukey: clear opcache on mwdebug*
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:26 fsero: repool ms-fe2005 - [[phab:T228196|T228196]]
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2116 (duration: 00m 55s)
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:11 eileen: I think I didn't push the turn it on commit - tried again  process-control config revision is {{Gerrit|9f7eba2193}}
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:03 eileen: process-control config revision is {{Gerrit|7598dc1bf9}} (jobs reenabled)
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:52 XioNoX: enable outbound sampling on eqiad's router
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:52 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Add even more severe rate limits for eswikiquote and some other, smaller wikis ([[phab:T227416|T227416]]) (duration: 00m 58s)
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:38 mutante: mwmaint2001 - puppet fails - not removing a bunch of log dirs for maintenance crons
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 00:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:08 eileen: process-control config revision is {{Gerrit|7598dc1bf9}} - jobs disabled
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:04 mutante: install1002 - exported indices for new scap version - copied back from buster to stretch - upgraded scap version on mw2250 - scap pull now works and starts to rsync ([[phab:T228482|T228482]], [[phab:T228328|T228328]], [[phab:T226948|T226948]])
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2019-07-18 ==
== 2022-09-21 ==
* 23:50 mutante: built new scap version 3.11.1-1 on boron, copied to install1002, imported package with reprepro, copied from stretch to jessie and buster ([[phab:T228482|T228482]])
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:22 Lucas_WMDE: Evening SWAT done
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:17 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:523141{{!}}Configure Citoid+Wikibase integration on Beta (production no-op) (T228411)]] (duration: 00m 54s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:523140{{!}}Set $wgWBRepoSettings[enableRefTabs] in Wikibase.php (T228414)]] (duration: 01m 16s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:523139{{!}}Define settings for Citoid+Wikibase integration (T228414)]] (duration: 00m 55s)
* 20:46 tgr_: UTC late deploys done
* 22:23 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=wdqs1008.eqiad.wmnet
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:16 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 22:00 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:49 bd808: Cleaned up stale striker logs on labweb1001 and labweb1002. Logs go to journald now so log rotate is not triggered to rotate out logs from before that change.
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:42 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:36 bd808@deploy1001: Finished deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models ([[phab:T228222|T228222]], [[phab:T228332|T228332]]) (duration: 01m 13s)
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 21:34 bd808@deploy1001: Started deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models ([[phab:T228222|T228222]], [[phab:T228332|T228332]])
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 mutante: gerrit (cobalt) - scheduled 1h downtime, rebooting for kernel upgrade
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:03 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: [[phab:T228290|T228290]] Fix fatal in ChangesListFormatter::getLogTextLinks() (duration: 01m 02s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:57 mutante: gerrit2001 - icinga downtime for 1h
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:56 mutante: gerrit2001 - reboot for kernel upgrade
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 20:51 mutante: gerrit2001 - apt-get upgrade; apt-get autoremove ; puppet agent -tv
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:55 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T228374|T228374]] Enable SecureLinkFixer in beta cluster (2/2) (duration: 00m 55s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T228374|T228374]] Enable SecureLinkFixer in beta cluster (1/2) (duration: 00m 55s)
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T207750|T207750]] Revoke editmyuserjsredirect from all users (duration: 00m 54s)
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 19:25 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:21 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 19:20 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:45 mutante: contint2001 - had puppet failure in puppet board / dpkg issue due to unfinished zuul install which was done on contint1001 - stopped zuul and zuul-merger, apt-install zuul (was already latest version but needed to finish configure step), apt-get autoremove to remove unused packages, ran puppet. dpkg and puppet happy again
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:45 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/objectcache/RedisBagOStuff.php: {{Gerrit|69cd8b0f49e8caf8c7398ad76a1ce3d2da4f3e6b}} (duration: 00m 55s)
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 17:15 Krinkle: krinkle@depoy1001: Pull down https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/523844/ and  https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/524276/ (no-op, not deploying)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:29 XioNoX: upgrade Routinator to 0.5.0 in eqiad - [[phab:T220669|T220669]]
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:24 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/resources/src/mediawiki.misc-authed-ooui/special.movePage.js: {{Gerrit|e97a284dbe54}} (duration: 00m 58s)
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 16:17 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:06 XioNoX: upgrade Routinator to 0.5.0 in codfw - [[phab:T220669|T220669]]
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 16:05 XioNoX: add routinator 0.5.0 to APT
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 15:54 fsero: depool ms-fe2005 - [[phab:T228196|T228196]]
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 15:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.34.0-wmf.13 # [[phab:T228436|T228436]] [[phab:T220739|T220739]]
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:19 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:46 godog: roll-restart thumbor in codfw - [[phab:T228086|T228086]]
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:45 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:37 liw: all wikis at 1.34.0-wmf.14
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 14:36 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:28 bblack: cp hosts: apt autoremove to clean up pkgs on the fleet
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:27 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery (duration: 00m 20s)
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:26 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:24 godog: repool thumbor2003
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 14:20 godog: reboot thumbor2003
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 14:17 jijiki: Depool thumbor2003 for reboot
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 14:12 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 13:53 moritzm: installing php5 security updates
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 13:50 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 13:36 jeh: rebooting labstore1005.eqiad.wmnet - [[phab:T224228|T224228]]
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:34 jbond42: remove mtail 3.0.0~rc24.1-1+wmf1 from stretch-wikimedia
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 53s)
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:24 jbond42: downgrade cp servers backl to 3.0.0~rc5-1~bpo9+1
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 13:23 liw: promoting 1.34.0-wmf.14 to group1
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:22 godog: temporarily stop ircecho on icinga1001 to avoid spam
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 13:00 jbond42: rolling upgrade of mtail
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:53 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 12:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:34 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:26 jbond42: add mtail 3.0.0~rc24.1-1+wmf1 to stretch-wikimedia
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:13 dcausse: EU Swat done
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert [cirrus] switch search traffic (except completion) to codfw (duration: 00m 56s)
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:02 godog: swift eqiad-prod: put back ms-be1043 sdk1 - [[phab:T218544|T218544]]
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 10:51 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:43 ema: cp-eqiad: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:37 jijiki: enable puppet on services_proxy hosts - [[phab:T228063|T228063]]
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:29 godog: reboot wezen.codfw.wmnet - [[phab:T225713|T225713]]
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:27 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 10:15 jijiki: Disable puppet on services_proxy hosts - [[phab:T228063|T228063]]
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:09 godog: resume swift ms-be rolling restarts - [[phab:T225713|T225713]]
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:03 fsero: reuplodi