You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0f08e8bbe1f74220a7a01a7606b67e0f75734a53: Update the Persian Wikipedia logos (T261033; 2/2) (duration: 00m 56s))
imported>Stashbot
(catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Create alias for Appendix and Appendix_talk namespaces on mywiktionary (T291146) (duration: 00m 55s))
 
(213 intermediate revisions by 3 users not shown)
== 2021-10-25 ==
* 23:12 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Create alias for Appendix and Appendix_talk namespaces on mywiktionary ([[phab:T291146|T291146]]) (duration: 00m 55s)
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:57 ryankemper: [wcqs] Downtimed `wcqs*` until roughly a week from now (while we setup oauth)
* 22:53 legoktm: uploaded PHP 7.4.25 to apt.wm.o (DSA-4992-1)
* 22:44 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
* 22:30 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 03m 04s)
* 22:27 ryankemper@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
* 21:53 mutante: new project language "pwn" added - Paiwan is a native language of Taiwan, spoken by the Paiwan, a Taiwanese indigenous people. [[phab:T292415|T292415]]
* 21:52 mutante: new project language "ami" added - Sowal no 'Amis is the Formosan language of the 'Amis (or Ami), an indigenous people living along the east coast of Taiwan. - [[phab:T292414|T292414]]
* 21:50 mutante: log authdns1001 (DNS) - sudo authdns-update, add new project language "ami" (Amis) for [[phab:T292414|T292414]] - edited langlist.tmpl which regenerates all project zones
* 21:40 mutante: authdns1001 (DNS) - sudo authdns-update, add new project language "pwn" (Paiwan) for [[phab:T292415|T292415]]
* 19:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2255.codfw.wmnet with reason: DRAC upgrade
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2255.codfw.wmnet with reason: DRAC upgrade
* 19:47 mutante: mw2255 - depooled=inactive (incl "dsh groups"), shut down physically for [[phab:T283582|T283582]] - can be worked on anytime
* 19:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2255.codfw.wmnet
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
* 19:42 mutante: icinga - ACKing all unhandled CRIT alerts on hosts with "dev" or "test" in their name, regardless of notifications being disabled or not. just so that we get more signal than noise in actual unhandled CRITs in web UI
* 19:40 mutante: cumin2002 - sudo systemctl reset-failed to clear Icinga alert about failed but (now) non-existing service database-backups-snapshots.service, assuming it's a case of "only in active DC"
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail
* 19:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail
* 19:07 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily move mw groups to db1123 [[phab:T294295|T294295]]', diff saved to https://phabricator.wikimedia.org/P17597 and previous config saved to /var/cache/conftool/dbconfig/20211025-190717-kormat.json
* 19:06 mutante: db1112 - powercycling
* 19:04 legoktm@cumin1001: dbctl commit (dc=all): 'Depool db1112 ([[phab:T294295|T294295]])', diff saved to https://phabricator.wikimedia.org/P17596 and previous config saved to /var/cache/conftool/dbconfig/20211025-190436-legoktm.json
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:40 jforrester@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/timeline/includes/Timeline.php: Backport: [[gerrit:734312{{!}}Input may be null when rendering a self-closing tag `<timeline />` (T294020)]] (duration: 00m 55s)
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732971{{!}}Fix some easy codestyle issues]] (duration: 00m 55s)
* 18:22 jforrester@deploy1002: Synchronized w/static.php: Config: [[gerrit:732971{{!}}Fix some easy codestyle issues]] (duration: 00m 54s)
* 18:19 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732840{{!}}Fix array declaration of NS_USER_TALK abbreviation on ruwikiquote (T197058)]] (duration: 00m 55s)
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:15 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:732836{{!}}flaggedrevs: Drop legacy wgFlaggedRevsStatsAge config, no longer read]] (duration: 00m 55s)
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732254{{!}}Make reply tool available as opt-out on frwiki (T293687)]] (duration: 00m 56s)
* 17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
* 17:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 17:39 mutante: mw2253 - scap pull after hw maintenance is over
* 17:32 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:24 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:23 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:22 XioNoX: update core routers ACLs
* 17:20 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 16:49 XioNoX: update management routers ACLs
* 16:36 XioNoX: DNS - Add eqsin-ulsfo transport v6 prefix - [[phab:T273308|T273308]]
* 16:31 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:28 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 16:25 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 16:25 mmandere@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:21 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2253.codfw.wmnet
* 16:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:734298{{!}}Empty wikibase disabled access entity types on Beta (T294159)]] (beta-only) (duration: 01m 47s)
* 16:04 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:01 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 15:57 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:734328{{!}} Bumping portals to master (T128546)]] (duration: 01m 52s)
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:734328{{!}} Bumping portals to master (T128546)]] (duration: 01m 54s)
* 15:46 jbond: upgrade cas/idp to 6.4.2
* 14:56 mutante: mw2253 - shut down and downtimed for 2 days
* 14:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade
* 14:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade
* 14:49 mutante: depooling mw2253 for DRAC upgrade ([[phab:T283582|T283582]])
* 14:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 14:45 jbond: update cas package
* 14:31 marostegui: Deploy schema change on s3 codfw - [[phab:T291719|T291719]]
* 12:04 ema: cp3062: upgrade varnish to 6.0.8-1wm2 [[phab:T293879|T293879]]
* 11:57 ema: deployment-cache-text06: upgrade varnish to 6.0.8-1wm2 [[phab:T293879|T293879]]
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 Lucas_WMDE: UTC morning backport+config window done
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732969{{!}}Remove dispatchLagToMaxLagFactor Wikibase setting (T292604)]] (duration: 00m 54s)
* 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732951{{!}}Remove wikibaseDispatchRedisLockManager config (T292604)]] (duration: 00m 54s)
* 11:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732950{{!}}Remove wmg variables for dispatchChanges.php Wikibase settings (T292604)]] (duration: 00m 55s)
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732949{{!}}Remove dispatchChanges.php-related Wikibase settings (T292604)]] (duration: 00m 55s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732372{{!}}Remove dispatchViaJobs-related Wikibase settings (T291828)]] (duration: 00m 56s)
* 09:52 godog: bounce uwsgi graphite web on graphite2003 - [[phab:T294220|T294220]]
* 09:52 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:48 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:733089{{!}}[BETA CLUSTER] Enable WikibaseLexeme Scribunto access (T294159)]] (merged on Friday, syncing now to avoid outdated files even if it’s just -labs.php) (duration: 00m 55s)
* 09:18 godog: bounce graphite-web on graphite2003 to test timeout bump - [[phab:T294220|T294220]]
* 08:08 XioNoX: merge DNS changes to add drmrs
* 07:50 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:50 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 05:47 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=wtp1026.*
* 05:43 _joe_: pooling wtp1042 [[phab:T294212|T294212]]
* 05:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1109.eqiad.wmnet with OS buster
* 05:01 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1109.eqiad.wmnet with OS buster
* 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 (s8) for reimage [[phab:T290868|T290868]]', diff saved to https://phabricator.wikimedia.org/P17590 and previous config saved to /var/cache/conftool/dbconfig/20211025-043028-marostegui.json


== 2021-10-23 ==
* 16:40 dcausse: restarting blazegraph on wdqs1004 and wdqs1006 (free allocators alert)
* 15:45 urbanecm: Start server-side upload for 1 video file ([[phab:T289781|T289781]]), testing whether [[phab:T291137|T291137]] is still an issue


== 2021-03-02 ==
* 00:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f08e8bbe1f74220a7a01a7606b67e0f75734a53}}: Update the Persian Wikipedia logos ([[phab:T261033|T261033]]; 2/2) (duration: 00m 56s)
* 00:58 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|0f08e8bbe1f74220a7a01a7606b67e0f75734a53}}: Update the Persian Wikipedia logos ([[phab:T261033|T261033]]; 1/2) (duration: 00m 56s)
* 00:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|97ebf7539f7f16d4908f80ea4b8eea5c4b997ecb}}: Separate Wikivoyage wordmark and icon ([[phab:T261033|T261033]]; [[phab:T273477|T273477]]) (duration: 00m 56s)
* 00:53 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|97ebf7539f7f16d4908f80ea4b8eea5c4b997ecb}}: Separate Wikivoyage wordmark and icon ([[phab:T261033|T261033]]; [[phab:T273477|T273477]]) (duration: 00m 56s)
* 00:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|61647cd191b9bc5e2d8672fa4813b57d958f1a68}}: Fixes max-width configuration for new Vector ([[phab:T260091|T260091]]) (duration: 00m 56s)
* 00:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6cc8521310d6e952fc7d0b23579021b650828764}}: Enable og tags on non-wikidata wikis ([[phab:T157145|T157145]]) (duration: 00m 56s)
* 00:37 urbanecm@deploy1002: Synchronized wmf-config/config/hrwiki.yaml: REDEPLOY: {{Gerrit|d53834e9460ea6321e50401cda9e53d9f74c545e}}: Enable Growth features on hrwiki in stealth mode (3/3; [[phab:T275684|T275684]]) (duration: 00m 56s)
* 00:36 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: REDEPLOY: {{Gerrit|d53834e9460ea6321e50401cda9e53d9f74c545e}}: Enable Growth features on hrwiki in stealth mode (2/3; [[phab:T275684|T275684]]) (duration: 00m 56s)
* 00:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: {{Gerrit|d53834e9460ea6321e50401cda9e53d9f74c545e}}: Enable Growth features on hrwiki in stealth mode (1/3; [[phab:T275684|T275684]]) (duration: 00m 55s)
* 00:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: Config: [[gerrit:666842{{!}}EventLoggingSchemas: Bump HomepageVisit version (T275615)]] (duration: 00m 56s)
* 00:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: {{Gerrit|21cb6f5b32920c33611f26a0f3c97247f6f496f8}}: Revert "Revert "vector: Stage 2 of WVUI search treatment A/B test"" ([[phab:T249297|T249297]]) (duration: 00m 56s)
* 00:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: {{Gerrit|599b7390c840388d97dc4cdbf1796451d4024c22}}: Simplify deployment of Growth team features (3/3; [[phab:T276091|T276091]]) (duration: 00m 56s)
* 00:27 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: REDEPLOY: {{Gerrit|de0f74126eddafb5375b853d543b377e78544caa}}: Simplify deployment of Growth team features (2/3; [[phab:T276091|T276091]]) (duration: 00m 57s)
* 00:26 urbanecm@deploy1002: sync-file aborted: REDEPLOY: {{Gerrit|de0f74126eddafb5375b853d543b377e78544caa}}: Simplify deployment of Growth team features (2/3; [[phab:T276091|T276091]]) (duration: 00m 25s)
* 00:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: {{Gerrit|e991806eb9dc5ec018ebc59832d02e8a6563ba0a}}: Simplify deployment of Growth team features (1/3; [[phab:T276091|T276091]]) (duration: 00m 56s)
* 00:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: Revert: vector: Stage 2 of WVUI search treatment A/B test ([[phab:T249297|T249297]]) (duration: 00m 56s)
* 00:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: {{Gerrit|1edcbb53b2f18105d132c839cfe12cccb97031b3}}: vector: Stage 2 of WVUI search treatment A/B test ([[phab:T249297|T249297]]) (duration: 00m 56s)
* 00:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOYING: {{Gerrit|2a8ece1c92d9d1434b2b5433f3a042a279d9756e}}: GrowthExperiments: set GELinkRecommendationsUseEventGate (duration: 00m 57s)
* 00:18 urbanecm@deploy1002: sync-file aborted: {{Gerrit|2a8ece1c92d9d1434b2b5433f3a042a279d9756e}}: GrowthExperiments: set GELinkRecommendationsUseEventGate (duration: 00m 05s)
* 00:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOYING: {{Gerrit|92f65972f4277624f74369af08563a8ca6254bda}}: rowiki: Update help panel links ([[phab:T275130|T275130]]) (duration: 00m 59s)
* 00:16 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 00:11 mutante: deploy2002 - ran 'git fetch' in /srv/mediawiki-staging


== 2021-03-01 ==
* 23:05 eileen: civicrm revision changed from {{Gerrit|04a029958c}} to {{Gerrit|e1dacbe348}}, config revision is {{Gerrit|643477b35d}}
* 23:01 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@61e7533]: ores_bulk_ingest: Handle unexpected api response (duration: 01m 33s)
* 23:00 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@61e7533]: ores_bulk_ingest: Handle unexpected api response
* 22:57 mholloway-shell@deploy1002: Synchronized php-1.36.0-wmf.32/extensions/WikimediaEvents: Fix: Restore exporting wgWMESchemaEditAttemptStepSamplingRate to JS (duration: 00m 57s)
* 22:41 mstyles@deploy1002: Finished deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix ([[phab:T270103|T270103]]) (duration: 02m 04s)
* 22:39 mstyles@deploy1002: Started deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix ([[phab:T270103|T270103]])
* 22:22 dwisehaupt: ran the following on frdb2001 to allow replication to continue after conversion to utf8mb4 charset: set global slave_type_conversions = ALL_NON_LOSSY;
* 22:21 dwisehaupt: stopping mysql replication on frdb2001 and starting utf8mb4 table alters under a root screen session
* 22:16 eileen: civicrm revision changed from {{Gerrit|f07390ff87}} to {{Gerrit|04a029958c}}, config revision is {{Gerrit|643477b35d}}
* 22:12 twentyafterfour@deploy1002: Finished scap: (no justification provided) (duration: 16m 24s)
* 21:57 twentyafterfour: running scap sync from the new server deploy1002
* 21:56 twentyafterfour@deploy1002: Started scap: (no justification provided)
* 21:54 mstyles@deploy1002: Finished deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix ([[phab:T270103|T270103]]) (duration: 02m 34s)
* 21:52 mstyles@deploy1002: Started deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix ([[phab:T270103|T270103]])
* 21:49 mutante: deploy1002 - removed scap-global-lock, unlocked scap
* 21:43 phamhi: rebooted clouddb1013 for maintenance
* 21:38 mutante: cumin 'mw*' 'grep master_rsync /etc/scap.cfg' showed all mw servers are now using deploy1002 ([[phab:T265963|T265963]])
* 21:30 shdubsh: completed removal of kafka logging inputs to legacy logstash cluster - [[phab:T234854|T234854]]
* 21:18 mutante: mw1262 - running puppet to switch to new deployment server, scap pull
* 21:16 effie: pooling mw1262 back
* 21:08 mutante: [mwdebug1001:~] $ /usr/local/lib/nagios/plugins/check_mw_versions --deployhost deploy1002.eqiad.wmnet - OKAY: wikiversions in sync ([[phab:T265963|T265963]])
* 21:05 mutante: re-enabling puppet on deploy1001 - running puppet on deploy*, switching eqiad scap master and deployment_server globally ([[phab:T265963|T265963]])
* 20:37 mutante: deploy1001 - disable puppet and manually create scap-global-lock - NO DEPLOYMENTS
* 20:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1029.eqiad.wmnet
* 20:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1029.eqiad.wmnet
* 20:28 effie: upgrade mc1029, mc2029 to memcached 1.6
* 19:55 urbanecm@deploy1001: Synchronized wmf-config/config/hrwiki.yaml: {{Gerrit|d53834e}}: Enable Growth features on hrwiki in stealth mode (3/3; [[phab:T275684|T275684]]) (duration: 00m 54s)
* 19:54 urbanecm@deploy1001: sync-file aborted: {{Gerrit|d53834e}}: Enable Growth features on hrwiki in stealth mode (3/3; [[phab:T275684|T275684]]) (duration: 00m 03s)
* 19:53 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: {{Gerrit|d53834e}}: Enable Growth features on hrwiki in stealth mode (2/3; [[phab:T275684|T275684]]) (duration: 00m 56s)
* 19:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d53834e}}: Enable Growth features on hrwiki in stealth mode (1/3; [[phab:T275684|T275684]]) (duration: 00m 55s)
* 19:41 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:666842{{!}}EventLoggingSchemas: Bump HomepageVisit version (T275615)]] (duration: 00m 56s)
* 19:34 phuedx@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:667680{{!}}Revert "Revert "vector: Stage 2 of WVUI search treatment A/B test"" (T249297)]] (duration: 00m 54s)
* 19:20 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599b7390c840388d97dc4cdbf1796451d4024c22}}: Simplify deployment of Growth team features (3/3; [[phab:T276091|T276091]]) (duration: 01m 00s)
* 19:01 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|de0f74126eddafb5375b853d543b377e78544caa}}: Simplify deployment of Growth team features (2/3; [[phab:T276091|T276091]]) (duration: 00m 57s)
* 18:56 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e991806eb9dc5ec018ebc59832d02e8a6563ba0a}}: Simplify deployment of Growth team features (1/3; [[phab:T276091|T276091]]) (duration: 00m 57s)
* 18:42 mutante: mwmaint2002.mgmt - racadm serveraction powerup
* 18:26 ryankemper: [Relforge] Lifting downtime on `relforge1004` now that [[phab:T275658|T275658]] is done
* 18:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
* 18:24 mutante: mw1307 - back to stretch now
* 18:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
* 18:20 mutante: mwmaint2002 - shutting down for maintenance
* 18:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1098.eqiad.wmnet with reason: REIMAGE
* 18:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1098.eqiad.wmnet with reason: REIMAGE
* 18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mwmaint2002.codfw.wmnet with reason: new install
* 18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mwmaint2002.codfw.wmnet with reason: new install
* 18:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 18:00 mutante: puppetmaster1001 - generating mcrouter cert for mwmaint2002 [[phab:T275905|T275905]]
* 17:58 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 17:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 17:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 17:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
* 17:07 mutante: our latest Wikipedia language edition ready to move on from the incubator https://tay.wikipedia.org
* 17:05 mutante: new Wikimedia project language - tay - Atayal is spoken by the Atayal people of Taiwan
* 17:03 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 16:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1097.eqiad.wmnet with reason: REIMAGE
* 16:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1097.eqiad.wmnet with reason: REIMAGE
* 16:20 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 15:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 15:11 vgutierrez: rolling restart of ats-tls on cp[5007-5011]
* 14:49 marostegui: Failover m3 proxy back to dbproxy1020
* 14:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1030.eqiad.wmnet
* 14:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1030.eqiad.wmnet
* 14:18 effie: upgrade mc1030 mc2030 to memcached 1.6
* 14:07 marostegui: Upgrade dbproxy1020 kernel
* 14:05 moritzm: installing openldap security updates on stretch (client-side tools/libs only, slapd instances all on Buster and fixed)
* 13:22 moritzm: installing docker.io security updates for Buster
* 12:26 awight: EU config deployments complete
* 12:10 awight@deploy1001: Synchronized wmf-config: Config: [[gerrit:666441{{!}}GrowthExperiments: set GELinkRecommendationsUseEventGate (T274198)]] (duration: 01m 05s)
* 11:49 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:667574{{!}} Bumping portals to master (T128546)]] (duration: 00m 55s)
* 11:48 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:667574{{!}} Bumping portals to master (T128546)]] (duration: 00m 55s)
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14547 and previous config saved to /var/cache/conftool/dbconfig/20210301-104842-root.json
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 85%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14546 and previous config saved to /var/cache/conftool/dbconfig/20210301-103338-root.json
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14545 and previous config saved to /var/cache/conftool/dbconfig/20210301-101835-root.json
* 10:15 vgutierrez: restart ats-tls on cp5012
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 65%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14544 and previous config saved to /var/cache/conftool/dbconfig/20210301-100331-root.json
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14543 and previous config saved to /var/cache/conftool/dbconfig/20210301-094828-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 40%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14542 and previous config saved to /var/cache/conftool/dbconfig/20210301-093324-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14541 and previous config saved to /var/cache/conftool/dbconfig/20210301-092536-root.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 30%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14540 and previous config saved to /var/cache/conftool/dbconfig/20210301-091820-root.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 85%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14539 and previous config saved to /var/cache/conftool/dbconfig/20210301-091032-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14538 and previous config saved to /var/cache/conftool/dbconfig/20210301-090317-root.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14537 and previous config saved to /var/cache/conftool/dbconfig/20210301-085529-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 20%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14536 and previous config saved to /var/cache/conftool/dbconfig/20210301-084813-root.json
* 08:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|92f65972f4277624f74369af08563a8ca6254bda}}: rowiki: Update help panel links ([[phab:T275130|T275130]]) (duration: 01m 08s)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 65%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14535 and previous config saved to /var/cache/conftool/dbconfig/20210301-084025-root.json
* 08:38 elukey: reboot an-worker1112
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 15%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14534 and previous config saved to /var/cache/conftool/dbconfig/20210301-083310-root.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14533 and previous config saved to /var/cache/conftool/dbconfig/20210301-082521-root.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14532 and previous config saved to /var/cache/conftool/dbconfig/20210301-081806-root.json
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 40%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14531 and previous config saved to /var/cache/conftool/dbconfig/20210301-081018-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14530 and previous config saved to /var/cache/conftool/dbconfig/20210301-080303-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14529 and previous config saved to /var/cache/conftool/dbconfig/20210301-075514-root.json
* 07:53 marostegui: Upgrade pc1010 pc2008 pc200 to 10.4.18
* 07:53 elukey: clean up old logs + apt-get clean + puppet clientbucket on an-coord1001 to free space
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 4%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14528 and previous config saved to /var/cache/conftool/dbconfig/20210301-074759-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 15%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14527 and previous config saved to /var/cache/conftool/dbconfig/20210301-074011-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Give some more weight to db1168', diff saved to https://phabricator.wikimedia.org/P14526 and previous config saved to /var/cache/conftool/dbconfig/20210301-072957-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14525 and previous config saved to /var/cache/conftool/dbconfig/20210301-072507-root.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Give some more weight to db1168', diff saved to https://phabricator.wikimedia.org/P14524 and previous config saved to /var/cache/conftool/dbconfig/20210301-071047-marostegui.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14523 and previous config saved to /var/cache/conftool/dbconfig/20210301-071004-root.json
* 07:05 marostegui: Stop MySQL on db2082 to clone db2152 - [[phab:T275633|T275633]]
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14521 and previous config saved to /var/cache/conftool/dbconfig/20210301-065500-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1168 with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14520 and previous config saved to /var/cache/conftool/dbconfig/20210301-064704-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1168 to dbctl [[phab:T258361|T258361]]!', diff saved to https://phabricator.wikimedia.org/P14519 and previous config saved to /var/cache/conftool/dbconfig/20210301-064603-marostegui.json
* 06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1092.eqiad.wmnet
* 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1092.eqiad.wmnet


== 2021-02-28 ==
== 2021-10-22 ==
* 14:17 gehel: repooled wdqs1011 - caught up on lag
* 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:57 bblack: re-pooling eqiad in DNS
* 20:54 legoktm: <XioNoX> I disabled the interface on cr1, going to re-enable the active one on cr2
* 20:48 legoktm: bblack has temporarily depooled eqiad https://gerrit.wikimedia.org/r/733043
* 20:41 XioNoX: disable sessions to equinix eqiad IXP
* 19:17 urbanecm: Start server-side upload of 1 video file ([[phab:T294134|T294134]])
* 15:06 jbond: upload puppetboard_3.1.0-1_all.deb to bullseye-wikimedia
* 13:42 ema: deployment-cache-upload06: restart varnish-frontend, package got upgraded to 6.0.8 [[phab:T294116|T294116]]
* 13:30 jbond: upload python3-pypuppetdb_2.4.0-1_all.deb to bullseye
* 10:46 jbond: upload cas_6.4.2-1+wmf10u1
* 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 09:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # [[phab:T294029|T294029]]
* 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2025.codfw.wmnet with OS buster
* 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 08:27 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:24 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:23 ema: cp3062: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ [[phab:T293879|T293879]]
* 08:00 ema: deployment-cache-text06: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ [[phab:T293879|T293879]]
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17580 and previous config saved to /var/cache/conftool/dbconfig/20211022-055403-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17579 and previous config saved to /var/cache/conftool/dbconfig/20211022-053900-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17578 and previous config saved to /var/cache/conftool/dbconfig/20211022-052356-root.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17577 and previous config saved to /var/cache/conftool/dbconfig/20211022-050852-root.json
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17576 and previous config saved to /var/cache/conftool/dbconfig/20211022-045349-root.json
* 04:46 marostegui_: Deploy schema change on s8 codfw - [[phab:T291719|T291719]]
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17575 and previous config saved to /var/cache/conftool/dbconfig/20211022-043845-root.json
* 02:59 ejegg: updated payments-wiki from {{Gerrit|088a8cda1e}} to {{Gerrit|6e810fb401}}


== 2021-02-27 ==
== 2021-10-21 ==
* 21:19 dwisehaupt: ran the following on frdb2002 to allow replication to continue after conversion to utf8mb4 charset: set global slave_type_conversions = ALL_NON_LOSSY;
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:44 gehel: depooled wdqs1011 to catch up on lag
* 23:38 jforrester@deploy1002: Synchronized w/fatal-error.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 18:37 gehel: powercycling wdqs1011
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:08 mutante: deploy1002 - rsyncing home dirs from deploy1001
* 23:37 jforrester@deploy1002: Synchronized w/static.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 23:36 jforrester@deploy1002: Synchronized multiversion/: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 23:34 jforrester@deploy1002: Synchronized docroot/noc/conf/index.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 23:33 jforrester@deploy1002: Synchronized wmf-config: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 23:32 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:730946{{!}}CommonSettings: Drop legacy CentralAuth config flag, never read (T277932)]] (duration: 00m 55s)
* 23:18 thcipriani@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 23:15 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:42 mutante: [[phab:T294038|T294038]] [krb1001:~] $ sudo manage_principals.py create effeietsanders ... Principal successfully created. Successfully sent email
* 21:44 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 47s)
* 21:41 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
* 20:54 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 13s)
* 20:53 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 20:53 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 35s)
* 20:52 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 19:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:42 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Update $wgTimelineFonts for new path to unifont in Shellbox container ([[phab:T293050|T293050]]) (duration: 00m 55s)
* 19:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:31 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 19:23 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:10 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs (duration: 00m 23s)
* 19:09 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs
* 19:07 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: (no justification provided) (duration: 00m 08s)
* 19:07 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: (no justification provided)
* 18:53 urbanecm: Deploy security patch for [[phab:T285116|T285116]] (wmf.4, wmf.5)
* 18:53 mutante: dumpsdata1003 - sudo systemctl reset-failed to clear Icinga alert about failed cleanup_tmpdumps.service
* 17:55 mutante: that's a key for https://www.worldcat.org/whatis/default.jsp btw for those wondering
* 17:53 mutante: citoid - replaced "wskey" for worldcat in private repo as requested on [[phab:T294010|T294010]] (is in 4 places, 3 for deployment_server/k8s and one remnant for scb)
* 17:53 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:50 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 16:12 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 16:07 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 16:06 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (2/3) (duration: 00m 54s)
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:04 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 16:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 15:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (2/3) (duration: 00m 55s)
* 15:58 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (1/3) (duration: 00m 57s)
* 15:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:21 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 15:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 54s)
* 15:12 Lucas_WMDE: my next message accidentally says 1/3 again but it’s 2/3, sorry
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 14:42 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/Wikibase.default.php: Backport: [[gerrit:732666{{!}}Enable dispatching via jobs by default (T291828)]] (duration: 00m 55s)
* 14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client/: Backport: [[gerrit:732674{{!}}Fix ExternalUserNames service wiring for local database]] (duration: 00m 57s)
* 14:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:56 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:55 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:34 volans: uploaded spicerack_1.0.6 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: Schema change s3 [[phab:T278619|T278619]]
* 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: Schema change s3 [[phab:T278619|T278619]]
* 12:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T278619|T278619]]
* 12:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T278619|T278619]]
* 12:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s4 [[phab:T278619|T278619]]
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s4 [[phab:T278619|T278619]]
* 12:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T278619|T278619]]
* 12:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T278619|T278619]]
* 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T278619|T278619]]
* 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T278619|T278619]]
* 11:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T278619|T278619]]
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T278619|T278619]]
* 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T278619|T278619]]
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T278619|T278619]]
* 11:13 Lucas_WMDE: UTC morning backport+config window done
* 11:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # [[phab:T294008|T294008]]
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 jgiannelos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730848{{!}}Configure event stream for map tiles state change (T289771)]] (duration: 01m 04s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:47 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:14 jbond: merging refactor of P:base Gerrit:714975
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:56 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 03s)
* 08:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:26 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:25 ema: cp3062: revert vsl_space experiment [[phab:T293879|T293879]]
* 08:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1004.eqiad.wmnet with OS bullseye
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17563 and previous config saved to /var/cache/conftool/dbconfig/20211021-080330-root.json
* 07:56 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1004.eqiad.wmnet with OS bullseye
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17562 and previous config saved to /var/cache/conftool/dbconfig/20211021-074826-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17561 and previous config saved to /var/cache/conftool/dbconfig/20211021-073323-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17560 and previous config saved to /var/cache/conftool/dbconfig/20211021-071819-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17559 and previous config saved to /var/cache/conftool/dbconfig/20211021-070315-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17558 and previous config saved to /var/cache/conftool/dbconfig/20211021-064812-root.json
* 06:35 elukey: `systemctl reload nginx` on cloudelastic100[5,6] to pick up the new TLS certificate and clear alerts - [[phab:T293826|T293826]]
* 04:47 marostegui: Deploy schema change on s5 codfw - [[phab:T291719|T291719]]
* 04:37 marostegui: Deploy schema change on s6 codfw - [[phab:T291719|T291719]]
* 04:04 legoktm: restarted apache on lists1001 so it only uses new TLS cert ([[phab:T293826|T293826]])
* 03:29 eileen: civicrm revision changed from {{Gerrit|e889831012}} to {{Gerrit|733a8fceda}}, config revision is {{Gerrit|eed79486d5}}
* 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-02-26 ==
== 2021-10-20 ==
* 20:29 mutante: deploy2001 - /srv/mediawiki-staging  sudo find . -name *.cdb delete - deleted 190 GB of old cdb files ([[phab:T275826|T275826]] [[phab:T265963|T265963]])
* 23:56 thcipriani@deploy1002: Finished scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]] (duration: 11m 41s)
* 18:31 dwisehaupt: starting the utf8mb4 table alters on frdb2002 under a root screen session
* 23:44 thcipriani@deploy1002: Started scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]]
* 17:59 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
* 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:57 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
* 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:04 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 23:29 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fawiki require login for creation of pages in the draft namespace [[phab:T291018|T291018]] (duration: 01m 02s)
* 14:57 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 23:27 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fawiki require login to edit main namespace [[phab:T291018|T291018]] (duration: 01m 04s)
* 14:49 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 22:13 dancy@deploy1002: Synchronized README: testing (4/4) (duration: 02m 52s)
* 14:44 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 22:00 dancy@deploy1002: Synchronized README: testing (3/4) (duration: 02m 57s)
* 14:43 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 21:54 dancy@deploy1002: Synchronized README: testing (2) (duration: 01m 02s)
* 14:38 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 21:52 dancy@deploy1002: Synchronized README: (no justification provided) (duration: 01m 03s)
* 14:37 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 21:50 dancy: Testing a series of one-file scap sync-file runs
* 14:31 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:30 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:25 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:22 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 21:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9cf996a38d82fdd67e600a5a951e88423957e8d}}: Promote Growth features out of darkmode on several wikis  ([[phab:T291826|T291826]], [[phab:T255037|T255037]], [[phab:T287878|T287878]]) (duration: 01m 04s)
* 14:17 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:56 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 20:38 eileen: civicrm revision changed from {{Gerrit|9b5e0d015b}} to {{Gerrit|e889831012}}, config revision is {{Gerrit|eed79486d5}}
* 13:51 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 20:25 legoktm: uploaded php7.4 on buster to apt.wm.o ([[phab:T293449|T293449]])
* 13:51 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 19:24 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations (duration: 00m 46s)
* 13:45 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 19:24 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations
* 13:44 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 19:09 mutante: disabling puppet on mw* for a minute to deploy a change
* 13:38 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1031.eqiad.wmnet
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1031.eqiad.wmnet
* 18:31 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 12:59 effie: upgrade memcached on mc1031, mc2031
* 18:30 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 12:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:24 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 12:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:28 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org -  [[phab:T293810|T293810]]
* 12:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:27 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org
* 12:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 17:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:01 razzi@deploy1002: Finished deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f] (duration: 23m 42s)
* 12:22 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 17:00 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client: Update deprecated calls to ParserOutput in ShortDescHandler - [[phab:T293860|T293860]] (duration: 01m 03s)
* 12:22 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 16:53 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/LiquidThreads/pages/LqtDiscussionPager.php: Remove deprecated usage of setProperty - [[phab:T293895|T293895]] (duration: 01m 03s)
* 12:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 16:49 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GeoCrumbs: Replace use of deprecated ParserOutput:getProperty() - [[phab:T293894|T293894]] (duration: 01m 09s)
* 12:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Add new vslow,dump host to codfw s4 - [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14508 and previous config saved to /var/cache/conftool/dbconfig/20210226-121438-marostegui.json
* 16:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 16:37 razzi@deploy1002: Started deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f]
* 12:11 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1003.wikimedia.org
* 16:36 razzi: deploy refinery change for https://phabricator.wikimedia.org/T287084
* 12:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:13 jbond: upload cas_6.4.2-1_amd64.deb
* 12:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:39 volans@cumin2002: START - Cookbook sre.dns.netbox
* 12:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:57 moritzm: installing modsecurity-crs security updates on Buster
* 12:07 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 14:48 moritzm: installing xmlgraphics-commons security updates on Buster
* 12:00 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
* 14:46 moritzm: installing irssi security updates on Buster
* 12:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 11:59 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 11:59 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:35 moritzm: installing commons-io security updates on Buster
* 11:59 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:27 ema: cp3062: test higher vsl_space values [[phab:T293879|T293879]]
* 11:58 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:27 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 11:55 jbond42: delete exim messages in the queue to root@wikimedia.org older than 7200 seconds and younger than 10800 seconds on mx1001
* 14:12 moritzm: installing ruby2.3 security updates
* 11:54 jbond42: delete exim messages in the queue to root@wikimedia.org older than 7200 seconds and younger than 10800 seconds on mx2001
* 13:40 moritzm: installing apache2 security updates on buster
* 11:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:47 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 13:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.5 refs [[phab:T281169|T281169]] (duration: 01m 02s)
* 11:42 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
* 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.5 refs [[phab:T281169|T281169]]
* 11:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
* 13:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 11:41 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 11:38 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=ats-tls
* 11:38 vgutierrez: rolling restart of ats-tls on cp500[1-5]
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=varnish-fe
* 11:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:51 ema: cp3062: bump vsl_space from 80M (default) to 512M [[phab:T293879|T293879]] - varnish restart needed
* 11:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 11:33 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 12:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 11:32 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:30 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:27 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 12:02 urbanecm@deploy1002: Finished scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]]) (duration: 25m 19s)
* 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE
* 11:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2007.codfw.wmnet
* 11:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2007.codfw.wmnet
* 11:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 11:37 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 11:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 11:37 urbanecm@deploy1002: Started scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]])
* 11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2005.codfw.wmnet
* 11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 11:21 moritzm: installing ffmpeg security updates
* 11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e520fc57411bb19123766192cd636396ea6fc59d}}: GrowthExperiments: Add campaign pattern for enwiki ([[phab:T293699|T293699]]) (duration: 01m 22s)
* 11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:11 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2005.codfw.wmnet
* 11:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE
* 10:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 11:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
* 10:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 11:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 09:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 09:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 08:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 08:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:17 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1118.eqiad.wmnet with OS buster
* 11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 07:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1118.eqiad.wmnet with OS buster
* 11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 (s1) for reimage [[phab:T290865|T290865]]', diff saved to https://phabricator.wikimedia.org/P17552 and previous config saved to /var/cache/conftool/dbconfig/20211020-064529-marostegui.json
* 11:16 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS buster
* 11:16 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 (s1) after upgrade', diff saved to https://phabricator.wikimedia.org/P17551 and previous config saved to /var/cache/conftool/dbconfig/20211020-063926-marostegui.json
* 11:16 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 06:35 marostegui: Upgrade db1106
* 11:16 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17550 and previous config saved to /var/cache/conftool/dbconfig/20211020-063431-marostegui.json
* 11:16 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 06:31 dcausse: restarting blazegraph on wdqs1012
* 11:15 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 06:28 elukey: reboot analytics1066 - OS showing CPU soft lockups, tons of defunct processes (including node manager) and high CPU usage
* 11:15 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 06:21 marostegui: Depool clouddb1013 for upgrade
* 11:12 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS buster
* 11:10 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 06:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17549 and previous config saved to /var/cache/conftool/dbconfig/20211020-061202-marostegui.json
* 11:05 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 06:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 06:05 XioNoX: put transport link between ulsfo and eqsin in service - [[phab:T273308|T273308]]
* 11:05 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS buster
* 11:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 11:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 04:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:40 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable $wgLocalHTTPProxy on group0 wikis ([[phab:T288848|T288848]]) (duration: 01m 05s)
* 11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 01:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 01:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 00:00 tgr: west coast evening deploys done
* 11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 11:00 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:59 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2002-dev.codfw.wmnet
* 10:55 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2002-dev.codfw.wmnet
* 10:54 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2003-dev.codfw.wmnet
* 10:50 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2003-dev.codfw.wmnet
* 10:50 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2003-dev.codfw.wmnet
* 10:46 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 10:44 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1039.eqiad.wmnet
* 10:44 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:44 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:43 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2003-dev.codfw.wmnet
* 10:41 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2002-dev.codfw.wmnet
* 10:38 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2002-dev.codfw.wmnet
* 10:38 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2001-dev.codfw.wmnet
* 10:38 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1039.eqiad.wmnet
* 10:34 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2001-dev.codfw.wmnet
* 10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14505 and previous config saved to /var/cache/conftool/dbconfig/20210226-102254-root.json
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE
* 10:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE
* 10:14 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
* 10:09 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2002-dev.codfw.wmnet
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 85%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14504 and previous config saved to /var/cache/conftool/dbconfig/20210226-100750-root.json
* 10:06 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2001-dev.wikimedia.org
* 10:05 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2002-dev.codfw.wmnet
* 09:59 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudweb2001-dev.wikimedia.org
* 09:59 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2003-dev.wikimedia.org
* 09:55 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2002-dev.wikimedia.org
* 09:54 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2001-dev.codfw.wmnet
* 09:52 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudservices2003-dev.wikimedia.org
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14503 and previous config saved to /var/cache/conftool/dbconfig/20210226-095247-root.json
* 09:50 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudservices2002-dev.wikimedia.org
* 09:50 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2001-dev.codfw.wmnet
* 09:48 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2001-dev.codfw.wmnet
* 09:43 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2001-dev.codfw.wmnet
* 09:41 aborrero@cumin2001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcontrol2001-dev.wikimedia.org
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 65%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14502 and previous config saved to /var/cache/conftool/dbconfig/20210226-093743-root.json
* 09:33 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.wikimedia.org
* 09:28 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:24 root@cumin1001: START - Cookbook sre.dns.netbox
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14501 and previous config saved to /var/cache/conftool/dbconfig/20210226-092240-root.json
* 09:13 jbond42: puppet enabled post sudoers fix, running puppet fleet wide with cumin -b 15 '*' 'run-puppet-agent'
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14500 and previous config saved to /var/cache/conftool/dbconfig/20210226-090736-root.json
* 08:55 jbond42: disabled puppet pending rollback of https://gerrit.wikimedia.org/r/666899
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14498 and previous config saved to /var/cache/conftool/dbconfig/20210226-085233-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 15%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14497 and previous config saved to /var/cache/conftool/dbconfig/20210226-083729-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14496 and previous config saved to /var/cache/conftool/dbconfig/20210226-082226-root.json
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1058.eqiad.wmnet with reason: REIMAGE
* 08:17 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1058.eqiad.wmnet with reason: REIMAGE
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14495 and previous config saved to /var/cache/conftool/dbconfig/20210226-080722-root.json
* 08:04 elukey: run ipmi mc reset cold for analytics1058 - mgmt responding to pings and ipmi, but not to ssh
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14494 and previous config saved to /var/cache/conftool/dbconfig/20210226-075219-root.json
* 07:02 marostegui: Stop MySQL on db2106 to clone db2147 [[phab:T275633|T275633]]
* 07:01 elukey: reboot an-worker1099 to clear out kernel soft lockup errors
* 06:59 elukey: restart datanode on an-worker1099 - soft lockup kernel errors
* 06:53 kartik@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/ContentTranslation: Bump ContentTranslation to {{Gerrit|e6b1a7c}} to include lost {{gerrit{{!}}666327}} backport (duration: 00m 58s)
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1092 from dbctl [[phab:T275019|T275019]]', diff saved to https://phabricator.wikimedia.org/P14492 and previous config saved to /var/cache/conftool/dbconfig/20210226-063914-marostegui.json
* 06:32 kartik@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/ContentTranslation: Resync ContentTranslation for {{gerrit{{!}}666327}} (duration: 01m 16s)
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 to clone db1134 [[phab:T275343|T275343]]', diff saved to https://phabricator.wikimedia.org/P14490 and previous config saved to /var/cache/conftool/dbconfig/20210226-061705-marostegui.json
* 05:29 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2045.codfw.wmnet with reason: REIMAGE
* 05:27 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2045.codfw.wmnet with reason: REIMAGE
* 05:25 ryankemper: [relforge] Downtimed `relforge1004` until `2021-03-02 07:23:36` (https://phabricator.wikimedia.org/T275658 is in flight to fix broken `kibana.service`)
* 05:07 ryankemper: [[phab:T275345|T275345]] `sudo -i wmf-auto-reimage-host --conftool -p [[phab:T275345|T275345]] elastic2045.codfw.wmnet` on `ryankemper@cumin2001` tmux session `elastic_reimage_elastic1065`
* 04:23 ryankemper: [[phab:T267927|T267927]] [WDQS Data Reload] `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id [[phab:T267927|T267927]] --reload-data wikidata --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool` on `ryankemper@cumin2001` tmux session `wdqs_data_reload_2008`
* 04:21 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 00:14 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/Graph/: {{Gerrit|9d5cf348f5dda32f8889d5160bb1fe34a4e07f8c}}: Do not log graph errors to WMF servers ([[phab:T274557|T274557]]) (duration: 01m 36s)
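The db1169 entries above raise the host's pool share in fixed steps (1%, 5%, 10%, 15%) roughly every 15 minutes. The Python sketch below only reproduces that timing from the log. `repool_schedule` is a hypothetical helper, and the fixed 15-minute interval is an assumption inferred from the timestamps. It is not dbctl's actual logic or API.

```python
from datetime import datetime, timedelta


def repool_schedule(start: str, percentages: list[int], step_minutes: int = 15) -> list[tuple[str, int]]:
    """Return (HH:MM, percent) pairs for a staged repool.

    Hypothetical helper: the constant spacing between steps is inferred
    from the Server Admin Log timestamps, not taken from dbctl.
    """
    t0 = datetime.strptime(start, "%H:%M")
    return [
        ((t0 + timedelta(minutes=i * step_minutes)).strftime("%H:%M"), pct)
        for i, pct in enumerate(percentages)
    ]


# db1169 after cloning db1134 on 2021-02-26 (only the steps visible in this log excerpt).
for when, pct in repool_schedule("07:52", [1, 5, 10, 15]):
    print(f"{when} db1169 (re)pooling @ {pct}%")
```

The computed times match the logged 07:52, 08:07, 08:22 and 08:37 commits. Ramping the share up gradually lets a freshly cloned replica warm its caches before it receives full traffic.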


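The deploy2001 entries that mutante logged on 2021-02-25 explain why /srv/mediawiki-staging grew. scap-master-sync's rsync runs with --delete but also excludes `**/cache/l10n/*.cdb` and `*.swp`. rsync protects excluded files from deletion unless --delete-excluded is given, so a tree removed on deploy1001 cannot be deleted on the receiver while it still holds excluded .cdb files. The code below is a simplified Python model of that deletion pass, not rsync itself. The helper names and file names are hypothetical, and `fnmatch` wildcards only approximate rsync's pattern rules (here a single `*` also matches `/`).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Exclude patterns quoted in the 2021-02-25 log entry.
EXCLUDES = ["**/cache/l10n/*.cdb", "*.swp"]


def is_excluded(path: str, patterns: list[str] = EXCLUDES) -> bool:
    """Approximate rsync matching: patterns containing '/' are checked against
    the whole relative path, other patterns only against the final component."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(path if "/" in pat else name, pat) for pat in patterns)


def implied_dirs(files: set[str]) -> set[str]:
    """All directories implied by a set of relative file paths."""
    return {str(p) for f in files for p in PurePosixPath(f).parents if str(p) != "."}


def delete_pass(sender: set[str], receiver: set[str]) -> tuple[set[str], list[str]]:
    """Model the receiver side of --delete without --delete-excluded.

    Returns the files left on the receiver and the stale directories that
    cannot be removed because protected (excluded) files remain inside them.
    """
    # A receiver file survives if the sender has it or an exclude rule protects it.
    kept = {f for f in receiver if f in sender or is_excluded(f)}
    stale_dirs = implied_dirs(receiver) - implied_dirs(sender)
    blocked = sorted(d for d in stale_dirs if any(k.startswith(d + "/") for k in kept))
    return kept, blocked


# Hypothetical example: wmf.30 was pruned on the sender but still has a cached .cdb on the receiver.
sender = {"php-1.36.0-wmf.32/index.php"}
receiver = {
    "php-1.36.0-wmf.32/index.php",
    "php-1.36.0-wmf.30/index.php",
    "php-1.36.0-wmf.30/cache/l10n/l10n_cache-en.cdb",
}
kept, blocked = delete_pass(sender, receiver)
print(sorted(kept))
print(blocked)
```

In this model the ordinary wmf.30 file is deleted. The .cdb file survives, however, and the whole `php-1.36.0-wmf.30` tree stays behind. Repeated over many pruned branches, that matches the build-up described in the log entry.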
== 2021-02-25 ==
== 2021-10-19 ==
* 23:55 mutante: deploy1002, deploy2002 - scap-master-sync deploy1001.eqiad.wmnet ([[phab:T265963|T265963]])
* 23:59 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732103{{!}}Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces (T293846)]] (duration: 01m 02s)
* 23:41 mutante: deploy2001 2/2 - because rsync runs with --delete but also --exclude="**/cache/l10n/*.cdb" --exclude="*.swp", you can't expect /srv/mediawiki-staging to be the same size on 2 servers
* 23:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:39 mutante: deploy2001 - scap-master-sync from deploy1001 runs and attempts to --delete files to stay in sync, but fails to do so because *.cdb files are in cache dirs and rsync does not want to delete non-empty directories; this leads to /srv/mediawiki-staging building up to 10 times the size of eqiad
* 23:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:34 mutante: deploy2001 - scap-master-sync from deploy1001
* 23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732053{{!}}ruwikiversity: Add 'portal' and 'faculty' namespaces (T293545)]] (duration: 01m 03s)
* 23:13 mutante: deploy1002 - /usr/local/bin/scap-master-sync deploy1001.eqiad.wmnet
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:01 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.30 (duration: 04m 20s)
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:38 legoktm: pushed new version of docker-registry.discovery.wmnet/wikimedia-buster image
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710565{{!}}Set the project namespace and sitename for Javanese Wikipedia and Wiktionary (T287437)]] (duration: 01m 02s)
* 21:20 mutante: deploy2001 - rsynced /srv/deployment from deploy1001 after gerrit:666757
* 23:23 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731953{{!}}Create Portal and Portal talk namespace for shiwiki (T288909)]] (duration: 01m 03s)
* 20:57 eileen: civicrm revision changed from {{Gerrit|604d07c859}} to {{Gerrit|f07390ff87}}, config revision is {{Gerrit|643477b35d}}
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:35 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.32  refs [[phab:T274936|T274936]]
* 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:17 tgr@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/GrowthExperiments/: Backport: [[gerrit:666704{{!}}Impact module: Add "not rendered" state (T270294, T275615)]] (duration: 01m 08s)
* 23:13 tgr@deploy1002: Synchronized static: Config: [[gerrit:731231{{!}}Repair the size of the logo of Kashmiri Wikipedia (T293342)]] (duration: 02m 14s)
* 19:40 tgr@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/GrowthExperiments/: Backport: [[gerrit:666704{{!}}Impact module: Add "not rendered" state (T270294, T275615)]] (duration: 01m 26s)
* 21:34 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete {{!}} fixed Icinga alert:  RECOVERY - Check for large files in client bucket on mwmaint1002 is OK: OK: [[phab:T165885|T165885]]
* 19:16 ryankemper: [[phab:T267927|T267927]] Downloading dumps: `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_latest_dumps`
* 21:32 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete
* 18:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:56 ejegg: updated payments-wiki from {{Gerrit|0f48acea49}} to {{Gerrit|30e596903d}}
* 18:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 18:59 ryankemper: [[phab:T267927|T267927]] Manual puppet run got `wdqs2008` present in puppetdb again. Now being blocked by lack of host key for `wdqs2008` present on `cumin2001`, so I'm running puppet on `cumin2001` to get the latest state of `/etc/ssh/ssh_known_hosts`
* 18:46 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/MediaSearch/: {{Gerrit|a84a675}}: {{Gerrit|3231578}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 18:57 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/: {{Gerrit|694580a}}: {{Gerrit|c02e301}}: MediaSearch backports([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 18:57 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 18:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 18:56 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:30 foks: deleting 1 more email with deleteUserEmail.php
* 18:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1476a2d93}}: {{Gerrit|dd8393c1a0}}: foundationwiki: Restrict sensitive namespaces to editor group ([[phab:T205350|T205350]]) (duration: 01m 03s)
* 18:50 ryankemper: [[phab:T267927|T267927]] Trying to kick off data reload on `wdqs2008` from `cumin2001` fails because of `spicerack.remote.RemoteError: No hosts provided`. Some spelunking through IRC history suggests this happens when a host is not present in puppetDB. I've confirmed `wdqs2008` is absent on puppetboard, so I'm running puppet agent to get it re-registered (hopefully)
* 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 18:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9a2893c7190e615a247674dbf7f87348bf43b91c}}: Enable topic subscriptions as a beta feature on all remaining projects ([[phab:T287802|T287802]]) (duration: 01m 04s)
* 18:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 18:00 legoktm@deploy1002: Synchronized wmf-config/: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (2/2) (duration: 01m 06s)
* 18:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (1/2) (duration: 01m 05s)
* 18:37 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:57 foks: removing six email addresses on request (with deleteUserEmail.php)
* 18:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 18:36 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:25 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 18:36 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 18:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 16:48 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 16:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 16:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 16:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 16:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 16:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 18:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 18:25 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 18:23 bblack: dns[1235]002 - upgrade gdnsd to 3.6.0 (dns4002 and authdns2001 already running it for some time!)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 18:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 15:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 18:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 15:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:40 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - remove now redundant stream setting - [[phab:T277193|T277193]] (duration: 01m 04s)
* 18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
* 15:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 15:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 15:30 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 15:28 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 15:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 14:34 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:08 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:29 jbond: disable puppet on lvs, cp, authdns, mc, mw-be and wcqs while I merge G:662699
* 18:08 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:58 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:11 hashar@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 45m 13s)
* 17:58 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:52 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 17:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:26 hashar@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17547 and previous config saved to /var/cache/conftool/dbconfig/20211019-131927-root.json
* 17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17546 and previous config saved to /var/cache/conftool/dbconfig/20211019-131651-root.json
* 17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17545 and previous config saved to /var/cache/conftool/dbconfig/20211019-130424-root.json
* 17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17544 and previous config saved to /var/cache/conftool/dbconfig/20211019-130147-root.json
* 17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17543 and previous config saved to /var/cache/conftool/dbconfig/20211019-124920-root.json
* 17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17542 and previous config saved to /var/cache/conftool/dbconfig/20211019-124644-root.json
* 17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 12:40 moritzm: installing atftpd security updates
* 17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17541 and previous config saved to /var/cache/conftool/dbconfig/20211019-123416-root.json
* 17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 12:34 marostegui: Upgrade dbstore1003
* 17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17540 and previous config saved to /var/cache/conftool/dbconfig/20211019-123140-root.json
* 17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17539 and previous config saved to /var/cache/conftool/dbconfig/20211019-121913-root.json
* 17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17538 and previous config saved to /var/cache/conftool/dbconfig/20211019-121636-root.json
* 17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 12:12 XioNoX: push anycast tuning to all Lumen and NTT transit links - [[phab:T288843|T288843]]
* 17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1167 (s8) after upgrade', diff saved to https://phabricator.wikimedia.org/P17537 and previous config saved to /var/cache/conftool/dbconfig/20211019-120918-marostegui.json
* 17:25 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17536 and previous config saved to /var/cache/conftool/dbconfig/20211019-120458-marostegui.json
* 17:25 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17535 and previous config saved to /var/cache/conftool/dbconfig/20211019-120409-root.json
* 17:25 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17534 and previous config saved to /var/cache/conftool/dbconfig/20211019-120348-root.json
* 17:25 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 12:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/WikibaseMediaInfo/: {{Gerrit|ec0125770775c1a1a54c3b592d86d287fd9e3ad6}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 55s)
* 17:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17533 and previous config saved to /var/cache/conftool/dbconfig/20211019-120132-root.json
* 17:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:00 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikibaseMediaInfo/: {{Gerrit|79808a90a95dd5dac2b532b87fb7ec1a490ea0f0}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 56s)
* 17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17532 and previous config saved to /var/cache/conftool/dbconfig/20211019-120024-root.json
* 17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 11:56 XioNoX: push anycast tuning to Tele2, Init7, DT transit links - [[phab:T288843|T288843]]
* 17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 11:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17531 and previous config saved to /var/cache/conftool/dbconfig/20211019-114844-root.json
* 17:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:46 marostegui: Upgrade db1105 (s1,s2)
* 17:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) for upgrade', diff saved to https://phabricator.wikimedia.org/P17530 and previous config saved to /var/cache/conftool/dbconfig/20211019-114649-marostegui.json
* 17:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17529 and previous config saved to /var/cache/conftool/dbconfig/20211019-114520-root.json
* 17:06 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17527 and previous config saved to /var/cache/conftool/dbconfig/20211019-113340-root.json
* 17:06 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17526 and previous config saved to /var/cache/conftool/dbconfig/20211019-113017-root.json
* 17:06 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17525 and previous config saved to /var/cache/conftool/dbconfig/20211019-111837-root.json
* 17:05 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17524 and previous config saved to /var/cache/conftool/dbconfig/20211019-111513-root.json
* 17:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 11:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7c31b04e50101a60db7ae8acae64bc031f5e1007}}: DPL: Explicitly note it is not possible to enable DPL on any more wikis (duration: 00m 55s)
* 17:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17523 and previous config saved to /var/cache/conftool/dbconfig/20211019-110333-root.json
* 17:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 17:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17522 and previous config saved to /var/cache/conftool/dbconfig/20211019-110009-root.json
* 17:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 10:56 marostegui: Upgrade clouddb1021
* 17:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 17:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:51 moritzm: failover master in ganeti-test to ganeti2026
* 17:01 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:50 godog: bounce superset on an-tool1005 to pick up statsd changes - [[phab:T247963|T247963]]
* 17:01 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS stretch
* 17:00 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17521 and previous config saved to /var/cache/conftool/dbconfig/20211019-104829-root.json
* 17:00 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:45 godog: bounce navtiming on webperf1001 to pick up statsd changes - [[phab:T247963|T247963]]
* 16:54 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:45 godog: bounce superset on an-tool1010 to pick up statsd changes - [[phab:T247963|T247963]]
* 16:54 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17520 and previous config saved to /var/cache/conftool/dbconfig/20211019-104506-root.json
* 16:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:38 oblivian@deploy1002: Synchronized w/static.php: Config: [[gerrit:730182{{!}}static.php: Add support for /static/current rewrites (take 2) (T285232)]] (duration: 00m 55s)
* 16:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 16:39 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:37 marostegui: Upgrade db1101 (s7,s8)
* 16:39 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17519 and previous config saved to /var/cache/conftool/dbconfig/20211019-103634-marostegui.json
* 16:39 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:39 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:22 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:22 godog: flip mw statsd traffic with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/731918 - [[phab:T247963|T247963]]
* 16:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:21 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 16:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS stretch
* 16:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 16:36 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 16:36 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:50 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 16:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 09:44 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.3 (duration: 01m 39s)
* 16:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 09:42 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.2 (duration: 16m 06s)
* 16:28 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:37 godog: move graphite/statsd writes to graphite2003 - [[phab:T247963|T247963]]
* 16:23 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 09:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 16:17 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 09:27 hashar: scap clean --delete 1.38.0-wmf.2 && scap clean --delete 1.38.0-wmf.3  # [[phab:T281169|T281169]]
* 16:16 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:27 hashar: Cloned and applied security patches for 1.38.0-wmf.5 # [[phab:T281169|T281169]]
* 16:16 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 09:19 marostegui: Stop slave on db2112 [[phab:T290865|T290865]]
* 16:16 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 16:16 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 15:38 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl2002.codfw.wmnet
* 09:03 XioNoX: push anycast tuning to all Telia transit links - [[phab:T288843|T288843]]
* 15:26 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
* 08:50 godog: point graphite.discovery.wmnet to graphite2003 - [[phab:T247963|T247963]]
* 15:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
* 08:40 XioNoX: push prep-work for anycast tuning to all sites - [[phab:T288843|T288843]]
* 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl2002.codfw.wmnet
* 08:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl2001.codfw.wmnet
* 08:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 15:05 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl2001.codfw.wmnet
* 08:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php hrwiki --fix
* 15:00 moritzm: installing libmaxminddb updates from buster 10.8 point release
* 08:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 vgutierrez: pool cp4032
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift
* 14:42 vgutierrez: depool cp4032 for ats-tls/NUMA tests
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift-ro
* 14:35 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl1002.eqiad.wmnet
* 08:03 XioNoX: push prep-work for anycast tuning in ulsfo (try 2) - [[phab:T288843|T288843]]
* 14:27 moritzm: installing postgresql security updates on buster
* 08:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:24 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl1001.eqiad.wmnet
* 07:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:22 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl1002.eqiad.wmnet
* 07:24 ema: A:cp start rolling varnish upgrades to 6.0.8-1wm1 [[phab:T292290|T292290]]
* 14:20 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-serve-ctrl1002.eqiad.wmnet
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17517 and previous config saved to /var/cache/conftool/dbconfig/20211019-072111-root.json
* 14:17 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl1002.eqiad.wmnet
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17516 and previous config saved to /var/cache/conftool/dbconfig/20211019-071519-root.json
* 14:16 moritzm: installing cairo security updates on buster
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17515 and previous config saved to /var/cache/conftool/dbconfig/20211019-070607-root.json
* 14:14 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-serve-ctrl1002.eqiad.wmnet
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17514 and previous config saved to /var/cache/conftool/dbconfig/20211019-070016-root.json
* 14:10 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl1002.eqiad.wmnet
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17513 and previous config saved to /var/cache/conftool/dbconfig/20211019-065104-root.json
* 14:09 kormat@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl1001.eqiad.wmnet
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17512 and previous config saved to /var/cache/conftool/dbconfig/20211019-064512-root.json
* 13:57 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
* 06:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 13:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: REIMAGE
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17511 and previous config saved to /var/cache/conftool/dbconfig/20211019-063559-root.json
* 13:55 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17510 and previous config saved to /var/cache/conftool/dbconfig/20211019-063008-root.json
* 13:53 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: REIMAGE
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17509 and previous config saved to /var/cache/conftool/dbconfig/20211019-062054-root.json
* 13:15 akosiaris: reinitialize all of staging-codfw. kubestage2* and kubestagemaster* have been scheduled downtime in icinga.
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17508 and previous config saved to /var/cache/conftool/dbconfig/20211019-061505-root.json
* 12:32 moritzm: installing openssl security updates on Buster
* 06:06 marostegui: Upgrade dbstore1005
* 12:20 Lucas_WMDE: EU backport&config window done
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17507 and previous config saved to /var/cache/conftool/dbconfig/20211019-060551-root.json
* 12:16 phuedx@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:666425{{!}}[stage 1] Enable WVUI search by default to logged-in modern Vector users except on pilot wikis (T249297)]] (duration: 01m 31s)
* 06:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1134.eqiad.wmnet with reason: REIMAGE
* 06:03 marostegui: Upgrade db1184, db1178
* 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1134.eqiad.wmnet with reason: REIMAGE
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 for upgrade', diff saved to https://phabricator.wikimedia.org/P17506 and previous config saved to /var/cache/conftool/dbconfig/20211019-060123-marostegui.json
* 11:47 jbond42: upload new wmf-laptop package
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17505 and previous config saved to /var/cache/conftool/dbconfig/20211019-060001-root.json
* 11:40 marostegui: Stop MySQL on db1134 to reimage it to buster [[phab:T275343|T275343]]
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184 for upgrade', diff saved to https://phabricator.wikimedia.org/P17504 and previous config saved to /var/cache/conftool/dbconfig/20211019-055429-marostegui.json
* 11:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on dborch1001.wikimedia.org with reason: Restart for new kernel
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 11:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on dborch1001.wikimedia.org with reason: Restart for new kernel
* 05:46 marostegui: Reimage db2112 (s1 codfw master) [[phab:T290865|T290865]]
* 11:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host otrs1001.eqiad.wmnet
* 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 11:22 moritzm: reset-failed ifup@ens5.service on otrs1001 [[phab:T273026|T273026]]
* 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 11:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host otrs1001.eqiad.wmnet
* 02:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:15 moritzm: rebooting otrs1001 (ticket.wikimedia.org) for a kernel update
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:59 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1117-1118].eqiad.wmnet
* 02:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 10:57 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1117-1118].eqiad.wmnet
* 02:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1117.eqiad.wmnet with reason: REIMAGE
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:40 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
* 00:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 10:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1117.eqiad.wmnet with reason: REIMAGE
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 100%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14481 and previous config saved to /var/cache/conftool/dbconfig/20210225-103719-root.json
* 10:34 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 10:32 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 75%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14480 and previous config saved to /var/cache/conftool/dbconfig/20210225-102215-root.json
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 50%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14479 and previous config saved to /var/cache/conftool/dbconfig/20210225-100712-root.json
* 10:05 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE
* 10:03 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: REIMAGE
* 10:01 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE
* 10:01 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: REIMAGE
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 25%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14477 and previous config saved to /var/cache/conftool/dbconfig/20210225-095208-root.json
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 10%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14476 and previous config saved to /var/cache/conftool/dbconfig/20210225-093705-root.json
* 09:32 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE
* 09:32 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE
* 09:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1032.eqiad.wmnet
* 09:14 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1032.eqiad.wmnet
* 09:10 effie: upgrade memcached on mc1032, mc2032, mc2036
* 08:32 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:29 volans@cumin2001: START - Cookbook sre.dns.netbox
* 08:15 vgutierrez: restart ats-tls on cp5006 to enable parent proxies support - [[phab:T274888|T274888]]
* 08:15 XioNoX: un-drain lumen eqiad-codfw link for BW testing
* 08:07 XioNoX: drain lumen eqiad-codfw link for BW testing
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 to clone db1168 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14474 and previous config saved to /var/cache/conftool/dbconfig/20210225-065018-marostegui.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 [[phab:T275019|T275019]]', diff saved to https://phabricator.wikimedia.org/P14473 and previous config saved to /var/cache/conftool/dbconfig/20210225-063243-marostegui.json
* 00:29 ryankemper: [[phab:T274204|T274204]] Restored service health on `elastic106[0,4,5]` via `sudo apt-get remove --purge wmf-elasticsearch-search-plugins --yes && sudo dpkg -i /var/cache/apt/archives/wmf-elasticsearch-search-plugins_6.5.4-4~stretch_all.deb && sudo puppet agent -tv`. There's some sort of issue with `6.5.4-5~stretch` that we will need to circle back and investigate; for now the fleet is staying on `6.5.4-4~stretch`
* 00:05 ryankemper: [[phab:T274204|T274204]] `Ctrl+C`'d out of the current rolling-upgrade; the 3 hosts that have their elasticsearch systemd units in a failing state are running the latest plugin version, meaning the new version is likely the cause of the failures
* 00:01 mutante: mwlog1001 - temporarily disabling puppet to deploy gerrit::661200 - because this is a jessie host
* 00:01 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)


== 2021-02-24 ==
== 2021-10-18 ==
* 23:42 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 23:40 hoo: Updated the Wikidata property suggester with data from the 2021-10-04 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 23:30 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b654980240d51fff3c6e9c48f7076d4609c2560f}}: Create an alias for the Draft namespace on hrwiki ([[phab:T291755|T291755]]) (duration: 00m 56s)
* 23:18 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster restarts" --task-id [[phab:T274204|T274204]] --nodes-per-run 3`
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ryankemper: [[phab:T274204|T274204]] Beginning rolling-upgrade of `eqiad` CirrusSearch cluster to upgrade to `wmf-elasticsearch-search-plugins/stretch-wikimedia 6.5.4-5~stretch`, see tmux session `elastic_rolling_upgrade` on `ryankemper@cumin1001`
* 23:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=thwiktionary --fix # [[phab:T291761|T291761]]
* 23:13 eileen: civicrm revision is {{Gerrit|5e042e6e57}}, config revision is {{Gerrit|8572611a32}}
* 23:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|abe777d28594da852e49ccb1c1597b2598f3e483}}: Create Rhymes namespace for thwiktionary ([[phab:T291761|T291761]]) (duration: 00m 57s)
* 22:09 ryankemper: [[phab:T265113|T265113]] Unbanned `elastic1063` from both Elasticsearch clusters (`production-search-eqiad` and `production-search-omega-eqiad`)
* 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:03 Urbanecm: Deploy security patches for [[phab:T275669|T275669]]
* 23:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:59 andrew@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 22:56 legoktm@deploy1002: Synchronized php-1.38.0-wmf.4/includes/http/MWHttpRequest.php: Allow using a reverse proxy for local HTTP requests ([[phab:T288848|T288848]]) (duration: 00m 56s)
* 20:59 andrew@cumin1001: Added views for new wiki: mniwiki [[phab:T273465|T273465]]
* 22:06 maryum: deployed security patch for [[phab:T293589|T293589]]
* 20:43 mstyles@deploy1001: Finished deploy [wikimedia/discovery/analytics@44fba51]: add import ttl dags - [[phab:T270103|T270103]] (duration: 02m 33s)
* 21:23 maryum: deployed security patch for [[phab:T293556|T293556]]
* 20:40 mstyles@deploy1001: Started deploy [wikimedia/discovery/analytics@44fba51]: add import ttl dags - [[phab:T270103|T270103]]
* 21:05 mutante: mwmaint1002 - sudo -u www-data /usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscript extensions/TranslationNotifications/scripts/DigestEmailer.php --wiki mediawikiwiki {{!}} Fatal error: Uncaught Error: Class 'MediaWiki\MediaWikiServices' not found
* 20:36 andrew@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 20:58 mutante: mwmaint1002 - attempt to start mediawiki_job_translationnotifications-mediawikiwiki which was alerting as failed
* 20:35 andrew@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:35 andrew@cumin1001: Added views for new wiki: mniwiktionary [[phab:T273459|T273459]]
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:16 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.32 refs [[phab:T274936|T274936]] (duration: 01m 10s)
* 19:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:15 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.32 refs [[phab:T274936|T274936]]
* 19:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:12 andrew@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:29 mutante: LDAP: removed non-existent user gerrit2 from group labsadminbots ([[phab:T160122|T160122]])
* 19:52 mstyles@deploy1001: Finished deploy [wikimedia/discovery/analytics@44fba51]: airflow dags for importing ttl data (duration: 00m 42s)
* 19:29 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/resources/store/state.js: {{Gerrit|ac7b4fc2ccc69589e00a42f49d18a8f6d71777f2}}: Revert 727328 ([[phab:T293554|T293554]]) (duration: 00m 56s)
* 19:51 mstyles@deploy1001: Started deploy [wikimedia/discovery/analytics@44fba51]: airflow dags for importing ttl data
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:32 andrew@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:21 andrew@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:14 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|f9f968ac7043d2b52cac91dbfaab7e4077b04230}}: Remove unneeded $wgHiddenPrefs[] = visualeditor-betatempdisable ([[phab:T273188|T273188]]) (duration: 01m 04s)
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f21fc4a2938f9e08af54a816b4969f1a9f5b92f1}}: Enable SecurePoll logging for votewiki, testwiki ([[phab:T273990|T273990]]) (duration: 01m 08s)
* 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Convert $wgEventStreams to be an associative array - [[phab:T277193|T277193]] (duration: 00m 57s)
* 17:40 bblack: authdns2001 - trial upgrade gdnsd to 3.6.0-1~wmf1
* 18:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:47 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 18:07 mutante: gerrit - removed tonina from wmde-mediawiki gerrit group ([[phab:T293621|T293621]])
* 16:47 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 17:51 mutante: puppet run on all bastion hosts via cumin
* 16:45 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 15:32 mvernon@cumin2002: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 16:45 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 15:32 mvernon@cumin2002: START - Cookbook sre.discovery.service-route
* 16:42 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 15:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 16:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 15:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 16:40 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 15:16 herron: reprepro copied anycast-healthchecker, python3-json-logger and python3-anycast-healthchecker from buster-wikimedia to bullseye-wikimedia [[phab:T292196|T292196]]
* 16:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ml-serve[1003-1004].eqiad.wmnet with reason: Reimaging failures due to broken partman recipe
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 16:15 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ml-serve[1003-1004].eqiad.wmnet with reason: Reimaging failures due to broken partman recipe
* 15:16 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 15:54 milimetric@deploy1001: Finished deploy [analytics/refinery@5908f27] (test): Train hotfix (duration: 00m 13s)
* 14:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 15:54 milimetric@deploy1001: Started deploy [analytics/refinery@5908f27] (test): Train hotfix
* 14:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 15:54 milimetric@deploy1001: Finished deploy [analytics/refinery@5908f27] (thin): Train hotfix (duration: 00m 06s)
* 14:54 herron: rebuilt and uploaded kafkatee for bullseye [[phab:T292196|T292196]]
* 15:54 milimetric@deploy1001: Started deploy [analytics/refinery@5908f27] (thin): Train hotfix
* 14:50 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:54 milimetric@deploy1001: Finished deploy [analytics/refinery@5908f27]: Train hotfix (duration: 11m 36s)
* 14:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:42 milimetric@deploy1001: Started deploy [analytics/refinery@5908f27]: Train hotfix
* 14:36 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731346{{!}}[beta] Rename $wgIPInfoGeoIP2Path to $wgIPInfoGeoIP2Prefix (T289361)]] (duration: 00m 56s)
* 15:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate all WMDE Technical Wishes schemas to EventGate on all wikis (duration: 01m 05s)
* 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 milimetric@deploy1001: Finished deploy [analytics/refinery@bcb1a69] (test): Regular analytics weekly train TEST [analytics/refinery@bcb1a69] (duration: 00m 13s)
* 14:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 milimetric@deploy1001: Started deploy [analytics/refinery@bcb1a69] (test): Regular analytics weekly train TEST [analytics/refinery@bcb1a69]
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 milimetric@deploy1001: Finished deploy [analytics/refinery@bcb1a69] (thin): Regular analytics weekly train THIN [analytics/refinery@bcb1a69] (duration: 00m 06s)
* 14:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 milimetric@deploy1001: Started deploy [analytics/refinery@bcb1a69] (thin): Regular analytics weekly train THIN [analytics/refinery@bcb1a69]
* 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 milimetric@deploy1001: Finished deploy [analytics/refinery@bcb1a69]: Regular analytics weekly train [analytics/refinery@bcb1a69] (duration: 17m 10s)
* 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:06 godog: bounce icinga on alert1001 - reported high latency
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (2/2) (duration: 00m 56s)
* 15:06 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate HomepageVisit and ServerSideAccountCreation EL streams to all wikis - [[phab:T267333|T267333]] (duration: 01m 05s)
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (1/2) (duration: 00m 56s)
* 15:03 milimetric@deploy1001: Started deploy [analytics/refinery@bcb1a69]: Regular analytics weekly train [analytics/refinery@bcb1a69]
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve[1001-1004].eqiad.wmnet with reason: Reimaging for [[phab:T272918|T272918]]
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:731014{{!}}Unconditionally enable Wikibase dispatching via jobs (T291828)]] (duration: 00m 56s)
* 15:01 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve[1001-1004].eqiad.wmnet with reason: Reimaging for [[phab:T272918|T272918]]
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:50 bblack: dns4002 - trial upgrade gdnsd to 3.6.0-1~wmf1
* 12:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2079.codfw.wmnet with OS buster
* 14:29 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2152.codfw.wmnet with reason: REIMAGE
* 12:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:29 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2152.codfw.wmnet with reason: REIMAGE
* 12:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:25 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2016.codfw.wmnet with reason: REIMAGE
* 11:55 Lucas_WMDE: UTC morning backport window done
* 14:25 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2016.codfw.wmnet with reason: REIMAGE
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (2/2) (duration: 00m 56s)
* 14:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2151.codfw.wmnet with reason: REIMAGE
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (1/2) (duration: 00m 56s)
* 14:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2151.codfw.wmnet with reason: REIMAGE
* 11:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2150.codfw.wmnet with reason: REIMAGE
* 11:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2079.codfw.wmnet with OS buster
* 14:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: REIMAGE
* 11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:46 marostegui: Compare data between db1134 and db1163 [[phab:T275343|T275343]]
* 11:49 marostegui: Reimage db2079 (codfw s8 master) [[phab:T290868|T290868]]
* 13:34 moritzm: restarting FPM/mcrouter on mw canaries to pick up openssl updates
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730747{{!}}Set dispatchViaJobsAllowedClients to null everywhere (T291828)]] (duration: 00m 56s)
* 13:11 moritzm: installing openssl security updates on buster
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:32 Urbanecm: Two undeployed patches were reverted to unbreak deployments (666340, 666341), cc marxarelli
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:25 phuedx@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/WikimediaEvents: Backport: [[gerrit:666339{{!}}Fix dynamically loaded instruments]] (duration: 01m 11s)
* 11:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731239{{!}}Make deduplication actually work for DispatchChangesJob (T291118)]] (duration: 00m 55s)
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P14465 and previous config saved to /var/cache/conftool/dbconfig/20210224-122043-root.json
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/Hooks/RecentChangeSaveHookHandler.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (2/2) (duration: 00m 56s)
* 12:18 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:17 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (duration: 00m 56s)
* 12:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:47 moritzm: copied wmf-certificates from buster-wikimedia to stretch-wikimedia in reprepro
* 12:06 hnowlan: restarting mtail on A:mw-api or A:parsoid or A:mw-jobrunner or A:mw
* 10:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:731237{{!}}Don't filter by change Id when dispatching to client wikis ()]] (duration: 00m 59s)
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P14464 and previous config saved to /var/cache/conftool/dbconfig/20210224-120538-root.json
* 09:48 moritzm: installing node-tar security updates on buster
* 11:54 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 09:39 vgutierrez: updating acme-chief to version 0.34 on acmechief instances - [[phab:T292619|T292619]]
* 11:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:38 godog: sync metrics from graphite1004 to graphite2003 - [[phab:T247963|T247963]]
* 11:51 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 09:13 moritzm: installing apr security updates on bullseye
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P14463 and previous config saved to /var/cache/conftool/dbconfig/20210224-115034-root.json
* 08:57 godog: cleanup graphite metrics not modified for >= ~3yr (1024 days)
* 11:45 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 07:34 ema: cp3060 (text), cp3061 (upload): upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 11:44 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 07:34 elukey: depool + restart blazegraph on wdqs1013
* 11:42 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 07:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:39 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 06:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P14462 and previous config saved to /var/cache/conftool/dbconfig/20210224-113531-root.json
* 06:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 11:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:23 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:22 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 11:22 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P14461 and previous config saved to /var/cache/conftool/dbconfig/20210224-112027-root.json
* 11:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 11:15 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:14 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P14460 and previous config saved to /var/cache/conftool/dbconfig/20210224-111301-marostegui.json
* 11:12 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 11:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: serer issue
* 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: serer issue
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P14459 and previous config saved to /var/cache/conftool/dbconfig/20210224-105204-root.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P14458 and previous config saved to /var/cache/conftool/dbconfig/20210224-103700-root.json
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P14457 and previous config saved to /var/cache/conftool/dbconfig/20210224-102157-root.json
* 10:20 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:19 moritzm: installing gnutls28 bugfix updates from Buster 10.8 point release
* 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 10:10 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P14456 and previous config saved to /var/cache/conftool/dbconfig/20210224-100653-root.json
* 10:04 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:02 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 09:56 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P14455 and previous config saved to /var/cache/conftool/dbconfig/20210224-095150-root.json
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P14454 and previous config saved to /var/cache/conftool/dbconfig/20210224-094523-marostegui.json
* 09:34 marostegui: Update pc2007, pc2010, db2071
* 09:31 marostegui: Update db1077
* 09:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1033.eqiad.wmnet
* 09:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1033.eqiad.wmnet
* 09:19 effie: upgrade memcached on mc1033, mc2033
* 09:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1002.wikimedia.org with reason: REIMAGE
* 09:06 volans: run "sudo find . -user root -exec chown netbox. '<nowiki>{</nowiki><nowiki>}</nowiki>' \;" in /srv/deployment/netbox/deploy-cache/revs on netbox* hosts to prevent scap failures on cleanup - [[phab:T265084|T265084]]
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1002.wikimedia.org with reason: REIMAGE
* 09:01 elukey: roll restart druid brokers on druid public
* 08:58 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 08:53 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 08:52 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:52 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 08:50 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 08:50 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 08:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:35 moritzm: reimaging bast1002 to Buster
* 08:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:30 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:26 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 08:04 jynus: restarting db2101, db2139, db2141 [[phab:T271913|T271913]]
* 07:56 moritzm: installing remaining openldap updates for buster
* 06:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1090.eqiad.wmnet
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1090.eqiad.wmnet
* 04:10 ryankemper: [[phab:T267927|T267927]] [WDQS Data Reload] Running `/srv/deployment/wdqs/wdqs/loadData.sh -n wdq -d /srv/wdqs/munged/ -s 864` on `ryankemper@wdqs2008` tmux session `data_reload`
* 04:04 ryankemper: [WDQS] Depooled `wdqs2008`
* 03:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2149.codfw.wmnet with reason: REIMAGE
* 03:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: REIMAGE
* 03:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2148.codfw.wmnet with reason: REIMAGE
* 03:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2148.codfw.wmnet with reason: REIMAGE
* 02:58 ryankemper: [WDQS Data Reload] Restarting reload on test node `wdqs1009` from where it last left off: `/srv/deployment/wdqs/wdqs/loadData.sh -n wdq -d /srv/wdqs/munged/ -s 947`
* 02:57 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 02:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2147.codfw.wmnet with reason: REIMAGE
* 02:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: REIMAGE
* 02:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2146.codfw.wmnet with reason: REIMAGE
* 02:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2146.codfw.wmnet with reason: REIMAGE
* 02:30 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 02:29 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 02:29 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 02:27 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 06m 24s)
* 02:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@25549e7]: ores_bulk_ingest: use backoffs starting at 30sec (duration: 01m 37s)
* 02:22 gehel@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 02:22 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@25549e7]: ores_bulk_ingest: use backoffs starting at 30sec
* 02:20 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64
* 02:18 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 11m 22s)
* 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE
* 02:07 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.64` on canary `wdqs1003`; proceeding to rest of fleet
* 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE
* 02:06 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64
* 02:06 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.64`. Pre-deploy tests passing on canary `wdqs1003`
* 00:58 volker-e@deploy1001: Finished deploy [design/style-guide@a66b5b6]: Deploy design/style-guide: {{Gerrit|a66b5b6}} “Components”: Add “Dialogs” (#430) (duration: 00m 06s)
* 00:58 volker-e@deploy1001: Started deploy [design/style-guide@a66b5b6]: Deploy design/style-guide: {{Gerrit|a66b5b6}} “Components”: Add “Dialogs” (#430)
* 00:47 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4ee50e3]: ores_bulk_ingest: more retry on error (duration: 01m 37s)
* 00:45 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4ee50e3]: ores_bulk_ingest: more retry on error
* 00:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE
* 00:02 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE


== 2021-02-23 ==
== 2021-10-16 ==
* 22:52 chaomodus: Netbox 2.10 upgrade complete [[phab:T265084|T265084]]
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:28 crusnov@deploy1001: Finished deploy [netbox/deploy@dabbf5e]: Deploying Netbox 2.10.4-wmf to production [[phab:T265084|T265084]] (duration: 06m 11s)
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:25 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:25 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:22 crusnov@deploy1001: Started deploy [netbox/deploy@dabbf5e]: Deploying Netbox 2.10.4-wmf to production [[phab:T265084|T265084]]
* 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 22:17 chaomodus: deploying Netbox 2.10 to production and associated work
* 21:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix typos in wgEventLoggingSchemas (duration: 01m 05s)
* 21:38 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.32  refs [[phab:T274936|T274936]]
* 21:36 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1344853]: apply spark env_vars to executors too (duration: 01m 46s)
* 21:34 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1344853]: apply spark env_vars to executors too
* 21:28 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.32  refs [[phab:T274936|T274936]] (duration: 36m 52s)
* 21:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 21:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 21:00 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@46a8ae1]: ores_bulk_ingest: namespace is not plural (duration: 01m 41s)
* 21:00 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 21:00 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 20:58 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@46a8ae1]: ores_bulk_ingest: namespace is not plural
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab1002.eqiad.wmnet
* 20:52 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.32  refs [[phab:T274936|T274936]]
* 20:44 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op: math enable talking to mathoid directly in labs, [[phab:T274436|T274436]] (duration: 00m 57s)
* 20:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix typo in visualeditortemplatedialoguse - [[phab:T275015|T275015]] (duration: 01m 01s)
* 20:13 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo cluster: Reboot kafka nodes - razzi@cumin1001
* 20:04 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab1002.eqiad.wmnet
* 19:54 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:43 ryankemper: [WDQS Deploy] Disk space low on `wdqs1009`, rolling back so that can be addressed
* 19:43 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 08m 01s)
* 19:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Declare WMDE Technical Wishes streams and migrate to EventGate on testwiki (duration: 02m 41s)
* 19:36 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.64` on canary `wdqs1003`; proceeding to rest of fleet
* 19:35 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64
* 19:35 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.64`. Pre-deploy tests passing on canary `wdqs1003`
* 19:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab1001.eqiad.wmnet
* 19:32 legoktm: re-enabling puppet on registry*
* 19:30 legoktm: pushed new wikimedia-buster image
* 19:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3969cae]: new dag ores_bulk_ingest (duration: 01m 32s)
* 19:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3969cae]: new dag ores_bulk_ingest
* 19:10 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:08 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:08 legoktm: disabling puppet on registry* except registry2001 while rolling out https://gerrit.wikimedia.org/r/664683
* 19:04 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab1001.eqiad.wmnet
* 18:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c2190da]: environment and venv builder for ores_bulk_ingest (duration: 01m 40s)
* 18:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c2190da]: environment and venv builder for ores_bulk_ingest
* 18:15 ebernhardson@deploy1001: deploy aborted: environment and venv builder for ores_bulk_ingest (duration: 00m 16s)
* 18:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c2190da]: environment and venv builder for ores_bulk_ingest
* 18:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:29 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:29 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:22 longma: wmf/1.36.0-wmf.32 was branched at {{Gerrit|03c382f199318f4ecd6a92c0acc280b6543adcc3}} for [[phab:T274936|T274936]]
* 17:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1034.eqiad.wmnet
* 17:18 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:18 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1034.eqiad.wmnet
* 17:16 effie: upgrade memcached on mc1034, mc2034 - [[phab:T270315|T270315]]
* 17:01 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:01 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:55 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:55 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:48 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Enable session tick instrument on all wikis ([[phab:T274172|T274172]]) (duration: 00m 58s)
* 16:46 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:46 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:42 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:42 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:25 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo cluster: Reboot kafka nodes - razzi@cumin1001
* 16:02 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Declare TranslationRecommendation event streams - [[phab:T271163|T271163]] (duration: 00m 58s)
* 15:52 jynus: previous message should say 15:38 [[phab:T267338|T267338]]
* 15:51 jynus: started swift codfw backup stress test at 14:38 with 10 threads [[phab:T267338|T267338]]
* 15:44 elukey: reboot an-launcher1002 for kernel updates
* 15:35 moritzm: restarting PHP/Apache on mw canaries for gnutls update
* 15:23 moritzm: installing gnutls28 bugfix updates from Buster 10.8 point release
* 15:17 elukey: deploy a new term to the analytics-in4 filter on cr1/cr2-eqiad (see https://gerrit.wikimedia.org/r/c/operations/homer/public/+/665814)
* 14:55 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wgEventLoggingSchemas overrides for QuickSurvey and NavigationTiming (duration: 00m 56s)
* 14:51 elukey: drop /srv/backup-1007 on stat1008 to free space
* 14:41 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SpecialMuteSubmit to EventGate on all wikis - [[phab:T268517|T268517]] (duration: 00m 58s)
* 14:40 otto@deploy1001: sync-file aborted: Migrate SpecialMuteSubmit to EventGate on all wikis - [[phab:T268517|T268517]] (duration: 00m 05s)
* 14:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE
* 14:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE
* 14:07 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 14:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 14:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 14:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:02 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:00 moritzm: restarting PHP/Apache on mw canaries for openldap update
* 13:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:54 moritzm: installing openldap security updates on buster (just client-side tools/libs, all slapd instances already fixed)
* 13:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 13:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 13:49 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:54 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 12:54 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 12:48 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 12:48 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 12:44 kharlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 12:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/ContentTranslation/app/: {{Gerrit|ee77c4ac5b7e5961751734ea17845cf2172bd889}}: bump ContentTranslation ([[phab:T275385|T275385]]) (duration: 00m 59s)
* 12:37 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:36 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 12:35 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:34 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 12:32 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:32 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:31 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 12:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8b7ca4c8049a11ed6221fa579b426b55a53e4fd9}}: thwikisource: Add NS 102 and NS 114 as content namespace ([[phab:T275282|T275282]]) (duration: 00m 56s)
* 12:30 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 12:29 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 12:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:26 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:19 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 12:17 jayme: running puppet on deploy1001
* 12:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:655428{{!}}Add sources to specialSiteLinkGroups Wikibase setting]] ([[phab:T138332|T138332]]) (duration: 01m 00s)
* 11:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1035.eqiad.wmnet
* 11:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1035.eqiad.wmnet
* 11:18 effie: upgrade memcached on mc1035, mc2035 - [[phab:T270315|T270315]]
* 10:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbmonitor2001.wikimedia.org
* 09:58 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbmonitor2001.wikimedia.org
* 09:45 vgutierrez: reload nginx on cloudelastic100[56]
* 09:44 moritzm: installing screen security updates on stretch
* 09:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes [[phab:T266913|T266913]]
* 09:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes [[phab:T266913|T266913]]
* 09:35 moritzm: installing bind security updates on buster (client-side tools/libs)
* 09:10 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:10 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:06 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 08:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1001.eqiad.wmnet
* 08:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
* 08:40 Urbanecm: [urbanecm@mwmaint1002 ~/altwiki]$ mwscript namespaceDupes.php altwiki --fix
* 08:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9f434e2966393f7911d04b5bf77e02eb11bb16ab}}: Add ВП as an alias for NS_PROJECT in altwiki ([[phab:T271980|T271980]]) (duration: 00m 59s)
* 08:39 Urbanecm: Run mwscript updateSpecialPages.php --wiki=altwiki
* 08:02 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 07:56 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 07:56 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 07:13 hashar: Restarting CI Jenkins for plugin upgrade # [[phab:T271683|T271683]]
* 05:13 krinkle@deploy1001: Finished deploy [integration/docroot@44d5685]: {{Gerrit|I307e8f4f6979}} (duration: 00m 06s)
* 05:13 krinkle@deploy1001: Started deploy [integration/docroot@44d5685]: {{Gerrit|I307e8f4f6979}}
* 00:46 eileen: civicrm revision changed from {{Gerrit|c535ac603a}} to {{Gerrit|5e042e6e57}}, config revision is {{Gerrit|ef64f705bb}}


== 2021-10-15 ==
* 23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 22:34 mutante: apt2001 - upgraded nginx
* 22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
* 22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
* 21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:17 mutante: gitlab1001 - disabling puppet for debugging
* 17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - [[phab:T283076|T283076]]
* 17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
* 16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
* 13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - [[phab:T292619|T292619]]
* 13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - [[phab:T292619|T292619]]
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
* 11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
* 11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - [[phab:T283076|T283076]]"
* 07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 06:20 urbanecm: Start server-side upload for 1 video file
* 02:14 ryankemper: [[phab:T288231|T288231]] `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
* 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:07 brennen: end of UTC late backport & config training window


== 2021-02-22 ==
* 23:59 mutante: logstash2031 - systemctl reset-failed
* 23:53 mutante: stat1007 - same problem and alerts as stat1004
* 23:52 mutante: stat1004 - systemctl reset-failed to clear icinga alerts for systemd state caused by jupyterhub singleuser services
* 23:47 dpifke@deploy1001: Finished deploy [performance/arc-lamp@1f3bce1]: Revert https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/664600 (duration: 00m 05s)
* 23:47 dpifke@deploy1001: Started deploy [performance/arc-lamp@1f3bce1]: Revert https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/664600
* 23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1286.eqiad.wmnet
* 23:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1286.eqiad.wmnet
* 23:34 milimetric@deploy1001: Finished deploy [analytics/refinery@3de01b5] (thin): Fix camus (duration: 00m 07s)
* 23:34 milimetric@deploy1001: Started deploy [analytics/refinery@3de01b5] (thin): Fix camus
* 23:33 milimetric@deploy1001: Finished deploy [analytics/refinery@3de01b5]: Fix camus (duration: 14m 03s)
* 23:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE
* 23:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE
* 23:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 23:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 23:19 milimetric@deploy1001: Started deploy [analytics/refinery@3de01b5]: Fix camus
* 23:18 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 23:18 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 23:09 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 23:09 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1410.eqiad.wmnet
* 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1412.eqiad.wmnet
* 23:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1412.eqiad.wmnet
* 23:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1410.eqiad.wmnet
* 22:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1286.eqiad.wmnet with reason: REIMAGE
* 22:50 legoktm: disabling puppet on mwdebug1001 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/664903
* 22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1286.eqiad.wmnet with reason: REIMAGE
* 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
* 22:42 krinkle@deploy1001: Synchronized w/fatal-error.php: {{Gerrit|df694d695}} (duration: 00m 56s)
* 22:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
* 22:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
* 22:31 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 22:31 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1279.eqiad.wmnet
* 22:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1312.eqiad.wmnet
* 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1314.eqiad.wmnet
* 21:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1314.eqiad.wmnet
* 21:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1279.eqiad.wmnet
* 21:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1312.eqiad.wmnet
* 21:00 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T273463|T273463]] [[phab:T271985|T271985]] [[phab:T273468|T273468]])
* 20:59 sbassett: Deployed security patch for [[phab:T274883|T274883]]
* 20:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1279.eqiad.wmnet with reason: REIMAGE
* 20:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1279.eqiad.wmnet with reason: REIMAGE
* 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1312.eqiad.wmnet with reason: REIMAGE
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1312.eqiad.wmnet with reason: REIMAGE
* 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1314.eqiad.wmnet with reason: REIMAGE
* 20:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1314.eqiad.wmnet with reason: REIMAGE
* 20:39 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T273463|T273463]] [[phab:T271985|T271985]] [[phab:T273468|T273468]])
* 20:29 mutante: mw1279 (canary) - reimaging to buster
* 20:29 mutante: mw1279 (canary) - reimaging to stretch
* 20:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1349.eqiad.wmnet
* 20:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1349.eqiad.wmnet
* 20:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1316.eqiad.wmnet
* 20:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1316.eqiad.wmnet
* 20:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1315.eqiad.wmnet
* 20:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1315.eqiad.wmnet
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: REIMAGE
* 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: REIMAGE
* 19:36 urbanecm@deploy1001: Synchronized wmf-config/config/rowiki.yaml: {{Gerrit|fc7b071b98b2c14d45259212bd6bea858e3f5aa7}}: Enable GrowthExperiments on rowiki ([[phab:T275130|T275130]]; 3/3) (duration: 00m 55s)
* 19:35 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: {{Gerrit|fc7b071b98b2c14d45259212bd6bea858e3f5aa7}}: Enable GrowthExperiments on rowiki ([[phab:T275130|T275130]]; 2/3) (duration: 00m 55s)
* 19:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fc7b071b98b2c14d45259212bd6bea858e3f5aa7}}: Enable GrowthExperiments on rowiki ([[phab:T275130|T275130]]; 1/3) (duration: 00m 55s)
* 19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1315.eqiad.wmnet with reason: REIMAGE
* 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1315.eqiad.wmnet with reason: REIMAGE
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1316.eqiad.wmnet with reason: REIMAGE
* 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1316.eqiad.wmnet with reason: REIMAGE
* 19:08 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: {{Gerrit|902b6854b5d56fde9fbf5d2c779282049bf7288a}}: Enable GrowthExperiments on thwiki ([[phab:T274646|T274646]]) (duration: 00m 54s)
* 19:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|902b6854b5d56fde9fbf5d2c779282049bf7288a}}: Enable GrowthExperiments on thwiki ([[phab:T274646|T274646]]) (duration: 00m 56s)
* 17:18 ppchelko@deploy1001: Finished deploy [restbase/deploy@c5c4b2d] (dev-cluster): remove graphoid (duration: 03m 09s)
* 17:15 ppchelko@deploy1001: Started deploy [restbase/deploy@c5c4b2d] (dev-cluster): remove graphoid
* 16:51 Urbanecm: Run scap pull on mwmaint1002 to clear any local changes
* 16:50 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 22s)
* 16:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating mniwiktionary ([[phab:T273457|T273457]]) (duration: 00m 56s)
* 16:46 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating mniwiktionary ([[phab:T273457|T273457]])
* 16:45 urbanecm@deploy1001: Synchronized dblists: Creating mniwiktionary ([[phab:T273457|T273457]]) (duration: 00m 56s)
* 16:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:44 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating mniwiktionary ([[phab:T273457|T273457]]) (duration: 00m 56s)
* 16:42 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating mniwiktionary ([[phab:T273457|T273457]]) (duration: 00m 55s)
* 16:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:36 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:26 dpifke@deploy1001: Finished deploy [performance/arc-lamp@1f3bce1]: Deploy ArcLamp fixes for [[phab:T273565|T273565]] and [[phab:T273640|T273640]] (duration: 00m 05s)
* 16:26 dpifke@deploy1001: Started deploy [performance/arc-lamp@1f3bce1]: Deploy ArcLamp fixes for [[phab:T273565|T273565]] and [[phab:T273640|T273640]]
* 16:19 urbanecm@deploy1001: Synchronized langlist: Creating mniwiki ([[phab:T273456|T273456]]) (duration: 00m 54s)
* 16:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating mniwiki ([[phab:T273456|T273456]]) (duration: 00m 56s)
* 16:17 urbanecm@deploy1001: Synchronized wmf-config/logos.php: Creating mniwiki ([[phab:T273456|T273456]]) (duration: 00m 56s)
* 16:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating mniwiki ([[phab:T273456|T273456]]) (duration: 00m 55s)
* 16:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating mniwiki ([[phab:T273456|T273456]])
* 16:13 urbanecm@deploy1001: Synchronized dblists: Creating mniwiki ([[phab:T273456|T273456]]) (duration: 00m 57s)
* 16:12 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating mniwiki ([[phab:T273456|T273456]]) (duration: 00m 55s)
* 16:11 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating mniwiki ([[phab:T273456|T273456]]) (duration: 00m 56s)
* 16:08 urbanecm@deploy1001: Synchronized langlist: Creating altwiki ([[phab:T271980|T271980]]) (duration: 00m 55s)
* 16:03 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating altwiki ([[phab:T271980|T271980]]) (duration: 00m 55s)
* 16:02 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating altwiki ([[phab:T271980|T271980]])
* 16:00 urbanecm@deploy1001: Synchronized dblists: Creating altwiki ([[phab:T271980|T271980]]) (duration: 00m 54s)
* 15:59 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating altwiki ([[phab:T271980|T271980]]) (duration: 00m 59s)
* 15:57 Urbanecm: Temporarily replace /srv/mediawiki/php-1.36.0-wmf.31/extensions/WikimediaMaintenance/addWiki.php with /home/urbanecm/addWiki.php at mwmaint1002 to unbreak addWiki.php
* 15:53 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:43 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating altwiki ([[phab:T271980|T271980]]) (duration: 00m 56s)
* 15:32 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:16 herron: roll restarting kafkamon hosts for updates
* 13:57 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 13:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4001.ulsfo.wmnet
* 13:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/ContentTranslation/app/: {{Gerrit|f9e823e}}: CX3 Build 0.1.0+20210216 (fixes missing bits in [[phab:T271397|T271397]]) (duration: 00m 55s)
* 13:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3001.esams.wmnet
* 13:37 moritzm: installing openldap security updates on corp replicas
* 13:36 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/FlaggedRevs/extension.json: {{Gerrit|a4cd98e7a581fe18634da05ba04eaf8035023c26}}: Grant sysops review and unreviewed pages right by default (apparently I forgot to rebase the first time, resync; [[phab:T275293|T275293]]) (duration: 00m 57s)
* 13:32 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus4001.ulsfo.wmnet
* 13:31 godog: reset-failed ifup@ens14 on prometheus3001 - [[phab:T273026|T273026]]
* 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
* 13:29 akosiaris: repool sessionstore in eqiad after sessionstore certificate refresh. [[phab:T274564|T274564]]
* 13:29 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
* 13:27 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus3001.esams.wmnet
* 13:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
* 13:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
* 13:16 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
* 13:16 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 27 hosts with reason: Restarting cloudcanary instances
* 13:16 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 27 hosts with reason: Restarting cloudcanary instances
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14439 and previous config saved to /var/cache/conftool/dbconfig/20210222-131153-root.json
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14438 and previous config saved to /var/cache/conftool/dbconfig/20210222-125650-root.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14437 and previous config saved to /var/cache/conftool/dbconfig/20210222-124146-root.json
* 12:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
* 12:28 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14436 and previous config saved to /var/cache/conftool/dbconfig/20210222-122643-root.json
* 12:24 urbanecm@deploy1001: Synchronized wmf-config//throttle.php: {{Gerrit|d806f3a986244f8027aba730e72d99babe3b37e9}}: Add a throttle rule for edit-a-thon ([[phab:T275237|T275237]]) (duration: 00m 54s)
* 12:22 akosiaris: depool sessionstore in eqiad for sessionstore certificate refresh. [[phab:T274564|T274564]]
* 12:21 akosiaris: repool sessionstore in codfw after sessionstore certificate refresh. [[phab:T274564|T274564]]
* 12:21 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=sessionstore
* 12:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/FlaggedRevs/extension.json: {{Gerrit|a4cd98e7a581fe18634da05ba04eaf8035023c26}}: Grant sysops review and unreviewed pages right by default ([[phab:T275293|T275293]]) (duration: 00m 55s)
* 12:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7bd26dc6160a5bc3ba9235ce93c01e7ab9744487}}: Add inaturalist-open-data.s3.amazonaws.com to copyupload list ([[phab:T275318|T275318]]) (duration: 00m 56s)
* 12:15 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|391900b8db9ffdee8565d82c38c089843876a27b}}: ukwikivoyage: Enable block AbuseFilter action ([[phab:T275271|T275271]]) (duration: 00m 55s)
* 12:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a1f8ce48249ad457d79c57e27836ee492eb00427}}: Enable Section Translation on Bengali Wikipedia ([[phab:T271397|T271397]]) (duration: 00m 56s)
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14435 and previous config saved to /var/cache/conftool/dbconfig/20210222-121139-root.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P14434 and previous config saved to /var/cache/conftool/dbconfig/20210222-120717-marostegui.json
* 12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4775fb63e79501c3dba7ae4b9c3b1172d92dc0d0}}: Adjust CX MT threshold to 90 for Vietnamese Wikipedia ([[phab:T275121|T275121]]) (duration: 00m 57s)
* 12:02 moritzm: installing openldap security updates on serpens/seaborgium
* 11:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1036.eqiad.wmnet
* 11:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1036.eqiad.wmnet
* 11:53 effie: upgrading memcached to 1.6 on mc1036
* 11:50 volans: upgrading python3-wmflib fleet wide to 0.0.7-1+deb10u1
* 11:27 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 27 hosts with reason: Restarting cloudcanary instances
* 11:27 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 27 hosts with reason: Restarting cloudcanary instances
* 11:26 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
* 11:26 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
* 11:26 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
* 11:26 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
* 11:22 godog: roll restart prometheus on cloudmetrics*
* 11:21 godog: roll restart prometheus on prometheus*
* 11:12 godog: restart prometheus on prometheus2004 to apply changes - [[phab:T273278|T273278]]
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14433 and previous config saved to /var/cache/conftool/dbconfig/20210222-111032-root.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14432 and previous config saved to /var/cache/conftool/dbconfig/20210222-105528-root.json
* 10:49 _joe_: removing stray old builds from compiler1003
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14431 and previous config saved to /var/cache/conftool/dbconfig/20210222-104025-root.json
* 10:36 _joe_: manually removed the restbase-http ipvs entry from the load balancers
* 10:30 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=sessionstore
* 10:29 akosiaris: depool sessionstore in codfw for sessionstore certificate refresh. [[phab:T274564|T274564]]
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14430 and previous config saved to /var/cache/conftool/dbconfig/20210222-102521-root.json
* 10:16 _joe_: restarting pybal on lvs1015 to pick up restbase http removal
* 10:12 _joe_: restarting pybal on lvs1016 to pick up restbase http removal
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14429 and previous config saved to /var/cache/conftool/dbconfig/20210222-101018-root.json
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P14428 and previous config saved to /var/cache/conftool/dbconfig/20210222-100653-marostegui.json
* 09:51 _joe_: restarting low-traffic pybals in codfw to remove the restbase http endpoint
* 09:35 marostegui: Deploy schema change on s3 codfw master, there will be lag on s3 codfw - [[phab:T273359|T273359]]
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
* 09:20 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
* 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
* 09:04 moritzm: installing screen security updates on Buster
* 09:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
* 08:40 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 08:39 gehel: depool elastic2045 and ban from clusters - [[phab:T275345|T275345]]
* 08:12 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|cea41a2f7736aa29dee8f10de4c0c17353ece963}}: fiwiki: Assign stablesettings to reviewers in IS.php rather than FR-specific file ([[phab:T275017|T275017]]; 2/2) (duration: 00m 55s)
* 08:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cea41a2f7736aa29dee8f10de4c0c17353ece963}}: fiwiki: Assign stablesettings to reviewers in IS.php rather than FR-specific file ([[phab:T275017|T275017]]; 1/2) (duration: 01m 08s)
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1090* from dbctl [[phab:T274333|T274333]]', diff saved to https://phabricator.wikimedia.org/P14426 and previous config saved to /var/cache/conftool/dbconfig/20210222-075437-marostegui.json
* 07:38 moritzm: installing openldap security updates on LDAP replicas
* 07:29 hashar: Restarting CI Jenkins to downgrade plugin # [[phab:T271683|T271683]]
* 07:14 hashar: Restarting CI Jenkins for plugin upgrade # [[phab:T271683|T271683]]
* 07:11 elukey: powercycle elastic2045 - com2 available, no ssh, no root login (hangs indefinitely), no prometheus metrics reported


== 2021-02-21 ==
* 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 - crashed', diff saved to https://phabricator.wikimedia.org/P14424 and previous config saved to /var/cache/conftool/dbconfig/20210221-160258-marostegui.json
* 10:07 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1008.eqiad.wmnet with reason: REIMAGE
* 10:05 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1008.eqiad.wmnet with reason: REIMAGE
* 09:32 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1008.eqiad.wmnet with reason: REIMAGE
* 09:30 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1008.eqiad.wmnet with reason: REIMAGE
* 09:29 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
* 09:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet


== 2021-10-14 ==
* 23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 56s)
* 23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 56s)
* 23:43 ejegg: updated payments-wiki from {{Gerrit|19d18c1852}} to {{Gerrit|0f48acea49}}
* 23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: [[gerrit:730733{{!}}Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622)]] (duration: 00m 56s)
* 23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730936{{!}}allow sysops to add and remove users to other groups on ptwikivoyage (T292806)]] (duration: 00m 56s)
* 23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730933{{!}}Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918)]] (duration: 00m 57s)
* 23:11 mutante: mw1452 - re-pooled, scap pull
* 23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:35 ryankemper: [[phab:T288231|T288231]] Ran puppet on `wdqs2006`, now back to the cookbook run
* 22:33 ryankemper: [[phab:T288231|T288231]] Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
* 22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:32 ryankemper: [[phab:T288231|T288231]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id [[phab:T288231|T288231]]`
* 22:31 mutante: depooling mw1452 for testing
* 22:28 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
* 22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]] (duration: 00m 05s)
* 22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]]
* 22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:07 eileen: civicrm revision changed from {{Gerrit|018d3b19fe}} to {{Gerrit|9b5e0d015b}}, config revision is {{Gerrit|781d6a1b1f}}
* 21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
* 18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
* 18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # [[phab:T293403|T293403]]
* 18:41 urbanecm: UTC evening B&C done
* 18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: {{Gerrit|6da3523daaba85a4199721980c0a9c96b20697e7}}: Fix assessment quickview labels ([[phab:T292596|T292596]]) (duration: 01m 03s)
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8dffefd0d095abe3709dcc962d5d24f27b55869}}: Create Salima namespace for dagwiki ([[phab:T289911|T289911]]) (duration: 01m 04s)
* 18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bccd4bc45498db8628567574d0bb3a23f8fb378}}: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary ([[phab:T289752|T289752]], [[phab:T289767|T289767]]) (duration: 01m 04s)
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|262e588b44f126fb9e1aa933a3ca59b191b42bd7}}: Enable Growth mentor dashboard backend on all wikis ([[phab:T278920|T278920]]) (duration: 01m 05s)
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|41baa8c41d64510986f009b9be2d70dad0915f8c}}: Add new mediawiki.skin_diff event logging stream ([[phab:T289622|T289622]]) (duration: 01m 05s)
* 18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
* 17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
* 17:42 rzl: depool mw1452 for training
* 17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:44 ryankemper: [[phab:T288231|T288231]] Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to re-start the `data-transfer` cookbook from again
* 16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:37 elukey: drop kubeflow-kfserving* docker images from deneb
* 16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 16:33 moritzm: installing node-ansi-regex security updates
* 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
* 16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
* 16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 04s)
* 16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
* 16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
* 16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:07 ryankemper: [[phab:T288231|T288231]] About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
* 16:04 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
* 15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:54 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2008:~$ sudo depool`
* 15:52 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo depool`
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: [[gerrit:730729{{!}}Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310)]] (duration: 01m 04s)
* 15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 03s)
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 [[phab:T275784|T275784]]
* 14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apt
* 14:43 jbond: migrate apt.w.o to a dns active/passive discovery address (cc moritzm)
* 14:23 moritzm: installing krb5 security updates on KDCs
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:10 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 2/2) (duration: 01m 03s)
* 14:07 urbanecm: Run extensions/GrowthExperiments/initWikiConfig.php for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 14:07 urbanecm: Create growthexperiments DB tables for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 1/2) (duration: 01m 04s)
* 14:03 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|82d0a4bf45126ecba2cfcd1a0c2081a00f58dca3}}: Enable VE by default on 4 more wikis ([[phab:T290614|T290614]]) (duration: 01m 05s)
* 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:14 kormat: uploaded orchestrator 3.2.6-1 packages to apt.wm.o (buster) [[phab:T275784|T275784]]
* 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 12:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 12:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730746{{!}}Untangle “dispatch via jobs” settings in Wikibase.php (T291828)]] (no-op) (duration: 01m 04s)
* 12:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730725{{!}}Set wmgWikibaseDispatchViaJobsPruneChangesTableInJobEnabled for wikidatawiki (T291828)]] (no-op) (duration: 01m 05s)
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 11:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: {{Gerrit|1f33fc3}}, {{Gerrit|e0ea1b8}}, {{Gerrit|cba2ac9}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 05s)
* 10:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|465b564}}, {{Gerrit|a8cc98b}}, {{Gerrit|6e95c48}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 06s)
* 10:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 09:20 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:20 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:18 volans@deploy1002: Finished deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1 (duration: 00m 50s)
* 09:17 volans@deploy1002: Started deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1
* 09:04 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 45s)
* 09:03 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 09:02 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 23s)
* 09:02 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 08:52 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:52 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:51 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:51 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:22 volans: rolling out debmonitor-client upgrade to 0.3.1 across the fleet
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:24 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 07:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:52 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:50 foks: changing user email for "Region of Peel Archives"
* 01:41 ejegg: updated payments-wiki from {{Gerrit|b329d2dea2}} to {{Gerrit|19d18c1852}}
* 01:35 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:31 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .


== 2021-02-20 ==
* 00:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1317.eqiad.wmnet
* 00:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
* 00:15 ebernhardson: start batch processing images through MachineVision fetchSuggestions.php for [[phab:T274220|T274220]] on mwmaint1002
* 00:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1333.eqiad.wmnet
* 00:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1333.eqiad.wmnet
* 00:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1339.eqiad.wmnet
* 00:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1342.eqiad.wmnet
* 00:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1342.eqiad.wmnet


== 2021-10-13 ==
* 23:37 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:36 eileen: civicrm revision changed from {{Gerrit|946dfb6c5a}} to {{Gerrit|018d3b19fe}}, config revision is {{Gerrit|85277466ed}}
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730575{{!}}Create an alias for the project namespace on kswiki (T291740)]] (duration: 01m 05s)
* 22:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:01 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Collection/includes/Specials/SpecialCollection.php: Backport: [[gerrit:730578{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 21:50 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection: Backport: [[gerrit:730577{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 21:47 foks: removing 8 files for legal compliance
* 21:03 foks: removing 2 files for legal compliance
* 21:00 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:50 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:49 brennen@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/Api/ApiGetBookCreatorBoxContent.php: Backport: [[gerrit:730574{{!}}Fall back to main page if given title is invalid (T293299)]] (duration: 01m 04s)
* 20:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:40 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:31 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:03 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 19:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:16 mutante: gitlab2001 - status before was that "gitlab-ctl status" showed components "gitlab-workhorse" and "postgres-exporter" as "down". this was either pre-broken or caused by the restore process. after manually 'gitlab-ctl start gitlab-workhorse' all of the components are in "run" and https://gitlab-replica.wikimedia.org is up ( [[phab:T285867|T285867]])
* 19:08 mutante: gitlab2001 - started workhorse, which for some reason was marked as down after the restore command ran
* 19:08 mutante: [gitlab2001:~] $ sudo /usr/bin/gitlab-ctl start gitlab-workhorse
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 01m 03s)
* 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87879865c35edab3ead523027681146e00d6fc02}}: Create Translation namespace for viwikisource ([[phab:T290691|T290691]]) (duration: 01m 04s)
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06fd0f225575448771cdba0d4e6bf36bb6715bc1}}: add extendedconfimed for autoreview group on ptwiki ([[phab:T292912|T292912]]) (duration: 01m 04s)
* 18:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript initSiteStats.php --wiki=ptwiki --update
* 18:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=ptwiki extendedconfirmed
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bb2b388217aa91a39ed3684f87fdf7edb06fd81}}:  Set autoconfirmedextended and confirmedextended for ptwiki ([[phab:T292915|T292915]]) (duration: 01m 04s)
* 18:16 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|694bc234ab5dbb9a2387a6129998d45a53ac0ab3}}: Remove an old dawiki temporary logo (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|224e2a374b1cc6327e9d8c2bca576091ce4efc74}}: Add NS_MAIN back to wgExtraSignatureNamespaces for mediawikiwiki ([[phab:T291630|T291630]]) (duration: 01m 05s)
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:12 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|1b96f54a518620b0dc6a0ab63b402d0ea2c6bf70}}: Update logo for liwiktionary ([[phab:T291479|T291479]]) (duration: 01m 14s)
* 18:10 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:10 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:09 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:09 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:08 volans: uploaded debmonitor-client_0.3.1 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 17:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|dd7a3314602ffddc5b917cccc71c917301639388}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 04s)
* 17:13 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|5c27154cf434bebc37f5e98e2ad1b5cea7cde1d4}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 15s)
* 16:57 mutante: stat1008 - short on disk space, mostly used in /tmp, high CPU usage by R process, sent a message about it to all shell users via wall
* 16:50 mutante: stat1008 - apt-get clean - freed 1.3 GB disk space - was alerting in Icinga because / was 97% full
* 16:37 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:37 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:23 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:23 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:29 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:28 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:26 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:26 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:12 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:12 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:03 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:03 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:01 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:01 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:01 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:59 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:59 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:54 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:52 ema: repool cp4021, further testing can be performed on sretest1001 [[phab:T201317|T201317]]
* 14:51 volans: restarting ircecho.service on alert1001 to get back icinga-wm without the underscore
* 14:50 elukey: restart pybal on lvs1015 (low-traffic primary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 14:48 moritzm: reverted to clean package state on deneb
* 14:44 elukey@puppetmaster1001: conftool action : ge; selector: cluster=ml_serve,service=inference
* 14:36 elukey: restart pybal on lvs1016 (low-traffic secondary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 14:27 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:27 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:25 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:20 moritzm: temporarily downgrade sphinx packages on deneb to 1.7.9-1~bpo9+1 to build a Ganeti 2.16 stretch backport with delicate toolchain needs
* 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 13:59 XioNoX: push prep-work for anycast tuning in ulsfo - [[phab:T288843|T288843]]
* 13:38 jayme: imported helm-diff_3.1.3-2 to buster-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/helm-diff/+/730509)
* 13:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 13:34 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 12:13 Lucas_WMDE: UTC morning backport+config window done
* 12:12 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/includes: Backport: [[gerrit:730370{{!}}Add Link: Do not log "no suggestion found" errors in production log (T291251)]] (duration: 01m 04s)
* 12:11 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]'  # after applying 730512 at mwmaint1002 to workaround [[phab:T293219|T293219]] # [[phab:T255037|T255037]]
* 12:11 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/modules: Backport: [[gerrit:730371{{!}}Suggested Edits: Update local config.presets when topics/difficulty presets change (T292536)]] (duration: 01m 07s)
* 11:56 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]) (duration: 01m 04s)
* 11:55 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=mediawikiwiki "Growth/Communities/How to introduce yourself as a mentor" "Growth/Communities/How to configure the mentors' list" "Martin Urbanec (WMF)" --reason '[[:phab:T293184]]' # [[phab:T293184|T293184]]
* 11:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 2/3) (duration: 01m 04s)
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 1/3) (duration: 01m 05s)
* 11:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]' # [[phab:T255037|T255037]]
* 11:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=itwiki growthexperiments # [[phab:T255037|T255037]]
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:730380{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 07s)
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Wikibase/repo/: Backport: [[gerrit:730385{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 18s)
* 11:33 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 11:19 ema: pool cp4021 after reimage [[phab:T201317|T201317]]
* 11:05 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 10:15 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:09 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:728490{{!}}Add more types of QuickSurveys on beta cluster (T292459)]] (duration: 01m 53s)
* 10:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:22 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 08:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:21 elukey: run kafka preferred-replica-election on kafka-main1001 to rebalance partition leaders - [[phab:T288825|T288825]]
* 08:15 godog: bounce graphite on graphite1004 to apply new config
* 07:33 elukey: increase kafka topic partition size of the top 4 high traffic topics of main-eqiad as described in https://phabricator.wikimedia.org/T288825#7422726
* 07:13 XioNoX: provision new eqsin-ulsfo link - [[phab:T273308|T273308]]
* 06:26 elukey: `kafka topics --alter --topic <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.change-prop.transcludes.resource-change --partitions 3` on kafka-main2001 - [[phab:T288825|T288825]]
* 00:38 ejegg: updated payments-wiki from {{Gerrit|030b11da1a}} to {{Gerrit|b329d2dea2}}


== 2021-10-12 ==
* 23:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:16 urbanecm: UTC late B&C window done
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 01m 04s)
* 23:12 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 02m 09s)
* 23:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:53 urbanecm: [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 #
* 22:51 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:21 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:31 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s)
* 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 18:47 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 45m 36s)
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 18:01 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: [[gerrit:730141]] (duration: 00m 59s)
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:46 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 17:43 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:32 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: [[gerrit:730233{{!}}Include generated styles before Mediawiki overrides (T292736)]] (duration: 00m 57s)
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:23 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730236{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:16 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730235{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 17:12 moritzm: installing rsync bugfix updates
* 17:09 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 16:56 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 16:55 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 16:53 moritzm: failed over ganeti master for test cluster to ganeti2025
* 16:50 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 16:48 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:30 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet
* 16:30 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:26 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 16:26 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes: Backport: [[gerrit:730226{{!}}Pre-format comments for non-local files too (T292570)]] (duration: 01m 15s)
* 16:17 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 16:16 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730231{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 00m 57s)
* 16:00 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730230{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 02m 13s)
* 15:57 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:49 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 15:48 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 15:48 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet
* 15:02 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:50 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:49 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 13:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:14 godog: add 50G to prometheus/k8s in eqiad
* 13:13 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - [[phab:T288853|T288853]] (duration: 00m 56s)
* 13:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 13:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 13:05 volans: upgraded spicerack to 1.0.5 on cumin hosts
* 12:25 volans: uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 12:15 elukey: `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:15 elukey: `kafka topics --alter --topic eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:10 elukey: `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:09 elukey: `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 11:58 elukey: `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 11:49 elukey: `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 11:34 urbanecm: UTC morning B&C window done
* 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|860ea0944d6dc1e6b5061eb84eec378eb5ac8441}}: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis ([[phab:T291630|T291630]]) (duration: 00m 57s)
* 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:06 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|e77ae17efb34723598fc69e87109944384df442a}}: static.php: correctly report a bad request (duration: 00m 57s)
* 11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet


== 2021-02-19 ==
* 23:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1339.eqiad.wmnet
* 23:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1317.eqiad.wmnet with reason: REIMAGE
* 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1317.eqiad.wmnet with reason: REIMAGE
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1342.eqiad.wmnet with reason: REIMAGE
* 22:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1342.eqiad.wmnet with reason: REIMAGE
* 22:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1333.eqiad.wmnet with reason: REIMAGE
* 22:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1333.eqiad.wmnet with reason: REIMAGE
* 22:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
* 22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1339.eqiad.wmnet with reason: REIMAGE
* 22:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1339.eqiad.wmnet with reason: REIMAGE
* 22:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1340.eqiad.wmnet
* 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1320.eqiad.wmnet
* 22:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1320.eqiad.wmnet
* 22:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1262.eqiad.wmnet
* 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1261.eqiad.wmnet
* 22:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1261.eqiad.wmnet
* 22:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
* 21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 21:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 21:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1340.eqiad.wmnet with reason: REIMAGE
* 21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1340.eqiad.wmnet with reason: REIMAGE
* 21:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1320.eqiad.wmnet with reason: REIMAGE
* 21:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1262.eqiad.wmnet with reason: REIMAGE
* 21:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1320.eqiad.wmnet with reason: REIMAGE
* 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2336.codfw.wmnet with reason: REIMAGE
* 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1262.eqiad.wmnet with reason: REIMAGE
* 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2336.codfw.wmnet with reason: REIMAGE
* 20:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1287.eqiad.wmnet
* 20:57 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1287.eqiad.wmnet
* 20:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2257.codfw.wmnet
* 20:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2257.codfw.cwmnet
* 20:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1270.eqiad.wmnet
* 20:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1261.eqiad.wmnet
* 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1270.eqiad.wmnet
* 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1261.eqiad.wmnet
* 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1287.eqiad.wmnet
* 20:33 mutante: mw1261, mw1270 - scap pull
* 20:33 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin 'mw1261*,mw1270*,mw1287*' 'depool'
* 20:32 mutante: mw1287 - scap pull
* 20:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2257.codfw.wmnet
* 20:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1270.eqiad.wmnet
* 20:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
* 20:15 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.29 (duration: 01m 42s)
* 20:06 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.28 (duration: 01m 50s)
* 20:04 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.27 (duration: 02m 12s)
* 20:01 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.26 (duration: 02m 12s)
* 19:57 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.25 (duration: 04m 09s)
* 19:48 marxarelli: 1.36.0-wmf.31 re-rolled to all wikis ([[phab:T271345|T271345]])
* 19:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1287.eqiad.wmnet with reason: REIMAGE
* 19:22 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1129.eqiad.wmnet with reason: REIMAGE
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1270.eqiad.wmnet with reason: REIMAGE
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1287.eqiad.wmnet with reason: REIMAGE
* 19:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1261.eqiad.wmnet with reason: REIMAGE
* 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1270.eqiad.wmnet with reason: REIMAGE
* 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1261.eqiad.wmnet with reason: REIMAGE
* 19:11 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.31
* 19:01 dduvall@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/Echo/includes/model/Event.php: backport: [[gerrit:665177{{!}}Echo::create: Convert UserIdentityValue to plain User (T275161)]] (duration: 01m 20s)
* 18:52 marxarelli: fetching backport https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/665177 for sync prior to all wikis (re)deploy ([[phab:T275161|T275161]])
* 18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1367.eqiad.wmnet
* 18:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1341.eqiad.wmnet
* 18:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1367.eqiad.wmnet
* 18:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2272.codfw.wmnet
* 18:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1341.eqiad.wmnet
* 18:30 mutante: mw1367 - powercycled - stuck in reboot
* 18:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2272.codfw.wmnet
* 18:07 Urbanecm: Password reset for User:Kolyma ([[phab:T274737|T274737]])
* 17:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1341.eqiad.wmnet with reason: REIMAGE
* 17:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1341.eqiad.wmnet with reason: REIMAGE
* 17:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2272.codfw.wmnet with reason: REIMAGE
* 17:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2272.codfw.wmnet with reason: REIMAGE
* 17:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1367.eqiad.wmnet with reason: REIMAGE
* 17:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1367.eqiad.wmnet with reason: REIMAGE
* 16:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1141.eqiad.wmnet with reason: REIMAGE
* 16:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1140.eqiad.wmnet with reason: REIMAGE
* 16:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1141.eqiad.wmnet with reason: REIMAGE
* 16:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1134.eqiad.wmnet with reason: REIMAGE
* 16:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1140.eqiad.wmnet with reason: REIMAGE
* 16:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1134.eqiad.wmnet with reason: REIMAGE
* 14:29 mbsantos@deploy1001: Finished deploy [tilerator/deploy@937deb5]: (no justification provided) (duration: 00m 15s)
* 14:28 mbsantos@deploy1001: Started deploy [tilerator/deploy@937deb5]: (no justification provided)
* 14:00 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 14:00 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 13:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 13:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 13:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 13:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 13:41 godog: reset-failed ifup@ens13 on prometheus5001 - [[phab:T273026|T273026]]
* 13:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5001.eqsin.wmnet
* 13:31 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 13:29 gehel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 13:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus5001.eqsin.wmnet
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 09:27 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop backup cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 09:16 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop backup cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 10:30 ema: apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes [[phab:T288106|T288106]]
* 08:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1001.eqiad.wmnet
* 10:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 08:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1001.eqiad.wmnet
* 10:23 ema: depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine [[phab:T288106|T288106]]
* 08:06 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 10:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1108.eqiad.wmnet
* 10:22 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 07:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1108.eqiad.wmnet
* 10:16 ema: cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 [[phab:T288106|T288106]]
* 02:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1133.eqiad.wmnet with reason: REIMAGE
* 10:12 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
* 02:24 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1133.eqiad.wmnet with reason: REIMAGE
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
* 01:22 mutante: mwmaint2001 back on buster and back in scap dsh groups (if anything pops up you can revert 665175)
* 09:01 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
* 01:19 mutante: deleting my huge build from puppet-compiler that failed because it made the compiler instance run out of disk to run on *
* 08:46 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
* 01:03 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.30/includes/ProtectionForm.php: {{Gerrit|d305308a5d46a3f86bf0b211e8a733c0a951ddc1}}: field descriptors in HTMLForm must have keys ([[phab:T275018|T275018]]; [[phab:T274980|T274980]]) (duration: 01m 08s)
* 08:31 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
* 01:02 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/includes/ProtectionForm.php: {{Gerrit|2487c253b090d93daf85adae8ceb9d255cbf4ff2}}: field descriptors in HTMLForm must have keys ([[phab:T275018|T275018]]; [[phab:T274980|T274980]]) (duration: 01m 10s)
* 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:54 mutante: mwmaint2001 - back from reimage - scap pull
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:26 urbanecm@deploy1001: Synchronized static/images/project-logos/wikimedia-cloud-services.svg: {{Gerrit|686acba2f31df0d454c6f1c506c042af50b5cce0}}: Restore logos on Vector (classic version) and use cloud icon for labs ([[phab:T274210|T274210]]) (duration: 01m 07s)
* 07:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|17dc3aa}}, {{Gerrit|e0ca905}}, {{Gerrit|c0f4f4e}}: GrowthExperiments backports ([[phab:T292224|T292224]], [[phab:T290609|T290609]], [[phab:T290609|T290609]]) (duration: 00m 59s)
* 00:14 dpifke@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Deploying excimer-wall profiler pipeline [[phab:T253160|T253160]] (duration: 01m 03s)
* 07:40 elukey: run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - [[phab:T288825|T288825]]
* 00:12 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Deploying excimer-wall profiler pipeline [[phab:T253160|T253160]] (duration: 01m 02s)
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 07:22 moritzm: installing RT security updates
* 04:43 eileen: civicrm revision changed from {{Gerrit|96090e4bd2}} to {{Gerrit|946dfb6c5a}}, config revision is {{Gerrit|85277466ed}}
* 03:56 kart_: cxserver: Remove Matxin Key from Production ([[phab:T292635|T292635]])
* 03:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:48 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:45 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:11 eileen: civicrm revision changed from {{Gerrit|598b59b0ee}} to {{Gerrit|96090e4bd2}}, config revision is {{Gerrit|85277466ed}}


== 2021-02-18 ==
== 2021-10-11 ==
* 23:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2001.codfw.wmnet with reason: REIMAGE
* 21:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 23:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2001.codfw.wmnet with reason: REIMAGE
* 20:58 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 23:26 dancy@deploy1001: Synchronized wmf-config/: Syncing https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/634552 (duration: 01m 07s)
* 17:08 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 23:22 dancy@deploy1001: Synchronized wmf-config/CommonSettings.php: Syncing https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/634551 (duration: 01m 08s)
* 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 23:15 dancy@deploy1001: Synchronized src/ServiceConfig.php: (no justification provided) (duration: 03m 21s)
* 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 23:11 mutante: mwmaint2001 - will be rebooted for OS upgrade - [[phab:T267607|T267607]]
* 15:31 jgleeson: smashpig updated from {{Gerrit|3607b16f83}} to {{Gerrit|dd3a81c7c2}}
* 23:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2001.codfw.wmnet with reason: OS upgrade
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2001.codfw.wmnet with reason: OS upgrade
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 23:04 mutante: mwmaint1002 - rsyncing data from mwmaint2001
* 14:36 Emperor: start restoring weight to ms-be2045 [[phab:T290881|T290881]]
* 22:30 mutante: mwmaint2001 - tar-gzipping a lot of old user home data I keep finding, partially museum worthy from several maintenance hosts ago, like places like /root/home-mwmaint1001/username/home-terbium/iron/ :p
* 13:42 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 21:29 marxarelli: 1.36.0-wmf.31 rolled back due to [[phab:T275161|T275161]] and new logspam ([[phab:T271345|T271345]])
* 12:53 moritzm: install apache security updates on buster
* 21:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "all wikis to 1.36.0-wmf.31"
* 12:49 topranks: Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams
* 20:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.31
* 12:45 ema: cp4027: upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 19:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f33f9f71b13d9b9276df88ef6384ec6028ee2e1d}}: Make DiscussionTools replytool available for everyone on gomwiktionary ([[phab:T258554|T258554]]) (duration: 01m 05s)
* 12:04 moritzm: install apache security updates on bullseye
* 19:25 mutante: mwmaint2001 - deleting 'home-terbium' from all home directories (yes, it's in Bacula if you really used that, hope you didn't, it's been years since terbium)
* 10:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 19:25 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|da7b8123ecb373c1de1634ae867fb2f5fbee89ad}}: Enable DiscussionTools beta feature for newtopictool on arwiki, cswiki, huwiki ([[phab:T273145|T273145]]) (duration: 01m 12s)
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 19:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/DiscussionTools/: {{Gerrit|1cc29df}}: {{Gerrit|6b88aff}}: DiscussionTools backports ([[phab:T272666|T272666]]; [[phab:T274949|T274949]]) (duration: 01m 08s)
* 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 19:19 urbanecm@deploy1001: sync-file aborted: {{Gerrit|1cc29df}} DiscussionTools backports ([[phab:T272666|T272666]]; [[phab:T274949|T274949]]) (duration: 00m 00s)
* 09:37 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 19:17 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/DiscussionTools/: {{Gerrit|9c6cdf5}}: {{Gerrit|97acef6}}: DiscussionTools backports ([[phab:T272666|T272666]]; [[phab:T274949|T274949]]) (duration: 01m 26s)
* 09:13 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 19:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2001.codfw.wmnet with reason: OS upgrade
* 09:09 elukey: force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - [[phab:T288825|T288825]]
* 19:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2001.codfw.wmnet with reason: OS upgrade
* 09:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 16:51 volans: uploaded python3-wmflib_0.0.7 to apt.wikimedia.org buster-wikimedia
* 09:01 godog: bounce swift-object-replicator on ms-be2036
* 16:23 shdubsh: restart ircecho on kraz -- deploying new metrics endpoint [[phab:T216611|T216611]]
* 08:52 godog: bounce statsite on graphite1004 to apply unit config changes
* 16:05 moritzm: installing libmaxminddb updates from buster 10.8 point release
* 08:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 15:33 _joe_: rebuilding base images for stretch,buster
* 08:41 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet
* 15:30 moritzm: installing PHP 7.3 security updates on buster
* 08:38 moritzm: updated buster d-i image for Bullseye 11.1 point release [[phab:T292844|T292844]]
* 15:06 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:38 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 14:35 moritzm: installing libzstd security updates on Buster
* 08:26 godog: swift eqiad-prod: final weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 13:59 moritzm: installing intel-microcode security updates on buster
* 08:25 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 13:49 jynus: restart db1150 [[phab:T271913|T271913]]
* 08:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet
* 12:20 jynus: restart db1140 [[phab:T271913|T271913]]
* 08:06 godog: bounce uwsgi on graphite hosts to bump request size limit - [[phab:T292877|T292877]]
* 12:01 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/includes/HookContainer/DeprecatedHooks.php: {{Gerrit|28aa8718549b76c88e9757a273e0c602479b8d8b}}: Silent deprecate ProtectionForm::buildForm ([[phab:T274889|T274889]]) (duration: 01m 14s)
* 07:58 volans: migrating physical hosts DHCP to the new reimage process - [[phab:T269855|T269855]]
* 11:49 jynus: restart db1102 [[phab:T271913|T271913]]
* 07:57 elukey: start kafka topics rebalancing for main-codfw (long running maintenance) - [[phab:T288825|T288825]]
* 11:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 (duration: 01m 09s)
* 11:04 marostegui: Upgrade and reboot pc1009
* 11:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 (duration: 01m 08s)
* 10:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|33ab68f3d54dcb411c47b03fa8e283fa3077ea85}}: Add https://seer.ufrgs.br to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T270962|T270962]]) (duration: 01m 09s)
* 10:45 urbanecm@deploy1001: Synchronized static/images: {{Gerrit|d1db3005144c1c6fc212bde49127ea13627857be}}: Revert "Temporarily add cswiki-black-ribbon.png as a static resource" (duration: 01m 09s)
* 10:42 jynus: restarting dbprov* hosts [[phab:T271913|T271913]]
* 10:34 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1001.eqiad.wmnet
* 10:30 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch restbase calls to envoy (duration: 01m 15s)
* 10:27 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1001.eqiad.wmnet
* 09:48 jynus: restarting backup* hosts [[phab:T271913|T271913]]
* 09:46 elukey: upgrade presto to 0.246-wmf on an-coord1001, an-presto*, stat100x
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 [[phab:T274333|T274333]]', diff saved to https://phabricator.wikimedia.org/P14408 and previous config saved to /var/cache/conftool/dbconfig/20210218-084758-marostegui.json
* 08:31 marostegui: Upgrade kernel on db1154 and db1155 (sanitarium running buster hosts)
* 08:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE
* 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE
* 08:01 godog: upgrade grafana* to 7.4.2 - [[phab:T263747|T263747]]
* 07:59 marostegui: Reboot es2029, es2030, es2031, es2032, es2033, es2034 for kernel upgrade
* 07:32 marostegui: Reboot es2026, es2027, es2028 for kernel upgrade
* 06:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
* 06:54 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
* 06:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
* 06:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
* 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1075.eqiad.wmnet
* 06:10 marostegui: Reboot dbproxy1014 for kernel upgrade
* 01:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fe646957eb9b09377b07545ff194a726fd0cc6c7}}: hewikisource: Allow sysops to grant/revoke reviewer ([[phab:T274796|T274796]]) (duration: 01m 07s)
* 01:38 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 00:58 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:49 robh@cumin1001: START - Cookbook sre.dns.netbox
* 00:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/CentralNotice/resources/ext.centralNotice.display/state.js: {{Gerrit|dd64e44886727871fa0d2e0e87960d7d8ffba451}}: Remove optedOutCampaigns property from impression data ([[phab:T275054|T275054]]) (duration: 01m 08s)
* 00:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/CentralNotice/resources/ext.centralNotice.display/state.js: {{Gerrit|ff444c28eacbac45476b8fbaed82bc3d8fc4dc66}}: Remove optedOutCampaigns property from impression data ([[phab:T275054|T275054]]) (duration: 01m 09s)
* 00:31 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|08b32c453a1e879e6321ebec39122d0e06e14714}}: Remove wgCentralNoticeImpressionEventSampleRate; will default to 0 ([[phab:T275054|T275054]]) (duration: 02m 17s)
* 00:28 urbanecm@deploy1001: sync-file aborted: {{Gerrit|08b32c453a1e879e6321ebec39122d0e06e14714}}: Remove wgCentralNoticeImpressionEventSampleRate; will default to 0 ([[phab:T275054|T275054]]) (duration: 00m 00s)
* 00:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@8ca6884]: cirrus_namespace_map: Use retries when fetching (duration: 01m 21s)
* 00:02 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@8ca6884]: cirrus_namespace_map: Use retries when fetching


== 2021-02-17 ==
== 2021-10-09 ==
* 20:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1034.eqiad.wmnet with reason: REIMAGE
* 05:01 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:29 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1034.eqiad.wmnet with reason: REIMAGE
* 04:28 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 marxarelli: 1.36.0-wmf.31 rolled to group1. no new errors for wmf.31 ([[phab:T271345|T271345]])
* 01:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 20:17 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.31 (duration: 01m 15s)
* 00:46 mutante: ms-be2045 - started systemd-timedated which had been killed by something
* 20:15 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.31
* 00:28 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 19:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2e521f76c195ab50ab28a7d4812a35ceac246907}}: hewikisource: Allow reviewers to rollback ([[phab:T274796|T274796]]) (duration: 01m 10s)
* 00:24 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)
* 19:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|88e6ebc5565a7a0b1431dd5f52c701d8df641990}}: hewikisource: Add bureaucrats the ability to grant/revoke (trans)import ([[phab:T274796|T274796]]) (duration: 01m 09s)
* 00:23 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-unfreeze
* 19:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6c5c5f0d1b83a7f05272f133c269c740af8352db}}: arbcom_ruwiki: Add arbcom user group ([[phab:T274844|T274844]]) (duration: 01m 12s)
* 00:13 ryankemper: [[phab:T292814|T292814]] Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time
* 19:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1033.eqiad.wmnet with reason: REIMAGE
* 00:12 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 19:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1033.eqiad.wmnet with reason: REIMAGE
* 19:27 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=tlwikibooks --fix # [[phab:T274976|T274976]] # P14404
* 19:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c37fa0115113fb31cb54d9cf3f18a13f656c73dd}}: tlwikibooks: Add Wikijunior namespace ([[phab:T274976|T274976]]) (duration: 01m 09s)
* 19:24 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=tlwikibooks  --fix # [[phab:T274977|T274977]] # P14403
* 19:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a7eb726f01ab5332d8b8951fdd0fa0c5a9459d4c}}: tlwikibooks: Add WB as an alias to NS_PROJECT ([[phab:T274977|T274977]]) (duration: 01m 09s)
* 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|352dd72c28462755546ac36a017548a7f0925df0}}: Enable GlobalWatchlist extension on metawiki ([[phab:T260862|T260862]]) (duration: 01m 07s)
* 19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6ac78bd2aa601db537f821c89b447c04927af422}}: Remove uses of removed VisualEditor config variables ([[phab:T273177|T273177]]; 2/2) (duration: 01m 07s)
* 19:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|6ac78bd2aa601db537f821c89b447c04927af422}}: Remove uses of removed VisualEditor config variables ([[phab:T273177|T273177]]; 1/2) (duration: 01m 14s)
* 18:40 ppchelko@deploy1001: Finished deploy [restbase/deploy@c5c4b2d]: Remove graphoid [[phab:T242855|T242855]] (duration: 19m 54s)
* 18:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1350.eqiad.wmnet
* 18:26 effie: enable puppet on mw*
* 18:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1344.eqiad.wmnet
* 18:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1343.eqiad.wmnet
* 18:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1275.eqiad.wmnet
* 18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@c5c4b2d]: Remove graphoid [[phab:T242855|T242855]]
* 18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1350.eqiad.wmnet
* 18:14 mutante: mw1350 - powercycled via mgmt
* 18:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1343.eqiad.wmnet
* 18:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1344.eqiad.wmnet
* 18:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1275.eqiad.wmnet
* 18:07 effie: disable puppet on mw* in eqiad
* 17:36 godog: roll-restart logstash7 in codfw/eqiad to apply ulogd filters - [[phab:T234565|T234565]]
* 17:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1035.eqiad.wmnet with reason: REIMAGE
* 17:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1035.eqiad.wmnet with reason: REIMAGE
* 17:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1343.eqiad.wmnet with reason: REIMAGE
* 17:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1344.eqiad.wmnet with reason: REIMAGE
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1343.eqiad.wmnet with reason: REIMAGE
* 17:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1275.eqiad.wmnet with reason: REIMAGE
* 17:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1344.eqiad.wmnet with reason: REIMAGE
* 17:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1275.eqiad.wmnet with reason: REIMAGE
* 17:07 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1350.eqiad.wmnet with reason: REIMAGE
* 17:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1350.eqiad.wmnet with reason: REIMAGE
* 16:58 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 16:46 godog: roll-restart logstash to apply ulogd filter - [[phab:T234565|T234565]]
* 16:42 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 16:41 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 16:33 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 16:32 moritzm: installing intel-microcode security updates on buster
* 16:23 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 16:08 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 16:06 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@b5f4a3e]: (no justification provided) (duration: 00m 30s)
* 16:05 oblivian@deploy1001: Started deploy [docker-pkg/deploy@b5f4a3e]: (no justification provided)
* 15:36 cdanis: [[phab:T275028|T275028]] rolling restart done; check for fetch failures once caches re-fill
* 15:34 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
* 15:31 moritzm: uploaded jasper 1.900.1-debian1-2.4+deb8u6+wmf3 to apt.wikimedia.org
* 15:28 root@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
* 15:26 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
* 15:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
* 15:17 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1001.eqiad.wmnet
* 15:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1001.eqiad.wmnet
* 14:26 cdanis: starting rolling restart of cp-upload@eqsin varnish-fe [[phab:T275028|T275028]]
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14396 and previous config saved to /var/cache/conftool/dbconfig/20210217-135533-root.json
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 80%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14395 and previous config saved to /var/cache/conftool/dbconfig/20210217-134030-root.json
* 13:30 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:30 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 13:28 moritzm: installing libzstd security updates on Buster
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 60%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14393 and previous config saved to /var/cache/conftool/dbconfig/20210217-132526-root.json
* 13:19 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:664593{{!}}Enable Wikibase Repo ID generator rate limiting on Wikidata (T272032)]] (duration: 01m 11s)
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14392 and previous config saved to /var/cache/conftool/dbconfig/20210217-131022-root.json
* 13:06 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:05 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:55 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:55 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 40%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14391 and previous config saved to /var/cache/conftool/dbconfig/20210217-125519-root.json
* 12:50 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:49 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:45 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:45 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:42 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:42 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:40 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 20%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14390 and previous config saved to /var/cache/conftool/dbconfig/20210217-124015-root.json
* 12:40 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6eeee95e090408c8bd35d14c2f76e3afd8a59048}}: vector: Enable search treatment AB test on test wikis ([[phab:T259798|T259798]]) (duration: 01m 08s)
* 12:10 urbanecm@deploy1001: Synchronized dblists/desktop-improvements.dblist: {{Gerrit|7872251778b65cb03eb5457f1b901d208d514609}}: Revert "Revert "vector: Enable WVUI search on test wikis"" ([[phab:T259798|T259798]]) (duration: 01m 09s)
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7872251778b65cb03eb5457f1b901d208d514609}}: Revert "Revert "vector: Enable WVUI search on test wikis"" ([[phab:T259798|T259798]]) (duration: 01m 25s)
* 11:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2001.wikimedia.org
* 11:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netbox-dev2001.wikimedia.org
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1172 in s8 - [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14389 and previous config saved to /var/cache/conftool/dbconfig/20210217-112422-marostegui.json
* 11:08 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:08 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 11:04 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:04 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 11:04 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:03 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudnet1004.eqiad.wmnet with reason: hardware failure
* 10:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudnet1004.eqiad.wmnet with reason: hardware failure
* 10:13 _joe_: depooling mw1331 to perform some tests for [[phab:T266855|T266855]]
* 10:08 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:01 aborrero@cumin1001: START - Cookbook sre.dns.netbox
* 09:32 elukey: reboot dbstore100[3-5] for kernel upgrades
* 08:44 marostegui: upgrade es2020 es2021 es2022's kernel
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1172 in s8 - [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14388 and previous config saved to /var/cache/conftool/dbconfig/20210217-084120-marostegui.json
* 08:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
* 08:04 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1172 in s8 - [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14387 and previous config saved to /var/cache/conftool/dbconfig/20210217-074107-marostegui.json
* 07:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
* 07:33 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
* 07:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
* 07:23 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1172 in s8 for the first time - [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14386 and previous config saved to /var/cache/conftool/dbconfig/20210217-072131-marostegui.json
* 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
* 07:16 marostegui: Add x1 to orchestrator
* 07:04 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
* 07:01 marostegui: Restart db1103 (x1) primary master DONE - [[phab:T273758|T273758]]
* 07:00 marostegui: Restart db1103 (x1) primary master - [[phab:T273758|T273758]]
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1172 to dbctl, but not pooled yet [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14385 and previous config saved to /var/cache/conftool/dbconfig/20210217-063915-marostegui.json
* 01:41 mutante: mwdebug1001 - back on buster and pooled
* 01:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mwdebug1001.eqiad.wmnet
* 01:39 mutante: mwdebug1001 - rebooting
* 01:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1345.eqiad.wmnet
* 01:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1351.eqiad.wmnet
* 01:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1001.eqiad.wmnet
* 01:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mwdebug1001.eqiad.wmnet
* 00:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1345.eqiad.wmnet
* 00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1351.eqiad.wmnet
* 00:33 mutante: mw1351 - powercycled
* 00:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1001.eqiad.wmnet
* 00:17 legoktm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/timeline/: Add $wgTimelineFontDirectory to be passed as GDFONTPATH ([[phab:T274822|T274822]]) (duration: 01m 06s)
* 00:15 legoktm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/timeline/: Add $wgTimelineFontDirectory to be passed as GDFONTPATH ([[phab:T274822|T274822]]) (duration: 01m 02s)
* 00:13 legoktm@deploy1001: Synchronized wmf-config/timeline.php: Set $wgTimelineFontDirectory ([[phab:T274822|T274822]]) (duration: 01m 05s)
* 00:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1345.eqiad.wmnet with reason: REIMAGE
* 00:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1345.eqiad.wmnet with reason: REIMAGE


== 2021-02-16 ==
* 23:54 mutante: puppetmaster1001 - puppet cert clean mwdebug1001, sign new request, initial puppet run, now on buster ([[phab:T274023|T274023]])
* 23:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1351.eqiad.wmnet with reason: REIMAGE
* 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1351.eqiad.wmnet with reason: REIMAGE
* 23:44 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mwdebug1001.eqiad.wmnet
* 23:44 mutante: reimaging mwdebug1001 with buster
* 23:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1001.eqiad.wmnet
* 23:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: OS upgrade
* 23:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: OS upgrade
* 23:09 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.30/includes/HookContainer/DeprecatedHooks.php: silence deprecation refs [[phab:T274889|T274889]] (duration: 01m 14s)
* 22:52 jgleeson: updated payments-wiki config to {{Gerrit|3d1b4564a2}}

== 2021-10-08 ==
* 23:16 legoktm: sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'
* 23:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 21:38 mutante: mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress
* 21:34 mutante: disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
* 21:30 legoktm: running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 20:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad