Server Admin Log: Difference between revisions

imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
imported>Stashbot
(oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production)
(38 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2021-11-30 ==
== 2022-01-16 ==
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 00:17 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742524{{!}}Enable scroll tracking for all users (T292586)]] (duration: 00m 55s)
* 08:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply on production
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 00:14 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/readingDepth.js: Backport: [[gerrit:742517{{!}}Provide fallback for config variable when not present]] (duration: 00m 55s)
* 08:17 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:17 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply on production
* 00:13 catrope@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:738530{{!}}allow sysops to set/remove reviewer group on ckbwiki (T294696)]] (duration: 00m 55s)


== 2021-11-29 ==
== 2022-01-15 ==
* 22:32 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/EntitySchema/src/MediaWiki/Specials/SetEntitySchemaLabelDescriptionAliases.php: Deploy security patch for [[phab:T296578|T296578]] (duration: 00m 55s)
* 08:55 legoktm: finished running recountCategories on s4 wikis ([[phab:T299244|T299244]])
* 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 legoktm: finished running recountCategories on s7 wikis ([[phab:T299244|T299244]])
* 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:51 legoktm: finished running recountCategories on s2 wikis ([[phab:T299244|T299244]])
* 22:20 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FileImporter/src/Remote/MediaWiki/HttpApiLookup.php: Backport: [[gerrit:742263{{!}}SECURITY: Fix special page displaying unescaped user input (T296605)]] (duration: 00m 56s)
* 06:41 <legoktm>: finished running recountCategories on s3 wikis ([[phab:T299244|T299244]])
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:21 <legoktm>: finished running recountCategories on s6 wikis ([[phab:T299244|T299244]])
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:19 <legoktm>: finished running recountCategories on s5 wikis ([[phab:T299244|T299244]])
* 20:46 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Fix wgWikiLambdaOrchestratorLocation service pointer typo (duration: 00m 55s)
* 06:18 <legoktm>: finished running recountCategories on s8 wikis ([[phab:T299244|T299244]])
* 20:27 tgr: UTC evening deploys done
* 06:14 legoktm: running recountCategories on s3 wikis
* 20:26 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742261{{!}}GrowthExperiments: Start imagerecommendation variant experiment]] (duration: 00m 55s)
* 05:20 legoktm: started recountCategories.php --wiki=enwiki --mode pages ([[phab:T299244|T299244]])
* 20:23 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/AddImageSubmissionHandler.php: Backport: [[gerrit:742262{{!}}AddImage: Refresh user's task feed after undecided rejection (T296491)]] (duration: 00m 56s)
* 03:05 legoktm: started refreshLinks --dfn-only via systemd units for s7-s8 ([[phab:T299244|T299244]])
* 20:21 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:742260{{!}}SuggestedEdits: Drop isActivated() check in getJsData (T296626)]] (duration: 00m 56s)
* 03:01 legoktm: started refreshLinks --dfn-only via systemd units for s2-s6 ([[phab:T299244|T299244]])
* 20:17 ejegg: updated payments-wiki from {{Gerrit|d1d6f024}} -> {{Gerrit|dbc92132}}
* 02:55 legoktm: started mwscript refreshLinks.php --wiki=commonswiki --dfn-only ([[phab:T299244|T299244]])
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:54 legoktm: started mwscript refreshLinks.php --wiki=enwiki --dfn-only ([[phab:T299244|T299244]])
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:52 legoktm: started mwscript refreshLinks.php --wiki=enwiki --dfn-only
* 20:10 eileen: civicrm
* 01:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:04 legoktm: starting recountCategories.php --mode pages --wiki enwiki on mwmaint1002
* 20:00 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T295705|T295705]] Move CirrusSearch traffic back to eqiad (duration: 00m 56s)
* 01:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:42 legoktm: uploaded php-yaml_2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1_amd64.changes to apt.wm.o ([[phab:T296331|T296331]])
* 00:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:16 vgutierrez: pool cp3064 - [[phab:T290005|T290005]]
* 00:58 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 18:55 bblack: repooling esams
* 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:48 bblack: esams: shifting depool method to esams-offline (now that its config is fixed)
* 00:52 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17 refs [[phab:T293958|T293958]] (duration: 00m 52s)
* 18:42 legoktm: depooling esams
* 00:51 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 18:17 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:742259{{!}}rdbms: Add DB host to TransactionProfiler logging and fix time fields (T295706)]] (duration: 00m 56s)
* 00:46 jforrester@deploy1002: Finished scap: Revert "LinksUpdate refactor" and follow-ups for [[phab:T299244|T299244]] re. [[phab:T293958|T293958]] (duration: 03m 58s)
* 17:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:42 jforrester@deploy1002: Started scap: Revert "LinksUpdate refactor" and follow-ups for [[phab:T299244|T299244]] re. [[phab:T293958|T293958]]
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:40 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Initial Beta Cluster deployment of Wikifunctions: III - CS for [[phab:T289315|T289315]] (duration: 00m 55s)
* 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:38 vgutierrez: pool cp3064 - [[phab:T290005|T290005]]
* 00:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "all/group1 wikis to 1.38.0-wmf.17"
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:22 jforrester@deploy1002: Synchronized wmf-config/ProductionServices.php: Initial Beta Cluster deployment of Wikifunctions: II - Services for [[phab:T289315|T289315]] (duration: 00m 55s)
* 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Initial Beta Cluster deployment of Wikifunctions: I - IS for [[phab:T289315|T289315]] (duration: 00m 55s)
* 17:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06d8d25f6e89be0b1692d017bdbc2c9524372c0b}}: foundationwiki: Remove explicit wmgUseOAuth (duration: 00m 57s)
* 16:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|bad34ed8d86b30eb4c240da0498ddfb44af30ea7}}: Make foundationwiki a standard CentralAuth wiki ([[phab:T205347|T205347]]) (duration: 00m 56s)
* 16:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|567f2a9d4883c9a98a3251f153ea0ad58d7774c6}}: Revert "foundationwiki: Set wmgLocalAuthLoginOnly=false temporarily" ([[phab:T205347|T205347]]) (duration: 00m 56s)
* 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2069.codfw.wmnet with OS buster
* 16:04 moritzm: sudo gnt-cluster upgrade --to 2.16 for Ganeti codfw cluster
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:52 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 15:51 James_F: Running mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=enwiki en wikimedia wikifunctionswiki wikifunctions.beta.wmflabs.org in Beta Cluster for [[phab:T284162|T284162]]
* 15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2069.codfw.wmnet with OS buster
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:47 papaul: power down logstash2028 for IDRAC reset
* 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:15 moritzm: gnt-cluster renew-crypto --new-cluster-certificate for codfw Ganeti cluster [[phab:T296622|T296622]]
* 14:40 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:38 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:37 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:55 vgutierrez: repool cp3064 - [[phab:T290005|T290005]]
* 12:51 moritzm: upgrading ganeti codfw cluster to 2.16 backport [[phab:T296622|T296622]]
* 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:32 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 12:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: {{Gerrit|05704407395fbf227eec47cf716393dc60a36a35}}: Fix error handling in SuggestedEdits::getActionData ([[phab:T296366|T296366]]) (duration: 05m 37s)
* 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7fdea3e71e4fd9e85c30efbc17f94c0711deb252}}:  Add planet4589.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T296136|T296136]]) (duration: 00m 56s)
* 12:11 vgutierrez: pool cp3064 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS buster
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:07 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|4662224229cb4083b8b01de436ccd65e8c00e7dd}}: Remove search.wikimedia.org files ([[phab:T289224|T289224]]) (duration: 00m 56s)
* 11:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|5fc6aaa73202a1bf2aa58998d2671d5f4a6255bc}}: Fix "Mark entries as bot entries" feature (2/2; [[phab:T296297|T296297]]) (duration: 00m 55s)
* 10:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/Special/SpecialMultiLock.php: {{Gerrit|5fc6aaa73202a1bf2aa58998d2671d5f4a6255bc}}: Fix "Mark entries as bot entries" feature (1/2; [[phab:T296297|T296297]]) (duration: 00m 56s)
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d01652ec22f6cb3413b419a3c9b0a7a08d79960f}}: Disable Growth IP research survey ([[phab:T294568|T294568]]) (duration: 00m 56s)
* 10:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
* 10:45 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3064.esams.wmnet with OS buster
* 10:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
* 10:01 vgutierrez: depool cp3064 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 09:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2041.codfw.wmnet with OS buster
* 09:52 vgutierrez: pool cp2041 with HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 09:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:34 moritzm: rolling restart of mediawiki canaries to pick up ICU security updates
* 09:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|3a892860b2e1e2ac7b60fc1c4dbdb2035d6af950}}: foundationwiki: Do not enable wmgUsePageViewInfo explicitly (duration: 00m 55s)
* 09:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=foundationwiki 'inactive' # removing nonexistent group; backup left at P17888
* 09:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|786313c06188d5d63700d7e46384ef99a9297b57}}: foundationwiki: Clear group add/remove declarations (duration: 00m 55s)
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c3f47dc55b67d2b53ec27bb610978ff8165aa6ca}}: foundationwiki: Disable hard redirects (duration: 00m 57s)
* 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2041.codfw.wmnet with OS buster
* 08:56 vgutierrez: depool cp2041 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 08:54 moritzm: installing ICU security updates on buster
* 08:33 moritzm: installing bluez security updates
* 08:26 moritzm: installing libvpx security updates
* 08:19 moritzm: installing libntlm security updates
* 08:07 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 07m 01s)
* 08:00 marostegui: Restart db2078 and db1117
* 08:00 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - [[phab:T296563|T296563]]
* 07:31 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] - (second attempt, no git update submodules the first time) (duration: 00m 04s)
* 07:31 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] - (second attempt, no git update submodules the first time)
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2014.codfw.wmnet with OS bullseye
* 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2014.codfw.wmnet with OS bullseye


== 2021-11-28 ==
== 2022-01-14 ==
* 17:14 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 02m 11s)
* 23:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS stretch
* 17:12 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]]
* 22:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 18:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 18:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 17:44 bblack: drmrs asw: removed native-vlan-id from config on secondary (x-rack) interfaces of lvses to debug network issue
* 17:26 bblack: reboot lvs600[23]
* 16:55 bblack: reboot lvs6001
* 16:30 bblack: rebooting cp60xx where x is 6, 7, 8, 14, 15, 16 (downtimed)
* 16:15 dancy@deploy1002: Synchronized README: Testing php-fpm restart (duration: 03m 18s)
* 16:04 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 15:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 15:39 bblack: lvs6001 + all services downtimed
* 15:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: dc=drmrs
* 15:00 bblack: silenced site=drmrs in alertmanager for one month, I think
* 15:00 bblack: silenced site=drmrs in alertmanager, I think
* 13:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bullseye
* 13:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bullseye
* 12:53 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 12:51 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1024.eqiad.wmnet with OS buster
* 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1024.eqiad.wmnet with OS buster
* 12:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 12:18 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 11:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 11:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS buster
* 11:18 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS buster
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org
* 11:00 moritzm: systemctl reset-failed ifup@ens5.service on archiva1002 [[phab:T273026|T273026]]
* 10:56 moritzm: rebooting archiva1002 (running archiva.wikimedia.org)
* 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org
* 10:55 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 10:50 moritzm: systemctl reset-failed ifup@ens5.service on an-test-ui1001 [[phab:T273026|T273026]]
* 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-ui1001.eqiad.wmnet
* 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-ui1001.eqiad.wmnet
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-presto1001.eqiad.wmnet
* 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-presto1001.eqiad.wmnet
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 10:05 moritzm: rebooting matomo1002 (running piwik.wikimedia.org)
* 10:04 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-druid1001.eqiad.wmnet
* 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-druid1001.eqiad.wmnet
* 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt1001.wikimedia.org
* 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt1001.wikimedia.org
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install1003.wikimedia.org
* 09:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install1003.wikimedia.org
* 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-client1001.eqiad.wmnet
* 09:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-client1001.eqiad.wmnet
* 09:11 marostegui: Move pc1014 from pc1 to pc2 [[phab:T299046|T299046]]
* 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2013.codfw.wmnet with OS bullseye
* 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1009.eqiad.wmnet
* 09:01 moritzm: rebooting an-tool1009 (running hue.wikimedia.org)
* 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1009.eqiad.wmnet
* 09:00 moritzm: systemctl reset-failed ifup@ens5.service on an-tool1005 [[phab:T273026|T273026]]
* 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1008.eqiad.wmnet
* 08:58 moritzm: rebooting an-tool1008 (running yarn.wikimedia.org)
* 08:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1008.eqiad.wmnet
* 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1007.eqiad.wmnet
* 08:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1007.eqiad.wmnet
* 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1005.eqiad.wmnet
* 08:51 moritzm: rebooting an-tool1007 (running turnilo.wikimedia.org)
* 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1005.eqiad.wmnet
* 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet
* 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet
* 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2013.codfw.wmnet with OS bullseye
* 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2012.codfw.wmnet with OS bullseye
* 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2012.codfw.wmnet with OS bullseye
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18735 and previous config saved to /var/cache/conftool/dbconfig/20220114-063554-marostegui.json
* 06:15 marostegui: Failover m5 proxy from dbproxy1017 to dbproxy1021 [[phab:T298586|T298586]]
* 05:16 legoktm: manually restarted discard_held_messages service on lists1001, failed with a spurious sqlalchemy issue about packets being out of order
* 00:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:23 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17 refs [[phab:T293958|T293958]]
* 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:15 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 06s)
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:13 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:09 dduvall@deploy1002: Synchronized php-1.38.0-wmf.17/includes/content/WikitextContentHandler.php: Backport: [[gerrit:753828{{!}}In WikitextContentHandler always use getFreshParser() (T299149)]] (duration: 01m 07s)


== 2021-11-27 ==
== 2022-01-13 ==
* 19:55 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]] (duration: 04m 14s)
* 22:40 WFan: Updating payment-wiki, revision changed from {{Gerrit|8497eae9}} to {{Gerrit|5cc9d5e0}}
* 19:51 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]]
* 22:18 dzahn@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=miscweb
* 19:47 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev (duration: 02m 01s)
* 22:00 dzahn@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=miscweb
* 19:45 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev
* 21:48 mutante: running puppet on cp-ulsfo
* 12:22 elukey: drop /var/tmp/core files from ores100[2,4], root partition full
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:10 elukey: drop /var/tmp/core files from ores1009, root partition full
* 21:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:55 elukey: disable coredumps for ORES celery units (will cause a roll restart of all celeries) - [[phab:T296563|T296563]]
* 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:46 elukey: drop ores coredumps from ores1008
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:56 elukey: powercycle analytics1071, soft lockup stacktraces in the tty
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:51 elukey: move ores coredump files from /var/cache/tmp to /srv/coredumps on ores100[6,7,8] and ores2003 to free space on the root partition
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:31 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.17"
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:29 dduvall: rolling back wmf.17 from group1 due to a large increase in "Parser state cleared while parsing" across commons and group1 wikipedias ([[phab:T293958|T293958]], [[phab:T299149|T299149]])
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:17 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 06s)
* 20:16 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:07 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 19:42 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2051.codfw.wmnet with OS stretch
* 19:40 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
* 19:40 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753634{{!}}Enable ArticlePlaceholder on dagwiki (T298349)]] (duration: 01m 13s)
* 19:37 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
* 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:25 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
* 19:23 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:19 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747993{{!}}Add event stream config for ios.notification_interaction (T290920)]] (duration: 01m 13s)
* 19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:15 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747991{{!}}Add event stream config for android.customize_toolbar_interaction (T297818)]] (duration: 01m 12s)
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:07 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753793{{!}}Enable skin migration mode on the beta cluster]] (duration: 01m 14s)
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1005.eqiad.wmnet with reason: requires resync after planet sync
* 17:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1005.eqiad.wmnet with reason: requires resync after planet sync
* 17:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:34 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:29 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:29 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:22 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:07 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:01 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 16:27 moritzm: import maps-deduped-tilelist 0.0.5 to buster-wikimedia/main [[phab:T297408|T297408]]
* 16:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet
* 16:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet
* 15:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:50 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aphlict1001.eqiad.wmnet
* 15:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aphlict1001.eqiad.wmnet
* 15:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM flowspec1001.eqiad.wmnet
* 15:40 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM flowspec1001.eqiad.wmnet
* 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 15:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1004.wikimedia.org
* 15:26 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1004.wikimedia.org
* 15:23 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1003.wikimedia.org
* 15:21 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2009.codfw.wmnet with OS buster
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1003.wikimedia.org
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM seaborgium.wikimedia.org
* 15:15 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM seaborgium.wikimedia.org
* 15:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1002.wikimedia.org
* 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1002.wikimedia.org
* 14:56 mmandere: cp3053: upgrade varnish to 6.0.9-1wm1 [[phab:T298758|T298758]]
* 14:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1001.wikimedia.org
* 14:47 moritzm: systemctl reset-failed ifup@ens5.service on idp1001 [[phab:T273026|T273026]]
* 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp1001.wikimedia.org
* 14:15 moritzm: switch ml-etcd1003 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1003.eqiad.wmnet with reason: switch to drbd storage
* 14:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1003.eqiad.wmnet with reason: switch to drbd storage
* 13:53 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet
* 13:49 moritzm: switch ml-etcd1002 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1002.eqiad.wmnet with reason: switch to drbd storage
* 13:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1002.eqiad.wmnet with reason: switch to drbd storage
* 13:45 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet
* 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1001.wikimedia.org
* 13:33 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1001.wikimedia.org
* 13:23 moritzm: switch ml-etcd1001 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1001.eqiad.wmnet with reason: switch to drbd storage
* 13:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1001.eqiad.wmnet with reason: switch to drbd storage
* 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudbackup1001-dev.eqiad.wmnet
* 13:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1001-dev.eqiad.wmnet
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18731 and previous config saved to /var/cache/conftool/dbconfig/20220113-124307-root.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18730 and previous config saved to /var/cache/conftool/dbconfig/20220113-124300-marostegui.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove all special groups from s3 codfw [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18729 and previous config saved to /var/cache/conftool/dbconfig/20220113-124140-marostegui.json
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1021', diff saved to https://phabricator.wikimedia.org/P18728 and previous config saved to /var/cache/conftool/dbconfig/20220113-123744-marostegui.json
* 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudbackup1002-dev.eqiad.wmnet
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18727 and previous config saved to /var/cache/conftool/dbconfig/20220113-122803-root.json
* 12:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
* 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp1001.wikimedia.org
* 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp1001.wikimedia.org
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18726 and previous config saved to /var/cache/conftool/dbconfig/20220113-121300-root.json
* 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM eventlog1003.eqiad.wmnet
* 11:59 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM eventlog1003.eqiad.wmnet
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18725 and previous config saved to /var/cache/conftool/dbconfig/20220113-115756-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18724 and previous config saved to /var/cache/conftool/dbconfig/20220113-114252-root.json
* 11:34 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18723 and previous config saved to /var/cache/conftool/dbconfig/20220113-112749-root.json
* 11:26 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
* 11:26 _joe_: update scap everywhere [[phab:T298986|T298986]]
* 11:25 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15]: scap testing (duration: 00m 09s)
* 11:25 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15]: scap testing
* 11:24 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15]: (no justification provided) (duration: 00m 09s)
* 11:23 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15]: (no justification provided)
* 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1001.eqiad.wmnet
* 11:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2022.codfw.wmnet with OS bullseye
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1001.eqiad.wmnet
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18722 and previous config saved to /var/cache/conftool/dbconfig/20220113-111245-root.json
* 11:11 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
* 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox1001.wikimedia.org
* 11:08 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
* 11:03 moritzm: rebooting netbox1001 (running netbox.wikimedia.org)
* 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox1001.wikimedia.org
* 11:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1001.eqiad.wmnet with OS buster
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb1001.eqiad.wmnet
* 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb1001.eqiad.wmnet
* 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18721 and previous config saved to /var/cache/conftool/dbconfig/20220113-105741-root.json
* 10:56 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
* 10:52 hashar: Restarting Jenkins CI for plugins update [[phab:T298691|T298691]]
* 10:47 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader1001.eqiad.wmnet
* 10:45 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
* 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader1001.eqiad.wmnet
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2022.codfw.wmnet with OS bullseye
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18720 and previous config saved to /var/cache/conftool/dbconfig/20220113-104238-root.json
* 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1001.wikimedia.org
* 10:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1001.eqiad.wmnet with OS buster
* 10:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc1001.wikimedia.org
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18719 and previous config saved to /var/cache/conftool/dbconfig/20220113-102734-root.json
* 10:27 moritzm: systemctl reset-failed ifup@ens5.service on lists1001 [[phab:T273026|T273026]]
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana1002.eqiad.wmnet
* 10:10 moritzm: rebooting grafana1002 (running grafana.wikimedia.org)
* 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana1002.eqiad.wmnet
* 10:09 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 10:02 mmandere: cp3052: upgrade varnish to 6.0.9-1wm1 [[phab:T298758|T298758]]
* 10:02 joal@deploy1002: Finished deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386] (duration: 21m 47s)
* 10:02 elukey: run kafka preferred-replica-election on kafka-main1001 to force a rebalance of partition leaders (after kafka-main1002's reimage)
* 10:00 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
* 09:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1002.eqiad.wmnet with OS buster
* 09:56 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
* 09:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:40 joal@deploy1002: Started deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386]
* 09:40 joal@deploy1002: Finished deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386] (duration: 00m 07s)
* 09:40 joal@deploy1002: Started deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386]
* 09:39 joal@deploy1002: Finished deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386] (duration: 06m 59s)
* 09:35 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:32 joal@deploy1002: Started deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386]
* 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:30 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:26 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1002.eqiad.wmnet with OS buster
* 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:24 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui1001.eqiad.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui1001.eqiad.wmnet
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM lists1001.wikimedia.org
* 09:02 moritzm: rebooting lists1001 (running lists.wikimedia.org) to pick up new KVM setting
* 09:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM lists1001.wikimedia.org
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022, give weight to es1021 [[phab:T295965|T295965]] ', diff saved to https://phabricator.wikimedia.org/P18718 and previous config saved to /var/cache/conftool/dbconfig/20220113-085906-marostegui.json
* 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1003.eqiad.wmnet with OS buster
* 08:39 elukey: ipmi mc reset cold for kafka-main1002, mgmt interface not reachable via ssh
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18717 and previous config saved to /var/cache/conftool/dbconfig/20220113-083923-marostegui.json
* 08:28 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:753505{{!}}Take LogicException into consideration (T299111)]] (duration: 01m 28s)
* 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:753504{{!}}Take LogicException into consideration (T299111)]] (duration: 01m 28s)
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1003.eqiad.wmnet with OS buster
* 08:06 marostegui: Change innodb_checksum_algorithm=full_crc32 on eqiad sanitarium hosts (db1154, db1155) [[phab:T287244|T287244]]
* 08:02 elukey: ipmi mc reset cold for kafka-main1003, mgmt interface not reachable via ssh
* 07:57 elukey: stop kafka* on kafka-main1003 as prep-step for reimage to buster
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18715 and previous config saved to /var/cache/conftool/dbconfig/20220113-075012-marostegui.json
* 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1015.eqiad.wmnet with OS bullseye
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1015.eqiad.wmnet with OS bullseye
* 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:41 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/includes/export/WikiExporter.php: Backport: [[gerrit:753501{{!}}export: Remove ignoring rev_page_id index (T163532)]] (duration: 01m 28s)
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18714 and previous config saved to /var/cache/conftool/dbconfig/20220113-064113-root.json
* 06:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:38 marostegui: Failover m3 proxy from dbproxy1016 to dbproxy1020 [[phab:T298586|T298586]]
* 06:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:26 marostegui: Remove rev_page_id from frwiki,jawiki,ruwiki and labswiki from db1096 (s6) [[phab:T285149|T285149]]
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18713 and previous config saved to /var/cache/conftool/dbconfig/20220113-062609-root.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18712 and previous config saved to /var/cache/conftool/dbconfig/20220113-061105-root.json
* 06:05 tstarling@deploy1002: Synchronized php-1.38.0-wmf.17/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 27s)
* 05:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 05:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 05:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18711 and previous config saved to /var/cache/conftool/dbconfig/20220113-055602-root.json
* 05:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 05:53 tstarling@deploy1002: Synchronized php-1.38.0-wmf.17/tests/phpunit/unit/includes/libs/rdbms/database/DatabaseSQLTest.php: (no justification provided) (duration: 01m 32s)
* 05:00 TimStarling: doing [[phab:T299095|T299095]] restorations on s3 wikis
* 04:30 TimStarling: on mwmaint1002: inserting 11565 rows into itwiki.pagelinks for [[phab:T299095|T299095]]
* 03:33 TimStarling: on mwmaint1002: inserting 1714288 rows into wikidatawiki.pagelinks for [[phab:T299095|T299095]]
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:30 TimStarling: on mwmaint1002: inserting 4221344 rows into commonswiki.pagelinks to clean up from [[phab:T299095|T299095]]
* 02:29 tstarling@deploy1002: Synchronized php-1.38.0-wmf.16/maintenance/sql.php: batch size (duration: 01m 28s)
* 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:31 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752751{{!}}Enable CirrusSearch on it/en Wikivoyage]] (duration: 01m 28s)
* 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:24 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752760{{!}}Skip vector-2022 skin in config, not Vector skin (T298923)]] (duration: 01m 29s)
* 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:11 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753584{{!}}Enable Disambiguator notifications on all wikis (T293319)]] (duration: 01m 28s)
* 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn


== 2021-11-26 ==
== 2022-01-12 ==
* 16:11 arnoldokoth: drain kubestage1002 node in prep for decommissioning
* 23:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:05 arnoldokoth: drain kubestage1001 node in prep for decommissioning
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:46 elukey: move /var/tmp/core/* to /srv/coredumps on ores1008 to free root space
* 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:30 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:21 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.17
* 13:48 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:07 jhathaway: rebooting mx1001 to get old kernel
* 13:46 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:48 cwhite: end eqiad opensearch upgrade [[phab:T288621|T288621]]
* 13:25 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18709 and previous config saved to /var/cache/conftool/dbconfig/20220112-214258-marostegui.json
* 13:25 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 21:28 mbsantos: mbsantos@maps1009.eqiad.wmnet: start imposm-initial-import  - full planet re-import ([[phab:T299049|T299049]])
* 12:21 vgutierrez: restarting HAProxy on O:cache::upload_haproxy - [[phab:T290005|T290005]]
* 21:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18708 and previous config saved to /var/cache/conftool/dbconfig/20220112-212753-marostegui.json
* 11:41 akosiaris: [[phab:T296303|T296303]] cleanup weird state of calico-codfw cluster
* 21:19 ryankemper: [WDQS] [[phab:T299098|T299098]] depooled `wdqs2003` so dc-ops can take a look at the PS2 failure
* 11:41 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 21:18 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@988b7d2] (duration: 06m 57s)
* 11:41 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18707 and previous config saved to /var/cache/conftool/dbconfig/20220112-211248-marostegui.json
* 11:39 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 21:11 joal@deploy1002: Started deploy [analytics/refinery@988b7d2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@988b7d2]
* 11:25 vgutierrez: restarting HAProxy on O:cache::(text{{!}}upload)_haproxy - [[phab:T290005|T290005]]
* 21:11 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2] (thin): Regular analytics weekly train THIN [analytics/refinery@988b7d2] (duration: 00m 07s)
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users [[phab:T296274|T296274]]', diff saved to https://phabricator.wikimedia.org/P17880 and previous config saved to /var/cache/conftool/dbconfig/20211126-102340-ladsgroup.json
* 21:11 joal@deploy1002: Started deploy [analytics/refinery@988b7d2] (thin): Regular analytics weekly train THIN [analytics/refinery@988b7d2]
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T296274|T296274]])', diff saved to https://phabricator.wikimedia.org/P17879 and previous config saved to /var/cache/conftool/dbconfig/20211126-101714-ladsgroup.json
* 21:10 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2]: Regular analytics weekly train [analytics/refinery@988b7d2] (duration: 24m 20s)
* 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18706 and previous config saved to /var/cache/conftool/dbconfig/20220112-205744-marostegui.json
* 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18705 and previous config saved to /var/cache/conftool/dbconfig/20220112-205636-marostegui.json
* 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users [[phab:T296274|T296274]]', diff saved to https://phabricator.wikimedia.org/P17878 and previous config saved to /var/cache/conftool/dbconfig/20211126-101423-ladsgroup.json
* 20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T296274|T296274]])', diff saved to https://phabricator.wikimedia.org/P17877 and previous config saved to /var/cache/conftool/dbconfig/20211126-100547-ladsgroup.json
* 20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 10:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18704 and previous config saved to /var/cache/conftool/dbconfig/20220112-205629-marostegui.json
* 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 20:46 joal@deploy1002: Started deploy [analytics/refinery@988b7d2]: Regular analytics weekly train [analytics/refinery@988b7d2]
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18703 and previous config saved to /var/cache/conftool/dbconfig/20220112-204124-marostegui.json
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 20:36 dduvall: 1.38.0-wmf.17 rolled back from group1 due to large spike in db read-only errors and slow queries ([[phab:T293958|T293958]])
* 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17876 and previous config saved to /var/cache/conftool/dbconfig/20211126-082834-ladsgroup.json
* 20:33 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.38.0-wmf.17
* 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17875 and previous config saved to /var/cache/conftool/dbconfig/20211126-081329-ladsgroup.json
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17874 and previous config saved to /var/cache/conftool/dbconfig/20211126-075824-ladsgroup.json
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17873 and previous config saved to /var/cache/conftool/dbconfig/20211126-074320-ladsgroup.json
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:28 Amir1: killing extensions/MachineVision/maintenance/fetchSuggestions.php in mwmaint
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:19 Amir1: killing lingering process from mwmaint to depooled db (db1160) that was depooled nine hours ago
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18702 and previous config saved to /var/cache/conftool/dbconfig/20220112-202619-marostegui.json
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:21 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 21s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:19 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:19 jgleeson: updated payments from {{Gerrit|939cb4bc}} to {{Gerrit|8497eae9}}
* 20:17 mutante: applying firewall change on phabricator (VCS, git-ssh), second attempt, first codfw-only
* 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18701 and previous config saved to /var/cache/conftool/dbconfig/20220112-201114-marostegui.json
* 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18700 and previous config saved to /var/cache/conftool/dbconfig/20220112-200806-marostegui.json
* 20:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18699 and previous config saved to /var/cache/conftool/dbconfig/20220112-200759-marostegui.json
* 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18698 and previous config saved to /var/cache/conftool/dbconfig/20220112-195254-marostegui.json
* 19:52 hashar: Restarting CI Jenkins once more to apply the Gearman plugin update [[phab:T298691|T298691]]
* 19:44 hashar: Clearing /srv partition on integration-castor03
* 19:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18697 and previous config saved to /var/cache/conftool/dbconfig/20220112-193749-marostegui.json
* 19:34 hashar: Upgrading CI Jenkins and Gearman plugin [[phab:T298691|T298691]]
* 19:29 mutante: wdqs2003 - one power supply failed so it's not redundant anymore, says Icinga
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:25 cwhite: begin eqiad opensearch upgrade [[phab:T288621|T288621]]
* 19:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18696 and previous config saved to /var/cache/conftool/dbconfig/20220112-192244-marostegui.json
* 19:22 mutante: deneb - for some reason the "package builder clean up build directory"-service fails [[phab:T287222|T287222]]
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:21 cjming: end of UTC evening backport & config window
* 19:21 mutante: [deneb:~] $ sudo systemctl start  package_builder_Clean_up_build_directory.service
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:19 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753187{{!}}Add new vector skin key to RelatedArticlesFooterAllowedSkins. (T298916)]] (duration: 01m 21s)
* 19:18 mutante: pybal-test2002 - apt-get clean after icinga alert about disk space running out
* 19:17 mutante: zookeeper-test1002 - CRITICAL - degraded: The following units failed: ifup@ens5.service - for this issue see [[phab:T273026|T273026]] ([[phab:T268074|T268074]])
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:14 mutante: elastic10180 - one power supply seemingly failed - see icinga IPMI alert - [Status = Critical, PS Redundancy = Critical] [[phab:T294805|T294805]]
* 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18695 and previous config saved to /var/cache/conftool/dbconfig/20220112-191436-marostegui.json
* 19:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 19:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18694 and previous config saved to /var/cache/conftool/dbconfig/20220112-191428-marostegui.json
* 19:13 cjming@deploy1002: Synchronized php-1.38.0-wmf.17/includes/export/WikiExporter.php: Backport: [[gerrit:753085{{!}}Partial revert of I1a691f01cd82e60bf41207d32501edb4b9835e37 to unbreak dumps (T299020)]] (duration: 01m 22s)
* 19:12 mutante: mirror1001 - CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service - [[phab:T286898|T286898]]
* 19:09 hashar: Upgraded releases Jenkins from 2.319.1 to 2.319.2 # [[phab:T298691|T298691]]
* 19:06 moritzm: imported jenkins 2.319.2 to thirdparty/ci for buster-wikimedia
* 19:05 mutante: [mwmaint1002:~] $ sudo systemctl status mediawiki_job_updatequerypages_mostlinked_s3@13.service (running fine but had failed for unknown reason last time it was supposed to run automatically)
* 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18693 and previous config saved to /var/cache/conftool/dbconfig/20220112-185923-marostegui.json
* 18:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18692 and previous config saved to /var/cache/conftool/dbconfig/20220112-184418-marostegui.json
* 18:40 mutante: phab1001 - temp disabling puppet - deployed firewall change on phab2001 - debugging - no impact
* 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18691 and previous config saved to /var/cache/conftool/dbconfig/20220112-182913-marostegui.json
* 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18690 and previous config saved to /var/cache/conftool/dbconfig/20220112-182806-marostegui.json
* 18:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18689 and previous config saved to /var/cache/conftool/dbconfig/20220112-182725-marostegui.json
* 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18688 and previous config saved to /var/cache/conftool/dbconfig/20220112-181220-marostegui.json
* 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18687 and previous config saved to /var/cache/conftool/dbconfig/20220112-175715-marostegui.json
* 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18686 and previous config saved to /var/cache/conftool/dbconfig/20220112-174211-marostegui.json
* 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18685 and previous config saved to /var/cache/conftool/dbconfig/20220112-174103-marostegui.json
* 17:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 17:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18684 and previous config saved to /var/cache/conftool/dbconfig/20220112-174056-marostegui.json
* 17:38 _joe_: deploying scap 4.1.1 to the restbase canaries [[phab:T298986|T298986]]
* 17:34 _joe_: deploying scap 4.1.1 to the mediawiki canaries [[phab:T298986|T298986]]
* 17:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1169.eqiad.wmnet with OS bullseye
* 17:27 dancy@deploy1002: Started scap: testing
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18683 and previous config saved to /var/cache/conftool/dbconfig/20220112-172551-marostegui.json
* 17:25 dancy@deploy1002: Started scap: testing
* 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18682 and previous config saved to /var/cache/conftool/dbconfig/20220112-171047-marostegui.json
* 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:06 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 21s)
* 17:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 16:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter1005.eqiad.wmnet
* 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18681 and previous config saved to /var/cache/conftool/dbconfig/20220112-165542-marostegui.json
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18680 and previous config saved to /var/cache/conftool/dbconfig/20220112-165434-marostegui.json
* 16:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter1005.eqiad.wmnet
* 16:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 16:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 16:53 hnowlan: Decommissioning cassandra instance restbase2009-c via nodetool
* 16:48 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 16:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:46 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 21s)
* 16:45 elukey: elukey@prometheus2004:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:44 elukey: elukey@prometheus2003:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:40 elukey: elukey@prometheus1004:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:39 elukey: elukey@prometheus1003:~$ sudo apt-get remove linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64 linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18678 and previous config saved to /var/cache/conftool/dbconfig/20220112-163919-marostegui.json
* 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx1001.wikimedia.org
* 16:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter1004.eqiad.wmnet
* 16:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx1001.wikimedia.org
* 16:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:31 akosiaris@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter1004.eqiad.wmnet
* 16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:25 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 16s)
* 16:25 elukey: stop kafka* on kafka-main1003 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18677 and previous config saved to /var/cache/conftool/dbconfig/20220112-162414-marostegui.json
* 16:20 moritzm: switch kubestagetcd1006 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: switch to DRBD disk storage
* 16:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: switch to DRBD disk storage
* 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18676 and previous config saved to /var/cache/conftool/dbconfig/20220112-160910-marostegui.json
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18675 and previous config saved to /var/cache/conftool/dbconfig/20220112-160802-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 16:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18674 and previous config saved to /var/cache/conftool/dbconfig/20220112-160755-marostegui.json
* 16:02 elukey: stop kafka* on kafka-main1002 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 15:57 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 15:56 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 15:56 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18673 and previous config saved to /var/cache/conftool/dbconfig/20220112-155250-marostegui.json
* 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18672 and previous config saved to /var/cache/conftool/dbconfig/20220112-153745-marostegui.json
* 15:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18671 and previous config saved to /var/cache/conftool/dbconfig/20220112-152240-marostegui.json
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18670 and previous config saved to /var/cache/conftool/dbconfig/20220112-152133-marostegui.json
* 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18669 and previous config saved to /var/cache/conftool/dbconfig/20220112-152121-marostegui.json
* 15:14 elukey: stop kafka* on kafka-main1001 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18668 and previous config saved to /var/cache/conftool/dbconfig/20220112-150616-marostegui.json
* 14:59 moritzm: switch kubestagetcd1005 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 14:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:54 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply on main
* 14:54 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18667 and previous config saved to /var/cache/conftool/dbconfig/20220112-145111-marostegui.json
* 14:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 14:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:40 jelto: remove helm2 from deployment_server [[phab:T251305|T251305]] https://gerrit.wikimedia.org/r/c/operations/puppet/+/753026
* 14:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
* 14:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
* 14:37 jelto@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
* 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18666 and previous config saved to /var/cache/conftool/dbconfig/20220112-143606-marostegui.json
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18665 and previous config saved to /var/cache/conftool/dbconfig/20220112-143258-marostegui.json
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18664 and previous config saved to /var/cache/conftool/dbconfig/20220112-143241-marostegui.json
* 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
* 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:23 moritzm: switch kubestagetcd1004 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P18663 and previous config saved to /var/cache/conftool/dbconfig/20220112-141736-marostegui.json
* 14:17 ladsgroup@deploy1002: Synchronized wmf-config: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part III (duration: 01m 07s)
* 14:15 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part II (duration: 01m 08s)
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf1002.eqiad.wmnet
* 14:14 ladsgroup@deploy1002: Synchronized wmf-config/db-production.php: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part I (duration: 01m 07s)
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf1002.eqiad.wmnet
* 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf1001.eqiad.wmnet
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P18662 and previous config saved to /var/cache/conftool/dbconfig/20220112-140232-marostegui.json
* 14:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf1001.eqiad.wmnet
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18661 and previous config saved to /var/cache/conftool/dbconfig/20220112-135858-marostegui.json
* 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18659 and previous config saved to /var/cache/conftool/dbconfig/20220112-134727-marostegui.json
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18658 and previous config saved to /var/cache/conftool/dbconfig/20220112-134620-marostegui.json
* 13:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 13:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18657 and previous config saved to /var/cache/conftool/dbconfig/20220112-134103-root.json
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753441{{!}}Disable flaggedrevs stable template inclusion in ruwikisource (T226054)]] (duration: 01m 08s)
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18656 and previous config saved to /var/cache/conftool/dbconfig/20220112-132600-root.json
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:23 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1004.eqiad.wmnet
* 13:20 urbanecm@deploy1002: Finished scap: {{Gerrit|4b1e241}}: Undo update to the way the search interface is set (duration: 19m 19s)
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard1002.eqiad.wmnet
* 13:18 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1004.eqiad.wmnet
* 13:14 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard1002.eqiad.wmnet
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1003.eqiad.wmnet
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18655 and previous config saved to /var/cache/conftool/dbconfig/20220112-131056-root.json
* 13:08 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1003.eqiad.wmnet
* 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor1002.eqiad.wmnet
* 13:01 urbanecm@deploy1002: Started scap: {{Gerrit|4b1e241}}: Undo update to the way the search interface is set
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18654 and previous config saved to /var/cache/conftool/dbconfig/20220112-130050-marostegui.json
* 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor1002.eqiad.wmnet
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18653 and previous config saved to /var/cache/conftool/dbconfig/20220112-125552-root.json
* 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid1002.eqiad.wmnet
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18652 and previous config saved to /var/cache/conftool/dbconfig/20220112-125402-marostegui.json
* 12:52 awight: EU deployment reopened :-)
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P18651 and previous config saved to /var/cache/conftool/dbconfig/20220112-125208-marostegui.json
* 12:51 awight: EU deployment complete
* 12:50 awight@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/TemplateData: Backport: [[gerrit:752775{{!}}Allow aliases to be integers in addition to strings (T298795)]] (duration: 01m 07s)
* 12:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid1002.eqiad.wmnet
* 12:48 Amir1: removing orphan lint error reports in all wikis ([[phab:T298782|T298782]])
* 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18650 and previous config saved to /var/cache/conftool/dbconfig/20220112-124514-marostegui.json
* 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18649 and previous config saved to /var/cache/conftool/dbconfig/20220112-123010-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18648 and previous config saved to /var/cache/conftool/dbconfig/20220112-122742-marostegui.json
* 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18647 and previous config saved to /var/cache/conftool/dbconfig/20220112-121505-marostegui.json
* 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cfe389afce8037121f8e8b672f4fdf2458a068dd}}: fawiki: Add extendedmover usergroup ([[phab:T299038|T299038]]) (duration: 01m 08s)
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc1002.eqiad.wmnet
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18646 and previous config saved to /var/cache/conftool/dbconfig/20220112-120931-marostegui.json
* 12:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc1002.eqiad.wmnet
* 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc1001.eqiad.wmnet
* 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc1001.eqiad.wmnet
* 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases1002.eqiad.wmnet
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18645 and previous config saved to /var/cache/conftool/dbconfig/20220112-120000-marostegui.json
* 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases1002.eqiad.wmnet
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18644 and previous config saved to /var/cache/conftool/dbconfig/20220112-115259-marostegui.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18643 and previous config saved to /var/cache/conftool/dbconfig/20220112-115031-marostegui.json
* 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18642 and previous config saved to /var/cache/conftool/dbconfig/20220112-115024-marostegui.json
* 11:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 11:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18641 and previous config saved to /var/cache/conftool/dbconfig/20220112-113518-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18640 and previous config saved to /var/cache/conftool/dbconfig/20220112-113119-marostegui.json
* 11:21 elukey: move kafka-jumbo nodes to fixed kafka uid/gid - [[phab:T296990|T296990]]
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18639 and previous config saved to /var/cache/conftool/dbconfig/20220112-112013-marostegui.json
* 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18638 and previous config saved to /var/cache/conftool/dbconfig/20220112-110508-marostegui.json
* 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dborch1001.wikimedia.org
* 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dborch1001.wikimedia.org
* 10:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:59 moritzm: rebalance ganeti/codfw row B (all nodes reimaged to Buster)
* 10:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18637 and previous config saved to /var/cache/conftool/dbconfig/20220112-105650-marostegui.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18636 and previous config saved to /var/cache/conftool/dbconfig/20220112-105540-marostegui.json
* 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18635 and previous config saved to /var/cache/conftool/dbconfig/20220112-105532-marostegui.json
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dbmonitor1002.wikimedia.org
* 10:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: sync on main
* 10:50 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply on main
* 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dbmonitor1002.wikimedia.org
* 10:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply on main
* 10:50 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply on main
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:48 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:47 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply on main
* 10:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 10:41 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18634 and previous config saved to /var/cache/conftool/dbconfig/20220112-104028-marostegui.json
* 10:39 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: sync on main
* 10:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply on main
* 10:37 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1128 in s1 with minimal weight [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18633 and previous config saved to /var/cache/conftool/dbconfig/20220112-103619-marostegui.json
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: sync on main
* 10:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply on main
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P18632 and previous config saved to /var/cache/conftool/dbconfig/20220112-103144-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1128 in s1 with minimal weight [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18631 and previous config saved to /var/cache/conftool/dbconfig/20220112-102938-marostegui.json
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18630 and previous config saved to /var/cache/conftool/dbconfig/20220112-102523-marostegui.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18629 and previous config saved to /var/cache/conftool/dbconfig/20220112-101018-marostegui.json
* 10:08 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
* 10:06 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
* 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:57 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1014 to master in pc1 (duration: 01m 07s)
* 09:54 hnowlan: Decommissioning cassandra instance restbase2009-b via nodetool
* 09:53 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab-runner1001.eqiad.wmnet
* 09:51 moritzm: reverting kubetcd2006 back to "plain" storage
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to plain disk storage
* 09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to plain disk storage
* 09:51 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab-runner1001.eqiad.wmnet
* 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS bullseye
* 09:21 moritzm: reverting kubetcd2005 back to "plain" storage
* 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: switch to plain disk storage
* 09:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: switch to plain disk storage
* 09:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS bullseye
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18628 and previous config saved to /var/cache/conftool/dbconfig/20220112-090959-marostegui.json
* 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc1 (duration: 01m 08s)
* 09:05 marostegui: Reset replication on pc1014
* 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18627 and previous config saved to /var/cache/conftool/dbconfig/20220112-085024-marostegui.json
* 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb1002.eqiad.wmnet
* 08:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb1002.eqiad.wmnet
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18626 and previous config saved to /var/cache/conftool/dbconfig/20220112-083520-marostegui.json
* 08:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug1002.eqiad.wmnet
* 08:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug1002.eqiad.wmnet
* 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug1001.eqiad.wmnet
* 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug1001.eqiad.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18625 and previous config saved to /var/cache/conftool/dbconfig/20220112-082015-marostegui.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18624 and previous config saved to /var/cache/conftool/dbconfig/20220112-080510-marostegui.json
* 08:00 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 07:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 07:57 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: sync on main
* 07:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
* 07:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply on main
* 07:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:46 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply on main
* 07:44 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: sync on main
* 07:41 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply on main
* 07:41 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 07:40 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 07:40 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
* 07:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
* 07:37 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
* 07:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
* 07:29 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:28 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply on main
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18623 and previous config saved to /var/cache/conftool/dbconfig/20220112-072826-marostegui.json
* 07:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18622 and previous config saved to /var/cache/conftool/dbconfig/20220112-071003-marostegui.json
* 07:02 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1169.eqiad.wmnet with OS bullseye
* 06:58 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: sync on main
* 06:58 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply on main
* 06:58 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 06:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 06:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
* 06:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
* 06:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: sync on main
* 06:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply on main
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18621 and previous config saved to /var/cache/conftool/dbconfig/20220112-065458-marostegui.json
* 06:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: sync on main
* 06:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 06:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: sync on main
* 06:50 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply on main
* 06:49 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
* 06:48 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18620 and previous config saved to /var/cache/conftool/dbconfig/20220112-063953-marostegui.json
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 06:36 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1169.eqiad.wmnet with OS bullseye
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18619 and previous config saved to /var/cache/conftool/dbconfig/20220112-062449-marostegui.json
* 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18618 and previous config saved to /var/cache/conftool/dbconfig/20220112-060923-marostegui.json
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 for Bullseye reimage [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18617 and previous config saved to /var/cache/conftool/dbconfig/20220112-060803-marostegui.json
* 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 00:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:09 urbanecm: UTC late evening B&C done
* 00:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|24a26392a3e36aa3a46445eb1f87e808b57b19c8}}: Enable Disambiguator notifications for French Wikipedia ([[phab:T293319|T293319]]) (duration: 01m 08s)
* 00:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:03 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)


== 2021-11-25 ==
== 2022-01-11 ==
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17872 and previous config saved to /var/cache/conftool/dbconfig/20211125-204357-ladsgroup.json
* 23:56 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:28 ladsgroup@
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:05 dduvall@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.js: Backport: [[gerrit:753071{{!}}Watchlist API update: Call correct method (T298999)]] (duration: 02m 40s)
* 23:04 dduvall: syncing backport to fix VE regression that followed testwiki/group0 deployment (cc [[phab


== 2021-11-24 ==
== 2022-01-10 ==
* 23:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS buster
* 22:36 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS buster
* 22:34 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
* 23:44 mutante: puppetmaster1001:~] $ sudo puppet cert sign gitlab-runner1001.eqiad.wmnet {{!}}  sudo install_console gitlab-runner1001.eqiad.wmnet ([[phab:T295481|T295481]])
* 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18502 and previous config saved to /var/cache/conftool/dbconfig/20220110-202728-marostegui.json
* 23:26 mutante: ganeti - bringing up new VM - sudo gnt-instance start gitlab-runner1001.eqiad.wmnet ; ran puppet on install1003; installing OS [[phab:T295481|T295481]]
* 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18501 and previous config saved to /var/cache/conftool/dbconfig/20220110-201224-marostegui.json
* 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS buster
* 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18500
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS buster
* 23:09 mutante: mwmaint1002 - sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size 1M -delete  - to fix Icinga alert about large files in client bucket
* 23:08 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
* 23:03 mutante: wcqs1001 -  sudo systemctl restart wcqs-blazegraph - after <+jinxer-wm> (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wcqs1001:9195 is burning free allocators
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
* 22:50 mutante: Creating a new Ganeti VM and wondering which row to put it? [ganeti1009:~] $ for row in A B C D; do echo "row $<nowiki>{</nowiki>row<nowiki>}</nowiki>: $(sudo gnt-instance list -o name -F


== 2021-11-23 ==
== 2022-01-08 ==
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS buster
* 10:51 elukey: restart hive daemons on an-coord1002 (after my last upgrade/rollback of packages the prometheus agent settings were not picked up, so no metrics)
* 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2011.codfw.wmnet with OS buster
* 23:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS buster
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS buster
* 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS buster
* 22:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2009.codfw.wmnet with OS buster
* 21:58 tgr: UTC evening deploys done
* 21:57 tgr@deploy1002: Finished scap: (no justification provided) (duration: 10m 03s)
* 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 21:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2009.codfw.wmnet with OS buster
* 21:53 krinkle@deploy1002: Finished deploy [integration/docroot@a3435a7]: (no justification provided) (duration: 00m 07s)
* 21:53 krinkle@deploy1002: Started deploy [integration/docroot@a3435a7]: (no justification provided)
* 21:47 tgr@deploy1002: Started scap: (no justification provided)
* 21:47 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740777{{!}}Add Image: Validate GEInfoboxTemplates size (T294518)]] (duration: 00m 56s)
* 21:39 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Api/ApiQueryGrowthTasks.php: Backport: [[gerrit:740776{{!}}Structured task caching/filtering cherry-picks step 3]] (duration: 00m 55s)
* 21:35 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740775{{!}}Structured task caching/filtering cherry-picks step 2]] (duration: 00m 57s)
* 21:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:04 legoktm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Echo/: re-enable cross-wiki notifications by default ([[phab:T296270|T296270]]) (duration: 00m 57s)
* 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:51 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|7d5f779a73594bb11f359bda055f2c7af8e92feb}}: Structured task caching/filtering cherry-picks, step 1 (duration: 00m 56s)
* 19:42 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|c26e407118e1cd8e1e3fea6e2f4e3e43a609ea62}}: GrowthExperiments backports (duration: 01m 03s)
* 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 2/2) (duration: 00m 56s)
* 19:17 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 1/2) (duration: 00m 56s)
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3993aacbfdbbfb6cdcc198ce369bf08b32ace865}}: Increase reading depth sampling rate to .1% ([[phab:T294777|T294777]]) (duration: 00m 57s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:29 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:25 ejegg: updated SmashPig standalone (IPN listener) from {{Gerrit|be68299b}} -> {{Gerrit|211f8e65}}
* 18:18 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:18 cmjohnson1: upgrading msw-c1-eqiad [[phab:T259758|T259758]]
* 18:04 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:01 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:00 moritzm: systemctl reset-failed ifup@ens5.service on durum2001 [[phab:T273026|T273026]]
* 17:59 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:55 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 17:49 mutante: miscweb1002 - rm -rf /srv/deployments/scholarships ([[phab:T243037|T243037]])
* 17:47 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 17:42 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 17:35 ebernhardson: [[phab:T295478|T295478]] start snapshot of commonswiki_file from cirrus codfw -> swift eqiad
* 17:34 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 17:33 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 17:31 cmjohnson1: upgrading msw's in row D eqiad [[phab:T259758|T259758]]
* 17:28 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2012.codfw.wmnet with OS stretch
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2002.codfw.wmnet
* 17:14 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:14 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:11 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2002.codfw.wmnet
* 17:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2001.codfw.wmnet
* 17:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2001.codfw.wmnet
* 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb2002.codfw.wmnet
* 16:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb2002.codfw.wmnet
* 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc2001.codfw.wmnet
* 16:53 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc2001.codfw.wmnet
* 16:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2012.codfw.wmnet with OS stretch
* 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2001.codfw.wmnet
* 16:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2001.codfw.wmnet
* 16:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 16:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2011.codfw.wmnet with OS stretch
* 16:13 cmjohnson1: updating mgmt switches in row C, racks C2-C8 eqiad [[phab:T259758|T259758]]
* 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
* 15:46 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2010.codfw.wmnet with OS stretch
* 15:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:27 Emperor: rolling restart of thanos frontends [[phab:T294380|T294380]]
* 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
* 14:40 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 14:34 jbond@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=puppetboard
* 14:30 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:00 marostegui: Failover m5 from db1128 to db1132 - [[phab:T288720|T288720]]
* 14:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
* 13:50 godog: powercycle (again) ms-be2058
* 13:48 godog: add 80G to prometheus global in eqiad
* 13:31 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
* 13:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
* 13:01 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
* 12:52 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1002-dev.eqiad.wmnet
* 12:46 Lucas_WMDE: UTC morning backport+config window done
* 12:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:43 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1002-dev.eqiad.wmnet
* 12:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:737503{{!}}Set up beta test environment for QuickSurveys (T293798)]] (beta only) (duration: 00m 55s)
* 12:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: [[gerrit:740784{{!}}OSD: Handle cases where the image srcset attr is not set (T296260)]] (duration: 00m 56s)
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:26 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: [[gerrit:740778{{!}}OSD: Add a ready hook for scripts (T180569)]] (duration: 00m 56s)
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 11:54 btullis@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
* 11:51 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart (exit_code=97) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 11:51 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2002.codfw.wmnet
* 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2002.codfw.wmnet
* 11:25 godog: powercycle ms-be2058 - down and nothing on console
* 11:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5012.eqsin.wmnet with OS buster
* 11:15 vgutierrez: pool cp5012 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 Amir1: start of mwscript migrateRevisionActorTemp.php --wiki=testwiki --sleep=5 ([[phab:T275246|T275246]])
* 11:05 jayme: cordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet (we have issues with POD IP prefix allocation) - [[phab:T293729|T293729]]
* 11:05 jayme: uncordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet (we have issues with POD IP prefix allocation) - [[phab:T293729|T293729]]
* 11:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:02 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:740807{{!}}Set test wikis to write both for actor temp table migration (T275246)]] (duration: 00m 56s)
* 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17800 and previous config saved to /var/cache/conftool/dbconfig/20211123-102234-ladsgroup.json
* 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:19 urbanecm@deploy1002: Finished scap: {{Gerrit|c98acaa2ab27e630c0a1b55a64fb81b131c087f9}}: Backport localisation updates (duration: 11m 06s)
* 10:19 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:08 urbanecm@deploy1002: Started scap: {{Gerrit|c98acaa2ab27e630c0a1b55a64fb81b131c087f9}}: Backport localisation updates
* 10:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5012.eqsin.wmnet with OS buster
* 10:01 vgutierrez: depool cp5012 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 09:57 jayme: cordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet - [[phab:T293729|T293729]]
* 09:52 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1124.eqiad.wmnet with OS bullseye
* 09:27 Amir1: dropping useless GRANTs on s6 eqiad replicas without replication ([[phab:T296274|T296274]])
* 09:16 Amir1: dropping useless GRANTs on s6 eqiad master without replication ([[phab:T296274|T296274]])
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bullseye
* 09:05 Amir1: fixing incorrect grants of wikiadmin on localhost in s6 master in codfw with replication
* 07:52 topranks: Adjusting BGP on cr1-eqiad and cr2-eqiad to keep MED unchanged in iBGP.
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 05:29 ryankemper: [[phab:T295705|T295705]] Downtimed `elastic2044` for one hour and doing a full reboot for good measure. Already ran the plugin upgrade: `DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins`
* 05:26 ryankemper: [[phab:T295705|T295705]] Rolling restart of `codfw` complete. `elastic2044` was manually restarted earlier today so the cookbook didn't restart it (b/c we pass in a datetime cutoff threshold) so I'm manually upgrading and restarting that host
* 05:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 04:17 ryankemper: [[phab:T295705|T295705]] Properly disabled the sane-itizer; we don't want it running until after we (a) complete rolling restarts and (b) restore the missing `commonswiki_file` index (which is blocked on the restarts)
* 03:42 Amir1: ladsgroup@mwmaint1002:~$ cat broken_imgs {{!}} xargs -I <nowiki>{</nowiki><nowiki>}</nowiki> mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --verbose --mime 'image/*' --force --batch-size 1 --sleep 1 --start=<nowiki>{</nowiki><nowiki>}</nowiki> --end=<nowiki>{</nowiki><nowiki>}</nowiki> ([[phab:T296001|T296001]])
* 03:37 Amir1: rebuilding metadata of all djvu files outside of commons ([[phab:T296001|T296001]])
* 03:06 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:58 ryankemper: [[phab:T295705|T295705]] `elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.codfw.wmnet', port=9243): Read timed out. (read timeout=60))` Probably transient failure; will wait 10 mins and try again
* 02:57 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:55 ryankemper: [[phab:T295705|T295705]] `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation codfw "codfw plugin upgrade + restart" --upgrade --nodes-per-run 2 --start-datetime 2021-11-18T18:55:54 --task-id [[phab:T295705|T295705]]` on tmux `rolling_restarts_codfw`
* 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:17 urbanecm: UTC late window done
* 01:17 urbanecm@deploy1002: Finished scap: {{Gerrit|69aa4a7}}: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 4/4) (duration: 25m 50s)
* 01:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:51 urbanecm@deploy1002: Started scap: {{Gerrit|69aa4a7}}: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 4/4)
* 00:50 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/autoload.php: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 3/4) (duration: 00m 55s)
* 00:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specials/: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 2/4) (duration: 00m 55s)
* 00:48 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specialpage/SpecialPageFactory.php: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 1/4) (duration: 00m 56s)
* 00:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9209433dfc8b1f81a165ec75867337800db24b1}}: Enable reading depth instrumentation at low sampling rate ([[phab:T294777|T294777]]) (duration: 00m 56s)
* 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:30 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents: {{Gerrit|3f860c7}}: {{Gerrit|fa9fbf1}}: WikimediaEvents backports (2/2; [[phab:T294777|T294777]]) (duration: 00m 55s)
* 00:28 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/extension.json: {{Gerrit|3f860c72bca817c40486b90f0d8e0ffca72b2690}}: Restore ReadingDepth instrument (1/2) (duration: 00m 56s)
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/739908


== 2021-11-22 ==
== 2022-01-07 ==
* 23:55 mutante: acmechief1001, acmechief-test1001: sudo systemctl restart reload-acme-chief-backend.timer
* 22:07 eileen: config revision changed from {{Gerrit|3df415c1}} to {{Gerrit|ecf09aa0}} - disable eoy email jobs
* 23:54 mutante: acmechief1001, acmechief-test1001: sudo systemctl start reload-acme-chief-backend.timer
* 20:08 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/<nowiki>{</nowiki>zhwikinews,zhwikinews-1.5x,zhwikinews-2x,zhwikinews-hans,zhwikinews-hans-1.5x,zhwikinews-hans-2x<nowiki>}</nowiki>.png via purgeList.php
* 23:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2011.codfw.wmnet with OS stretch
* 19:49 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host apifeatureusage2001.codfw.wmnet
* 23:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2010.codfw.wmnet with OS stretch
* 19:41 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host apifeatureusage1001.eqiad.wmnet
* 23:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
* 19:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS bullseye
* 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS stretch
* 19:21 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host apifeatureusage2001.codfw.wmnet
* 22:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
* 19:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS bullseye
* 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS stretch
* 19:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS bullseye
* 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS buster
* 19:11 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host apifeatureusage1001.eqiad.wmnet
* 21:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS buster
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS bullseye
* 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS buster
* 15:18 taavi: reset email address for Ollie Shotton developer account per [[phab:T298779|T298779]]
* 21:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS buster
* 15:08 ottomata: creating mediainfo-streaming-updater.mutation topics on kafka main-eqiad and main-codfw and setting retention to 30 days - [[phab:T296470|T296470]]
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:05 ema: upgrade varnish on deployment-cache-text06 to 6.0.9 [[phab:T298758|T298758]]
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:23 legoktm@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Lower CirrusSearch maxqueues to be closer to number of workers (duration: 00m 56s)
* 12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:01 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 12:14 taavi@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/ProofreadPage/modules/page: Backport: [[gerrit:751843{{!}}Makes sure $imgContHorizontal is always initialized (T298694)]] (duration: 00m 59s)
* 19:49 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 11:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:46 urbanecm: Evening B&C window completed
* 11:56 taavi@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/Flow: Backport: [[gerrit:752014{{!}}Revert "Use strict equality when safe to do so" (T298760)]] (duration: 01m 00s)
* 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:44 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/: {{Gerrit|10b8440069ac71434274462c545c6b2b2c9182d9}}: Use the WikiEditor ready hook instead of using() the lib ([[phab:T296033|T296033]]) (duration: 00m 56s)
* 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:40 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:33 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 19:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b6b05e30b3c9b4007fd31ab0698507d7a48d1caf}}: kswiki: set wgTranslateNumerals to false ([[phab:T296055|T296055]]) (duration: 00m 55s)
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18413 and previous config saved to /var/cache/conftool/dbconfig/20220107-072742-marostegui.json
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18412 and previous config saved to /var/cache/conftool/dbconfig/20220107-071237-marostegui.json
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4aa8d5bf465bfc3fee2ec547718af0c779f88ef4}}: Enable SandboxLink on lawiki ([[phab:T296073|T296073]]) (duration: 00m 56s)
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18411 and previous config saved to /var/cache/conftool/dbconfig/20220107-065733-marostegui.json
* 19:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c082bec4c74c156b26af4349488835902c5bacd}}: Enable mapframe on the Indonesian Wikipedia ([[phab:T295571|T295571]]) (duration: 00m 56s)
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18410 and previous config saved to /var/cache/conftool/dbconfig/20220107-064228-marostegui.json
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18409 and previous config saved to /var/cache/conftool/dbconfig/20220107-064119-marostegui.json
* 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 19:05 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 19:01 vgutierrez: pool cp4032 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 18:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 18:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 18:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 17:50 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 17:48 XioNoX: repool codfw
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 17:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4032.ulsfo.wmnet with OS buster
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 17:46 ejegg: updated fundraising python tools from {{Gerrit|d90f4c91}} -> {{Gerrit|d1d7b100}}
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2089.codfw.wmnet with reason: Maintenance
* 17:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2089.codfw.wmnet with reason: Maintenance
* 17:32 ebernhardson: restart both elasticsearch instances on elastic2044, reporting `connection refused` (after a brief period of `no route to host`) to masters even though the connection works outside elastic
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db[2076,2095].codfw.wmnet with reason: Maintenance
* 17:01 ryankemper: [[phab:T295705|T295705]] Beginning rolling restart w/ plugin upgrade of `cloudelastic`: `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic plugin upgrade + restart" --upgrade --nodes-per-run 3 --start-datetime 2021-11-22T16:59:38 --task-id [[phab:T295705|T295705]]` on tmux `rolling_restarts_cloudelastic`
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db[2076,2095].codfw.wmnet with reason: Maintenance
* 17:00 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001
* 05:47 marostegui: rename wikishared.wikimedia_editor_tasks_targets_passed on db1120 [[phab:T264225|T264225]]
* 16:58 ryankemper: [Elastic] [[phab:T295705|T295705]] Rolling restart w/ plugin upgrade of `relforge` is complete
* 00:23 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752036{{!}}viwiktionary: add namespaces "Appendix" and "Appendix talk" (T298289)]] (duration: 00m 59s)
* 16:55 ryankemper: [Elastic] [[phab:T295705|T295705]] Restarting second and final relforge host: `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
* 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4032.ulsfo.wmnet with OS buster
* 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:52 ryankemper: [Elastic] [[phab:T295705|T295705]] Restarting first relforge host: `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
* 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:51 jayme: fleet wide updated wmf-certificates to 0~20211122-1
* 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:50 vgutierrez: depool cp4032 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 16:49 ryankemper: [Elastic] [[phab:T295705|T295705]] Downtimed relforge* for 2 hours in order to perform a manual rolling restart of the two hosts `relforge1003` and `relforge1004`
* 16:44 ryankemper: [[phab:T295705|T295705]] Upgrading `relforge` elasticsearch packages: `ryankemper@cumin1001:~$ sudo cumin -b 2 'relforge*' 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins'`
* 16:39 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 16:15 urbanecm: Password reset for Miraki@arbcom_dewiki per private request
* 16:15 moritzm: installing postgresql-13 security updates on bullseye
* 15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 XioNoX: Telia DDoS auto-mitigation enabled on all circuits - [[phab:T288926|T288926]]
* 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:28 Amir1: revoking DROP for wikiadmin from db1100 ([[phab:T249683|T249683]])
* 15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
* 15:17 moritzm: set kvm:machine_version=pc-i440fx-2.8 for Ganeti cluster in codfw [[phab:T294119|T294119]]
* 15:16 jayme: imported wmf-certificates 0~20211122-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 15:13 _joe_: restarting pybal low-traffic in codfw, eqiad
* 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:58 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.wikimedia.org
* 14:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734426{{!}}Disable DPL on opt-in wikis where not in use (T287916)]] (duration: 00m 56s)
* 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734425{{!}}Disable DPL on Wikiversities where not in use (T287916)]] (duration: 00m 56s)
* 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734424{{!}}Disable DPL on Wikisources where not in use (T287916)]] (duration: 00m 56s)
* 14:44 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.wikimedia.org
* 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:06 akosiaris: repool wtp1025, wtp1041 to parsoid cluster. [[phab:T296098|T296098]]
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet
* 13:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
* 13:32 XioNoX: re-enable pybal on lvs2007 - [[phab:T295118|T295118]]
* 13:31 XioNoX: re-enable puppet on lvs2007
* 13:30 XioNoX: re-enabling V6 between cr2-codfw and asw-b-codfw - [[phab:T295118|T295118]]
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9
* 13:04 XioNoX: asw-b-codfw# set virtual-chassis member 7 mastership-priority 255 - [[phab:T295118|T295118]]
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:51 Lucas_WMDE: UTC morning backport+config window done
* 12:51 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/includes/ProofreadPageLuaLibrary.php: Backport: [[gerrit:740556{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (2/2) (duration: 01m 03s)
* 12:49 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/includes/Pagination/Pagination.php: Backport: [[gerrit:740556{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (1/2) (duration: 01m 04s)
* 12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/ProofreadPage/includes/ProofreadPageLuaLibrary.php: Backport: [[gerrit:740558{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (2/2) (duration: 01m 03s)
* 12:45 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/ProofreadPage/includes/Pagination/Pagination.php: Backport: [[gerrit:740558{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (1/2) (duration: 01m 04s)
* 12:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:19 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: 1.37.0 is out now, so there's no beta [[phab:T289585|T289585]] (duration: 01m 04s)
* 12:11 hashar@deploy1002: Synchronized php-1.38.0-wmf.9/skins/MinervaNeue: Fix banners to show CentralNotice - [[phab:T296077|T296077]] (duration: 01m 04s)
* 11:50 moritzm: installing qemu security updates on bullseye
* 11:46 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:43 moritzm: installing krb5 security updates on stretch
* 11:41 oblivian@cumin1001: START - Cookbook sre.dns.netbox
* 11:39 oblivian@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:36 oblivian@cumin1001: START - Cookbook sre.dns.netbox
* 11:34 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
* 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 11:20 XioNoX: re-enable LibertyGlobal in esams
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 11:12 XioNoX: Revert "prepend_as_out for esams/knams"
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2003.codfw.wmnet with OS buster
* 10:54 elukey: apt-get purge up to linux-image-4.9.0-14-amd64 on sodium to free /boot space
* 10:49 elukey: `apt-get remove linux-image-4.9.0-5-amd64 linux-image-4.9.0-6-amd64` on sodium to free /boot
* 10:45 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2003.codfw.wmnet with OS buster
* 10:25 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 10:16 jbond: restart snmp gracefully cr2-eqord
* 10:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 09:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 09:35 moritzm: installing Linux 4.9.272 updates on Stretch hosts
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:24 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|24b3a7769ca97e3ed951d77d911f41afae5e4136}}: Growth: Disable filtering by unstarred mentees at arwiki, enwiki, fawiki ([[phab:T293182|T293182]]) (duration: 01m 04s)
* 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:08 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 09:05 moritzm: installing 4.19.208-1 kernels on Stretch hosts with 4.19 kernels
* 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:49 moritzm: drain ganeti-test2003 for forthcoming reimage
* 08:44 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 08:44 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|4418c4367b7420139cd8b30cb003d697b58c618f}}: ApiSetMentorStatus: Use READ_LATEST to request back timestamp ([[phab:T295305|T295305]]) (duration: 01m 08s)
* 08:42 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17793 and previous config saved to /var/cache/conftool/dbconfig/20211122-082525-root.json
* 08:15 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17792 and previous config saved to /var/cache/conftool/dbconfig/20211122-081022-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17791 and previous config saved to /var/cache/conftool/dbconfig/20211122-075518-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 40%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17790 and previous config saved to /var/cache/conftool/dbconfig/20211122-074015-root.json
* 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17789 and previous config saved to /var/cache/conftool/dbconfig/20211122-072511-root.json
* 07:17 Amir1: running optimize table on image table in commonswiki on codfw with replication enabled, it'll cause replication lag ([[phab:T296143|T296143]])
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17788 and previous config saved to /var/cache/conftool/dbconfig/20211122-071006-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17787 and previous config saved to /var/cache/conftool/dbconfig/20211122-065502-root.json
* 06:46 marostegui: Revoke dump grants for scholarships database [[phab:T296166|T296166]]
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17786 and previous config saved to /var/cache/conftool/dbconfig/20211122-063959-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17785 and previous config saved to /var/cache/conftool/dbconfig/20211122-062455-root.json
* 03:30 Amir1: run optimize table on db2140 for image table ([[phab:T296143|T296143]])


== 2021-11-21 ==
* 13:17 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 10h)
* 07:26 XioNoX: cr1-eqiad# deactivate protocols bgp group Confed_eqord
* 05:22 Amir1: running clean up of djvu files in all wikis ([[phab:T275268|T275268]])
* 05:13 Amir1: end of djvu metadata maint script run ([[phab:T275268|T275268]])


== 2022-01-06 ==
* 23:52 jhathaway: bouncing blazegraph on wdqs1004
* 23:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-coord1002.eqiad.wmnet with OS buster
* 22:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS buster
* 22:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 22:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 22:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 22:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 22:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.16  refs [[phab:T293958|T293958]]
* 22:14 twentyafterfour@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/Scribunto/: sync Scribunto to deploy https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Scribunto/+/752006/ (duration: 01m 08s)
* 22:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 22:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:23 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3297991]: update rdf-spark-tools jar to 0.3.98 (duration: 02m 15s)
* 20:21 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3297991]: update rdf-spark-tools jar to 0.3.98
* 20:19 inflatador: banned elastic2051 from both chi and omega search clusters - [[phab:T298674|T298674]]
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:01 twentyafterfour@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Sync https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/751841 (duration: 01m 08s)
* 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:59 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@63c162d]: generate entity revision maps for commons / wcqs (duration: 02m 07s)
* 19:57 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@63c162d]: generate entity revision maps for commons / wcqs
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:07 taavi: UTC evening deploys done
* 19:05 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:751538{{!}}Add data.nhm.ac.uk to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T298451)]] (duration: 01m 09s)
* 19:02 razzi: systemctl restart haproxy on dbproxy1018 to repool clouddb1018 for [[phab:T298505|T298505]]
* 18:59 mutante: puppetmaster1001 - creating missing Icinga contact for jgleeson in private puppet repo [[phab:T298649|T298649]]
* 18:51 mutante: contint1001 - after contint2001 also re-enabled puppet and deployed 751816 zuul-merger refactor - service git-daemon refreshed and running
* 18:50 razzi: run sudo maintain-views --databases centralauth --replace-all on clouddb1018 for [[phab:T298505|T298505]]
* 18:47 mutante: contint* - deploying zuul-merger puppet refactor change, first codfw-only
* 18:00 btullis@deploy1002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production (duration: 00m 09s)
* 18:00 btullis@deploy1002: Started deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production
* 17:45 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@6f5caf9]: allow for null columns in export to relforge (duration: 02m 11s)
* 17:42 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@6f5caf9]: allow for null columns in export to relforge
* 16:42 otto@deploy1002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production (duration: 00m 34s)
* 16:41 otto@deploy1002: Started deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production
* 16:37 inflatador: restarting elastic2052 for configuration change - [[phab:T298674|T298674]]
* 16:33 taavi: reset wikitech email for User:Iniquity per [[phab:T298683|T298683]]
* 16:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:21 taavi@deploy1002: Synchronized wmf-config/wikitech.php: wikitech: Re-enable Phabricator and Gerrit users after unblock (duration: 01m 09s)
* 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:19 btullis@deploy1002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production (duration: 00m 41s)
* 16:18 btullis@deploy1002: Started deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production
* 16:18 btullis@deploy1002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production (duration: 07m 16s)
* 16:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:10 btullis@deploy1002: Started deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production
* 16:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
* 15:00 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
* 13:51 jbond: deploy cfssl_1.6.1-0+deb9u1_amd64 to stretch systems
* 09:57 hashar: Restarting zuul-merger on contint2001 and contint1001 {{!}} https://gerrit.wikimedia.org/r/c/operations/puppet/+/738370/ {{!}} [[phab:T187897|T187897]]
* 07:06 Amir1: revoke DROP from wikiadmin globally
* 02:34 eileen: civicrm revision changed from {{Gerrit|67264062}} to {{Gerrit|3d334f30}}
* 00:32 dancy@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:751530{{!}}Change the Traditional Chinese and Simplified Chinese logo for zhwikinews (T298550)]] (duration: 01m 07s)
* 00:30 dancy@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:751530{{!}}Change the Traditional Chinese and Simplified Chinese logo for zhwikinews (T298550)]] (duration: 01m 07s)
* 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn


== 2021-11-20 ==
* 01:02 mutante: lists1001 - restarted apache, icinga alerts for the web UI, but recovered
* 00:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 00:26 cdanis@cumin1001: START - Cookbook sre.network.cf
* 00:25 bblack: lvs3005 - re-enabling puppet + pybal
* 00:25 legoktm@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 00:25 legoktm@cumin1001: START - Cookbook sre.network.cf
* 00:24 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 00:23 cdanis@cumin1001: START - Cookbook sre.network.cf
* 00:06 bblack: lvs3005 - disabling puppet and stopping pybal (traffic will go to lvs3007)


== 2022-01-05 ==
* 23:50 razzi: sudo systemctl reload haproxy on dbproxy1019 to repool clouddb1014 for [[phab:T298505|T298505]]
* 23:26 razzi: run sudo maintain-views --databases centralauth --debug --replace-all on clouddb1014 for [[phab:T298505|T298505]]
* 22:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 22:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 22:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 21:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 21:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 21:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 21:25 eileen: civicrm revision {{Gerrit|32d7370a}} -> {{Gerrit|67264062}}
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:39 razzi: reload haproxy on dbproxy1019 to repool clouddb1014 for [[phab:T298505|T298505]]
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:25 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.16  refs [[phab:T293957|T293957]] (duration: 01m 07s)
* 20:23 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.16  refs [[phab:T293957|T293957]]
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:14 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.16  refs [[phab:T293957|T293957]]
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:11 twentyafterfour@deploy1002: Synchronized php-1.38.0-wmf.16/includes/changetags/ChangeTags.php: unblock the train, refs [[phab:T293957|T293957]] (duration: 01m 09s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:47 urbanecm@deploy1002: Finished scap: {{Gerrit|485e72bada5243755daab981f5a9ecd35e5b134e}}: Add it namespace aliases in scn ([[phab:T297844|T297844]]) (duration: 11m 40s)
* 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:41 razzi: reload haproxy on dbproxy1019 (previously incorrectly reloaded dbproxy1018) for [[phab:T298505|T298505]]
* 19:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:35 urbanecm@deploy1002: Started scap: {{Gerrit|485e72bada5243755daab981f5a9ecd35e5b134e}}: Add it namespace aliases in scn ([[phab:T297844|T297844]])
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f2da5befc75b4f93ca4a11393a533b7dc97316ef}}: Deploy sticky header ([[phab:T295976|T295976]]) (duration: 01m 42s)
* 19:31 razzi: reload haproxy on dbproxy1018 for [[phab:T298505|T298505]]
* 19:27 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/skins/Vector/resources/skins.vector.es6/stickyHeader.js: {{Gerrit|f6424f32611bce8d9e95c369c28e2f787e2cdf75}}: Dont use ts-ignore. It is hiding real errors ([[phab:T297119|T297119]]) (duration: 01m 08s)
* 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|aff4ac32f37d21ac0b70c62adc54756eb1e2d2b0}}: Add www.artsobservasjoner.no to the wgCopyUploadsDomains allowlist of Commons ([[phab:T298449|T298449]]) (duration: 01m 08s)
* 19:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:47 jgleeson: localsettings changed from {{Gerrit|2d371ed1}} to {{Gerrit|3df415c1}}
* 18:22 bd808: Toolhub: ran `poetry run ./manage.py migrate` against m5-master
* 18:18 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync on main
* 18:16 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply on main
* 18:07 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync on main
* 18:06 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply on main
* 18:04 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync on main
* 18:03 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply on main
* 18:03 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync on main
* 18:02 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply on main
* 18:02 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply on main
* 17:57 btullis@deploy1002: Finished deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to production (duration: 00m 29s)
* 17:57 btullis@deploy1002: Started deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to production
* 17:55 andrew@deploy1002: Finished deploy [horizon/deploy@b300fa6]: minor code format update (duration: 04m 09s)
* 17:53 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: sync on main
* 17:51 andrew@deploy1002: Started deploy [horizon/deploy@b300fa6]: minor code format update
* 17:50 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply on main
* 17:48 btullis@deploy1002: Finished deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to staging (duration: 00m 39s)
* 17:47 btullis@deploy1002: Started deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to staging
* 17:46 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: sync on main
* 17:46 btullis@deploy1002: Finished deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to staging (duration: 03m 11s)
* 17:42 btullis@deploy1002: Started deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to staging
* 17:42 btullis@deploy1002: Started deploy [analytics/superset/deploy@09094de]: Deployment for something important
* 17:36 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply on main
* 17:26 andrew@deploy1002: Finished deploy [horizon/deploy@15efe04]: sudo panel update (duration: 04m 00s)
* 17:21 andrew@deploy1002: Started deploy [horizon/deploy@15efe04]: sudo panel update
* 17:21 andrew@deploy1002: Finished deploy [horizon/deploy@15efe04]: sudo panel update (codfw1dev) (duration: 01m 54s)
* 17:19 andrew@deploy1002: Started deploy [horizon/deploy@15efe04]: sudo panel update (codfw1dev)
* 17:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:11 sbassett: Deployed security fix for [[phab:T298581|T298581]] to wmf.16
* 17:04 sbassett@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/MobileFrontend/includes/specials/SpecialMobileContributions.php: Deploy security fix for [[phab:T298581|T298581]] (duration: 01m 08s)
* 16:51 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:51 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:38 andrew@deploy1002: Finished deploy [horizon/deploy@5e57e78]: sudo panel update (codfw1dev) (duration: 02m 08s)
* 16:36 andrew@deploy1002: Started deploy [horizon/deploy@5e57e78]: sudo panel update (codfw1dev)
* 16:27 andrew@deploy1002: Finished deploy [horizon/deploy@5e57e78]: sudo panel update (duration: 03m 53s)
* 16:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:26 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:23 andrew@deploy1002: Started deploy [horizon/deploy@5e57e78]: sudo panel update
* 14:54 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:50 aokoth@cumin1001: START - Cookbook sre.dns.netbox
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3316, db2087:3317 after reimage [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18402 and previous config saved to /var/cache/conftool/dbconfig/20220105-134827-marostegui.json
* 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2001.codfw.wmnet with OS bullseye
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:38 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/TrustedXFF/: {{Gerrit|ce7113b99712ac7ce4112cff720c669f618df6eb}}: Add more Zscaler ranges ([[phab:T298241|T298241]]) (duration: 01m 09s)
* 13:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/TrustedXFF/: {{Gerrit|d35e36f4deb7a8e2a454769f4b2d72e45318fcc9}}: Add more Zscaler ranges ([[phab:T298241|T298241]]) (duration: 01m 09s)
* 13:33 Amir1: delete echo keys from objectchange in frwiki ([[phab:T272512|T272512]])
* 13:23 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
* 13:22 jelto@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
* 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2001.codfw.wmnet with OS bullseye
* 13:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2002.codfw.wmnet with OS bullseye
* 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2002.codfw.wmnet with OS bullseye
* 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:18 taavi: UTC morning deploys done
* 12:16 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:749890{{!}}Add akwiki as an import source for twwiki (T298296)]] (duration: 01m 09s)
* 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:07 vgutierrez: pool cp5005 running envoyproxy as TLS terminator - [[phab:T271421|T271421]]
* 12:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2003.codfw.wmnet with OS bullseye
* 11:56 jbond: rollout cfssl 1.6.1
* 11:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
* 11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5005.eqsin.wmnet with OS buster
* 11:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
* 11:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
* 11:34 aokoth@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kubestage1002.eqiad.wmnet
* 11:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2003.codfw.wmnet with OS bullseye
* 11:24 btullis: updating hive packages in reprepro for log4j update
* 11:24 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts kubestage1002.eqiad.wmnet
* 11:20 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2003.codfw.wmnet with OS bullseye
* 10:54 jbond: upload cfssl 1.6.1
* 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2003.codfw.wmnet with OS bullseye
* 10:48 hashar: CI: switching MediaWiki selenium from php built-in server to Apache # https://gerrit.wikimedia.org/r/751697
* 10:40 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:37 aokoth@cumin1001: START - Cookbook sre.dns.netbox
* 10:02 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8137ffc33d9de0f0a835223936a93e87504a7358}}: pwnwiki: Enable Growth features in dark mode ([[phab:T298115|T298115]]; 3/3) (duration: 01m 07s)
* 10:00 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:59 urbanecm@deploy1002: Synchronized wmf-config/config/pwnwiki.yaml: {{Gerrit|8137ffc33d9de0f0a835223936a93e87504a7358}}: pwnwiki: Enable Growth features in dark mode ([[phab:T298115|T298115]]; 2/3) (duration: 01m 07s)
* 09:59 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:58 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|8137ffc33d9de0f0a835223936a93e87504a7358}}: pwnwiki: Enable Growth features in dark mode ([[phab:T298115|T298115]]; 1/3) (duration: 01m 07s)
* 09:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:53 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/GrowthExperiments/includes/Mentorship/Hooks/MentorFilterHooks.php: {{Gerrit|24e15e1fd5c7feb2377974ee666c61aef8f82da5}}: MentorFilterHooks: Include only primary mentors ([[phab:T298031|T298031]]) (duration: 01m 07s)
* 09:48 aokoth@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kubestage1001.eqiad.wmnet
* 09:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/TrustedXFF/trusted-hosts.php: {{Gerrit|ab8fe9884e3e4d1fa3bdaa1c8a9cab143b4ac565}}: Add Zscaler to list of trusted hosts for XFF ([[phab:T298241|T298241]]) (duration: 01m 08s)
* 09:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/TrustedXFF/trusted-hosts.php: {{Gerrit|010d96b9297825079b3ac84f247c0f80353d42a8}}: Add Zscaler to list of trusted hosts for XFF ([[phab:T298241|T298241]]) (duration: 01m 09s)
* 09:33 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts kubestage1001.eqiad.wmnet
* 09:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5005.eqsin.wmnet with OS buster
* 09:24 vgutierrez: depool cp5005 to be reimaged as cache::upload_envoy - [[phab:T271421|T271421]]
* 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2087.codfw.wmnet with OS bullseye
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2087.codfw.wmnet with OS bullseye
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for Buster reimage [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18399 and previous config saved to /var/cache/conftool/dbconfig/20220105-082529-marostegui.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18398 and previous config saved to /var/cache/conftool/dbconfig/20220105-081600-marostegui.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P18397 and previous config saved to /var/cache/conftool/dbconfig/20220105-080055-marostegui.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P18396 and previous config saved to /var/cache/conftool/dbconfig/20220105-074551-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18395 and previous config saved to /var/cache/conftool/dbconfig/20220105-073046-marostegui.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18394 and previous config saved to /var/cache/conftool/dbconfig/20220105-072937-marostegui.json
* 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 02:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:13 Amir1: running foreachwikiindblist all maintenance/refreshImageMetadata.php --force --verbose --mediatype=OFFICE --oldimage ([[phab:T298417|T298417]])
* 02:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:11 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/maintenance/refreshImageMetadata.php: Backport: [[gerrit:751526{{!}}maintenance: Add support for oldimage table metadata refresh (T298417)]] (duration: 01m 07s)
* 02:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:09 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.13/maintenance/refreshImageMetadata.php: Backport: [[gerrit:751527{{!}}maintenance: Add support for oldimage table metadata refresh (T298417)]] (duration: 01m 08s)
* 01:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:49 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:750814{{!}}Delete Tematica namespace (NS:104) in Italian Wikivoyage (T298315)]] (duration: 01m 07s)
* 01:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:43 ebernhardson@deploy1002: Synchronized static/images/mobile/copyright/wikivoyage-wordmark-bn.svg: Config: [[gerrit:749626{{!}}Update bnwikivoyage wordmark logo (T298033)]] (duration: 01m 07s)
* 01:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:41 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:749626{{!}}Update bnwikivoyage wordmark logo (T298033)]] (duration: 01m 07s)
* 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:12 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:751485{{!}}Move CirrusSearch more_like traffic to eqiad]] (duration: 01m 07s)
* 01:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|34bf91ec2ba1408594bb77745deb6fa7d36ddf8d}}: GrowthExperiments: Add campaign pattern for JOSA ([[phab:T298057|T298057]]) (duration: 01m 08s)
* 00:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7aff17f42eb2ecad94a76c5d93ce467bd6bff39e}}: Fix wordmark svgs for strategywiki, viwikibooks ([[phab:T290091|T290091]]; 2/2) (duration: 01m 07s)
* 00:52 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|7aff17f42eb2ecad94a76c5d93ce467bd6bff39e}}: Fix wordmark svgs for strategywiki, viwikibooks ([[phab:T290091|T290091]]; 1/2) (duration: 01m 07s)
* 00:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6c220f0bb86b0d77714ee23d662ea836897e0207}}: Enable slow-parsoid logs (duration: 01m 08s)
* 00:40 twentyafterfour@deploy1002: Synchronized php-1.38.0-wmf.16/includes/content/ContentModelChange.php: fix patch application failure (duration: 01m 07s)
* 00:37 twentyafterfour@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/VisualEditor/: fix patch application failure (duration: 01m 09s)


== 2021-11-19 ==
== 2022-01-04 ==
* 23:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2005.codfw.wmnet with OS bullseye
* 22:55 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.16  refs [[phab:T293957|T293957]] (duration: 37m 56s)
* 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 22:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:24 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2005.codfw.wmnet with OS bullseye
* 22:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:15 mutante: LDAP - added mmartorana to wmf (91354e9e-5706-4289-9a60-98e8a7632853) [[phab:T295789|T295789]]
* 22:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 22:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2018.codfw.wmnet with OS stretch
* 22:17 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.16  refs [[phab:T293957|T293957]]
* 20:21 mutante: phabricator - adding eigyan to WMF-NDA (phab project 61 - https://phabricator.wikimedia.org/project/members/61/ ) - since that is now standard when adding people to the wmf LDAP group ([[phab:T295928|T295928]])
* 21:15 eileen: process-control checkout revision ({{Gerrit|e58e4e50}} -> {{Gerrit|eb83f208}})
* 20:20 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2002.codfw.wmnet
* 21:02 eileen: process-control config {{Gerrit|40467fc2}} -> {{Gerrit|e58e4e50}}
* 20:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor2002.codfw.wmnet
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2280.codfw.wmnet
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS stretch
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:51 mutante: shutting down undead server mw2280 - not in icinga or puppetdb, but in debmonitor and still has IP and puppet cert
* 20:43 eileen: config {{Gerrit|b26653a4}} -> {{Gerrit|40467fc2}} (latest)
* 19:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2280.codfw.wmnet
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 20:34 eileen: civicrm revision {{Gerrit|aaceb4ab}} -> {{Gerrit|328c8542}}
* 18:10 andrew@deploy1002: Finished deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone (duration: 04m 19s)
* 20:33 twentyafterfour_: MediaWiki train for 1.38.0-wmf.16 - ran `scap prep` [[phab:T293957|T293957]]
* 18:06 andrew@deploy1002: Started deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone
* 16:57 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@b38fb58]: Switch mjolnir norm_query_clustering to the shaded refinery jar (duration: 02m 11s)
* 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:55 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@b38fb58]: Switch mjolnir norm_query_clustering to the shaded refinery jar
* 17:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18388 and previous config saved to /var/cache/conftool/dbconfig/20220104-160930-marostegui.json
* 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@ee83e27]: fixing sudo rule editing (duration: 04m 10s)
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18387 and previous config saved to /var/cache/conftool/dbconfig/20220104-155425-marostegui.json
* 17:21 andrew@deploy1002: Started deploy [horizon/deploy@ee83e27]: fixing sudo rule editing
* 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18386 and previous config saved to /var/cache/conftool/dbconfig/20220104-153920-marostegui.json
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18384 and previous config saved to /var/cache/conftool/dbconfig/20220104-152416-marostegui.json
* 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:07 aokoth@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:42 thcipriani@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] [[phab:T296098|T296098]]"
* 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:35 thcipriani: rolling back to group0 for [[phab:T296098|T296098]]
* 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:20 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:21 oblivian@deploy1002: Synchronized docroot: Config: Make symlinks relative so they work on a local checkout too ([[phab:T285232|T285232]]) (duration: 00m 57s)
* 15:31 akosiaris: roll restart wtp10* php7.2-fpm excluding wtp1025, wtp1041
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:29 akosiaris: depooling wtp1041, wtp1025 from traffic. The entire parsoid cluster is under memory pressure; it looks like a rolling restart of php-fpm will alleviate the pressure and give us some time to dig into the problem before the pressure builds up again.
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:28 akosiaris@cumin1001: conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:28 akosiaris@cumin1001: conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet
* 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:12 oblivian@deploy1002: Synchronized images: Config: Remove dead symlinks ([[phab:T285232|T285232]]) (duration: 00m 58s)
* 14:49 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2001.codfw.wmnet with OS buster
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:15 jayme: fleet wide updated wmf-certificates to 0~20211119-1
* 13:57 godog: bump prometheus k8s + ops space in eqiad
* 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS buster
* 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2087.codfw.wmnet with reason: Maintenance
* 13:23 moritzm: draining instances from ganeti-test2001 for reimage [[phab:T284811|T284811]]
* 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2087.codfw.wmnet with reason: Maintenance
* 13:02 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18382 and previous config saved to /var/cache/conftool/dbconfig/20220104-134410-marostegui.json
* 12:10 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1121,1155].eqiad.wmnet with reason: Maintenance
* 12:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1121,1155].eqiad.wmnet with reason: Maintenance
* 11:54 hnowlan: roll-restarting cassandra on eqiad maps for java updates
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18381 and previous config saved to /var/cache/conftool/dbconfig/20220104-134359-marostegui.json
* 11:36 jayme: imported wmf-certificates 0~20211119-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18380 and previous config saved to /var/cache/conftool/dbconfig/20220104-132854-marostegui.json
* 09:53 XioNoX: run `commit full` on asw-b-codfw - [[phab:T295118|T295118]]
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18379 and previous config saved to /var/cache/conftool/dbconfig/20220104-131349-marostegui.json
* 09:30 XioNoX: re-enable cr2-codfw<->asw-b7-codfw link after disabling inet6 on cr2-codfw:ae2 - [[phab:T295118|T295118]]
* 13:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 08:46 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18378 and previous config saved to /var/cache/conftool/dbconfig/20220104-130816-marostegui.json
* 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18377 and previous config saved to /var/cache/conftool/dbconfig/20220104-125845-marostegui.json
* 08:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001
* 12:53 taavi: UTC morning deploys done
* 08:29 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18376 and previous config saved to /var/cache/conftool/dbconfig/20220104-125312-marostegui.json
* 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:52 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:751385{{!}}prod: WRITE_BOTH for centralauth hidden level migration (T289068)]] (duration: 00m 57s)
* 08:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes: Backport: [[gerrit:739841{{!}}Revert "Title: use PageStore instead of LinkCache"]] (duration: 01m 03s)
* 12:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:23 ayounsi@deploy1002: Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 25s)
* 12:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:22 ayounsi@deploy1002: Started deploy [homer/deploy@dc007aa]: Homer CR738905
* 12:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:17 moritzm: installing mariadb-10.5 security updates on bullseye (as packaged in Debian, not the wmf-internal packages)
* 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:55 marostegui: Reboot db1132 to pick up new kernel [[phab:T288720|T288720]]
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges from s2 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18375 and previous config saved to /var/cache/conftool/dbconfig/20220104-123845-marostegui.json
* 06:23 marostegui: Upgrade clouddb1019
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18374 and previous config saved to /var/cache/conftool/dbconfig/20220104-123807-marostegui.json
* 05:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 04:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 04:55 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/media/DjVuImage.php: Backport: [[gerrit:739838{{!}}media: Store metadata of one-page documents correctly (T296001)]] (duration: 00m 56s)
* 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:54 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/modules: Backport: [[gerrit:739837{{!}}Lazy-load structured task JS files (T296049)]] (duration: 00m 55s)
* 12:34 taavi@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/LdapAuthentication/includes/LdapAuthenticationPlugin.php: Backport: [[gerrit:751192{{!}}Include ldap errno on account creation debug logs (T298508)]] (duration: 00m 58s)
* 02:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:02 mutante: [puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 12:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:01 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18373 and previous config saved to /var/cache/conftool/dbconfig/20220104-122302-marostegui.json
* 01:57 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2001.codfw.wmnet
* 12:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:749244{{!}}Create autopatroller and patroller groups on bnwiktionary (T298187)]] (duration: 00m 57s)
* 01:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 12:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
* 12:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:45 mutante: I think git-ssh6_22 is down (see alerts lvs2008/2009) due to the v6 issue from ongoing lvs maintenance. depooled in conftool
* 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18372 and previous config saved to /var/cache/conftool/dbconfig/20220104-121643-marostegui.json
* 01:40 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor2001.codfw.wmnet
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 01:37 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 01:35 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Cite/modules/ve-cite/ve.dm.MWReferenceNode.js: Backport for [[phab:T296044|T296044]] (duration: 00m 55s)
* 12:15 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:751168{{!}}Make reply tool available as opt-out on specieswiki (T297535)]] (duration: 00m 57s)
* 01:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:31 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:13 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:751167{{!}}Make reply tool available as opt-out on metawiki (T297534)]] (duration: 00m 59s)
* 01:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 01:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2002.codfw.wmnet
* 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 01:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2001.codfw.wmnet
* 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db[1106,1154].eqiad.wmnet with reason: Maintenance
* 01:19 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db[1106,1154].eqiad.wmnet with reason: Maintenance
* 01:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 15 hosts with reason: Maintenance
* 01:09 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2002.codfw.wmnet
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 15 hosts with reason: Maintenance
* 01:09 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2001.codfw.wmnet
* 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 01:05 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor2006.codfw.wmnet
* 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 01:05 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor2005.codfw.wmnet
* 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 00:56 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor2006.codfw.wmnet
* 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 00:56 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor2005.codfw.wmnet
* 11:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 00:55 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2006.codfw.wmnet
* 11:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 00:55 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2005.codfw.wmnet
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18370 and previous config saved to /var/cache/conftool/dbconfig/20220104-114503-marostegui.json
* 00:33 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18369 and previous config saved to /var/cache/conftool/dbconfig/20220104-112959-marostegui.json
* 00:08 brennen: end of UTC late deployment training window
* 11:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 11:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 11:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18368 and previous config saved to /var/cache/conftool/dbconfig/20220104-111454-marostegui.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18367 and previous config saved to /var/cache/conftool/dbconfig/20220104-105949-marostegui.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18366 and previous config saved to /var/cache/conftool/dbconfig/20220104-105922-marostegui.json
* 10:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 10:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18365 and previous config saved to /var/cache/conftool/dbconfig/20220104-105914-marostegui.json
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18364 and previous config saved to /var/cache/conftool/dbconfig/20220104-105244-marostegui.json
* 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 10:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18362 and previous config saved to /var/cache/conftool/dbconfig/20220104-104410-marostegui.json
* 10:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 10:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 10:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 10:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 10:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 10:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 10:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18360 and previous config saved to /var/cache/conftool/dbconfig/20220104-102905-marostegui.json
* 10:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 10:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 10:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18359 and previous config saved to /var/cache/conftool/dbconfig/20220104-101400-marostegui.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18358 and previous config saved to /var/cache/conftool/dbconfig/20220104-094920-marostegui.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18357 and previous config saved to /var/cache/conftool/dbconfig/20220104-093415-marostegui.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18356 and previous config saved to /var/cache/conftool/dbconfig/20220104-091910-marostegui.json
* 09:04 dcaro: start merging puppet cleanup patches
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18355 and previous config saved to /var/cache/conftool/dbconfig/20220104-090406-marostegui.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18354 and previous config saved to /var/cache/conftool/dbconfig/20220104-085127-marostegui.json
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 08:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18353 and previous config saved to /var/cache/conftool/dbconfig/20220104-085118-marostegui.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18352 and previous config saved to /var/cache/conftool/dbconfig/20220104-083613-marostegui.json
* 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2094.codfw.wmnet with OS bullseye
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18351 and previous config saved to /var/cache/conftool/dbconfig/20220104-082306-marostegui.json
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18350 and previous config saved to /var/cache/conftool/dbconfig/20220104-082259-marostegui.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18349 and previous config saved to /var/cache/conftool/dbconfig/20220104-082109-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18348 and previous config saved to /var/cache/conftool/dbconfig/20220104-080754-marostegui.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18347 and previous config saved to /var/cache/conftool/dbconfig/20220104-080604-marostegui.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18346 and previous config saved to /var/cache/conftool/dbconfig/20220104-080051-marostegui.json
* 08:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2094.codfw.wmnet with OS bullseye
* 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance
* 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18345 and previous config saved to /var/cache/conftool/dbconfig/20220104-075249-marostegui.json
* 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 07:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 9 hosts with reason: Maintenance
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 9 hosts with reason: Maintenance
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18344 and previous config saved to /var/cache/conftool/dbconfig/20220104-074456-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18343 and previous config saved to /var/cache/conftool/dbconfig/20220104-073745-marostegui.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18342 and previous config saved to /var/cache/conftool/dbconfig/20220104-072951-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18341 and previous config saved to /var/cache/conftool/dbconfig/20220104-071446-marostegui.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18340 and previous config saved to /var/cache/conftool/dbconfig/20220104-065942-marostegui.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298316|T298316]])', diff saved to https://phabricator.wikimedia.org/P18339 and previous config saved to /var/cache/conftool/dbconfig/20220104-063714-marostegui.json
* 06:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 06:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18338 and previous config saved to /var/cache/conftool/dbconfig/20220104-042116-marostegui.json
* 04:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 04:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18337 and previous config saved to /var/cache/conftool/dbconfig/20220104-042109-marostegui.json
* 04:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18335 and previous config saved to /var/cache/conftool/dbconfig/20220104-040604-marostegui.json
* 04:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2144.codfw.wmnet
* 04:01 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
* 03:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18334 and previous config saved to /var/cache/conftool/dbconfig/20220104-035059-marostegui.json
* 03:50 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2144.codfw.wmnet
* 03:50 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
* 03:36 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2144.codfw.wmnet
* 03:36 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
* 03:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18333 and previous config saved to /var/cache/conftool/dbconfig/20220104-033555-marostegui.json
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18332 and previous config saved to /var/cache/conftool/dbconfig/20220104-015125-marostegui.json
* 01:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 01:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 01:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18331 and previous config saved to /var/cache/conftool/dbconfig/20220104-012506-marostegui.json
* 01:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18330 and previous config saved to /var/cache/conftool/dbconfig/20220104-011001-marostegui.json
* 00:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18329 and previous config saved to /var/cache/conftool/dbconfig/20220104-005456-marostegui.json
* 00:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18328 and previous config saved to /var/cache/conftool/dbconfig/20220104-003951-marostegui.json
* 00:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 00:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 00:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18327 and previous config saved to /var/cache/conftool/dbconfig/20220104-000947-marostegui.json


== 2021-11-18 ==
== 2022-01-03 ==
* 23:47 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 23:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18326 and previous config saved to /var/cache/conftool/dbconfig/20220103-235443-marostegui.json
* 23:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet,service=miscweb
* 23:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18325 and previous config saved to /var/cache/conftool/dbconfig/20220103-233938-marostegui.json
* 23:28 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18324 and previous config saved to /var/cache/conftool/dbconfig/20220103-232433-marostegui.json
* 23:27 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:50 cwhite: manually upgrade to grafana 8 on grafana-next ([[phab:T282863|T282863]])
* 22:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 21:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18323 and previous config saved to /var/cache/conftool/dbconfig/20220103-212216-marostegui.json
* 22:48 XioNoX: asw-b-codfw> request system power-off member 7
* 21:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:44 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 21:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:28 mutante: icinga (alert1001) - manually fix IP of mw1488.mgmt (was 0.0.0.0, is 10.65.1.26) in /etc/icinga/objects/puppet_hosts.cfg, running puppet
* 21:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18322 and previous config saved to /var/cache/conftool/dbconfig/20220103-212209-marostegui.json
* 22:06 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1003.eqiad.wmnet
* 21:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18321 and previous config saved to /var/cache/conftool/dbconfig/20220103-210704-marostegui.json
* 21:53 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor1003.eqiad.wmnet
* 20:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18320 and previous config saved to /var/cache/conftool/dbconfig/20220103-205159-marostegui.json
* 21:50 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1004.eqiad.wmnet
* 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18319 and previous config saved to /var/cache/conftool/dbconfig/20220103-203654-marostegui.json
* 21:36 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor1004.eqiad.wmnet
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18318 and previous config saved to /var/cache/conftool/dbconfig/20220103-185305-marostegui.json
* 21:31 XioNoX: asw-b-codfw> request system power-off member 7
* 18:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 21:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor1004.eqiad.wmnet
* 18:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 21:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor1003.eqiad.wmnet
* 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18317 and previous config saved to /var/cache/conftool/dbconfig/20220103-185257-marostegui.json
* 21:01 ejegg: updated payments-wiki from {{Gerrit|abb2bd9d}} -> {{Gerrit|d1d6f024}}
* 18:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18316 and previous config saved to /var/cache/conftool/dbconfig/20220103-183752-marostegui.json
* 21:00 mutante: [puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18315 and previous config saved to /var/cache/conftool/dbconfig/20220103-183130-marostegui.json
* 21:00 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 18:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18314 and previous config saved to /var/cache/conftool/dbconfig/20220103-183122-marostegui.json
* 20:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 18:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18313 and previous config saved to /var/cache/conftool/dbconfig/20220103-182248-marostegui.json
* 20:51 dcausse: restart blazegraph on wdqs1006 (jvm stuck)
* 18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18312 and previous config saved to /var/cache/conftool/dbconfig/20220103-181617-marostegui.json
* 20:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 18:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18311 and previous config saved to /var/cache/conftool/dbconfig/20220103-180743-marostegui.json
* 20:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18310 and previous config saved to /var/cache/conftool/dbconfig/20220103-180112-marostegui.json
* 20:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
* 17:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18309 and previous config saved to /var/cache/conftool/dbconfig/20220103-174608-marostegui.json
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:13 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1003.eqiad.wmnet with OS buster
* 20:43 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1088.eqiad.wmnet with OS buster
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1086.eqiad.wmnet with OS buster
* 20:31 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] (duration: 01m 03s)
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18308 and previous config saved to /var/cache/conftool/dbconfig/20220103-164652-marostegui.json
* 20:30 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 16:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 20:27 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/tests/phpunit/includes/page/PageStoreTest.php: Backport for [[phab:T295931|T295931]] (duration: 01m 03s)
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18307 and previous config saved to /var/cache/conftool/dbconfig/20220103-164645-marostegui.json
* 20:25 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/includes/page/PageStore.php: Backport for [[phab:T295931|T295931]] (duration: 01m 04s)
* 16:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:43 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1003.eqiad.wmnet with OS buster
* 20:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
* 20:01 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 16:37 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2144.codfw.wmnet
* 19:53 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1004.eqiad.wmnet
* 16:37 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
* 19:52 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1003.eqiad.wmnet
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18306 and previous config saved to /var/cache/conftool/dbconfig/20220103-163140-marostegui.json
* 19:52 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor1006.eqiad.wmnet
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS buster
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1086.eqiad.wmnet with OS buster
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1087.eqiad.wmnet with OS buster
* 19:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1085.eqiad.wmnet with OS buster
* 19:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4b4c0bca9aa6bceac86f40f03ad688b9d4481c58}}: Enable DiscussionTools automatic topic subscriptions as beta feature on most wikis ([[phab:T290500|T290500]]) (duration: 01m 04s)
* 16:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1084.eqiad.wmnet with OS buster
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1088.eqiad.wmnet with OS buster
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1086.eqiad.wmnet with OS buster
* 19:13 twentyafterfour: upgrading php7.3 packages on phab1001
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18305 and previous config saved to /var/cache/conftool/dbconfig/20220103-161635-marostegui.json
* 19:07 twentyafterfour: rebooting phab2001 to apply updated php and kernel packages
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18304 and previous config saved to /var/cache/conftool/dbconfig/20220103-161232-marostegui.json
* 19:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bullseye
* 16:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 19:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: kernel upgrade
* 16:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: kernel upgrade
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18303 and previous config saved to /var/cache/conftool/dbconfig/20220103-161224-marostegui.json
* 18:57 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 16:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS buster
* 18:52 XioNoX: asw-b-codfw> request system reboot member 7 - [[phab:T295118|T295118]]
* 16:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1087.eqiad.wmnet with OS buster
* 18:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bullseye
* 16:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1086.eqiad.wmnet with OS buster
* 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS buster
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18302 and previous config saved to /var/cache/conftool/dbconfig/20220103-160131-marostegui.json
* 15:49 XioNoX: asw-b-codfw> request system power-off member 7 - [[phab:T295118|T295118]]
* 16:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1084.eqiad.wmnet with OS buster
* 15:39 XioNoX: lvs2007:~$ sudo service pybal stop - [[phab:T295118|T295118]]
* 15:58 vgutierrez: pool cp2029 - [[phab:T298293|T298293]]
* 15:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18301 and previous config saved to /var/cache/conftool/dbconfig/20220103-155720-marostegui.json
* 15:35 XioNoX: cr2-codfw# set interfaces et-1/0/3 disable - [[phab:T295118|T295118]]
* 15:53 moritzm: installing publicsuffix 20211207.1025-0+deb11u1 on bullseye hosts
* 15:34 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:50 moritzm: installing gmp security updates
* 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 15:43 moritzm: installing datatables.js security updates
* 15:16 hnowlan: roll restarting cassandra on codfw maps for java updates
* 15:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp2029.codfw.wmnet with reason: Swapping faulty DIMM with B1
* 15:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 15:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp2029.codfw.wmnet with reason: Swapping faulty DIMM with B1
* 14:44 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18300 and previous config saved to /var/cache/conftool/dbconfig/20220103-154215-marostegui.json
* 14:38 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 15:41 moritzm: installing edk2 security updates
* 14:37 hnowlan: roll-restarting sessionstore for java updates
* 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18299 and previous config saved to /var/cache/conftool/dbconfig/20220103-152710-marostegui.json
* 14:19 moritzm: installing testvm2003
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18298 and previous config saved to /var/cache/conftool/dbconfig/20220103-151558-marostegui.json
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2003.codfw.wmnet
* 15:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 13:34 moritzm: installing pam bugfix updates on bullseye hosts
* 15:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18297 and previous config saved to /var/cache/conftool/dbconfig/20220103-151550-marostegui.json
* 13:22 moritzm: failover ganeti master in test cluster to ganeti-test2002 [[phab:T284811|T284811]]
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18296 and previous config saved to /var/cache/conftool/dbconfig/20220103-150045-marostegui.json
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:00 hashar: Restarting Gerrit primary on gerrit1001
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:59 hashar: Restarting Gerrit replica on gerrit2001
* 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudcephosd1016.wikimedia.org
* 14:46 jayme: published image docker-registry.discovery.wmnet/cfssl-issuer:0.2.0-1 - [[phab:T294560|T294560]]
* 12:23 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudcephosd1016.wikimedia.org
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18295 and previous config saved to /var/cache/conftool/dbconfig/20220103-144539-marostegui.json
* 12:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:42 XioNoX: push CR744782 "Deprecate interface-range external" to all routers
* 12:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18293 and previous config saved to /var/cache/conftool/dbconfig/20220103-143034-marostegui.json
* 12:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18292 and previous config saved to /var/cache/conftool/dbconfig/20220103-140232-marostegui.json
* 12:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Maintenance
* 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Maintenance
* 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1025.eqiad.wmnet
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18291 and previous config saved to /var/cache/conftool/dbconfig/20220103-140221-marostegui.json
* 12:16 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1025.eqiad.wmnet
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18290 and previous config saved to /var/cache/conftool/dbconfig/20220103-134716-marostegui.json
* 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1026.eqiad.wmnet
* 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18289 and previous config saved to /var/cache/conftool/dbconfig/20220103-134227-marostegui.json
* 12:16 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1026.eqiad.wmnet
* 13:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 12:15 marostegui: Upgrade dbstore1007 to 10.4.22 [[phab:T290841|T290841]] [[phab:T295970|T295970]]
* 13:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 12:15 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739550{{!}}Enable Tamil (ta) Section Translation in test wiki (T294223)]] (duration: 01m 05s)
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18288 and previous config saved to /var/cache/conftool/dbconfig/20220103-133212-marostegui.json
* 12:06 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS buster
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18287 and previous config saved to /var/cache/conftool/dbconfig/20220103-131707-marostegui.json
* 11:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6002.drmrs.wmnet with OS buster
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host build2001.codfw.wmnet
* 11:29 arturo: aborrero@apt1001:~$ sudo -i reprepro export
* 12:46 moritzm: installing openjdk-11 security updates on buster
* 11:27 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS buster
* 12:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:26 arturo: aborrero@apt1001:~$ sudo -i reprepro processincoming default /srv/wikimedia/incoming/python-flask-keystone_0.2~git20201012.b5cd4da-1_amd64.changes ([[phab:T295234|T295234]])
* 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:08 arturo: run aborrero@apt1001:~$ sudo -i reprepro processincoming default
* 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 12:41 taavi: UTC morning deploys done
* 11:07 arturo: added python-flask-oslolog_0.1~git20201012.7803a46-1 to bullseye-wikimedia ([[phab:T295234|T295234]])
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18286 and previous config saved to /var/cache/conftool/dbconfig/20220103-124117-marostegui.json
* 11:06 arturo: aborrero@apt1001:~ $ for i in $(ll /srv/wikimedia/incoming/ {{!}} grep aborrero {{!}} awk -F' ' '<nowiki>{</nowiki>print $NF<nowiki>}</nowiki>') ; do rm /srv/wikimedia/incoming/$i ; done
* 12:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 11:05 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6002.drmrs.wmnet with OS buster
* 12:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 12:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:57 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS buster
* 12:40 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:743683{{!}}Use new class names for CentralAuth RC feed]] (duration: 00m 57s)
* 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:35 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:748305{{!}}Add towiki.ru to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T294190)]] (duration: 00m 57s)
* 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2002.codfw.wmnet with OS buster
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:17 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS buster
* 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:12 topranks: Re-pooling eqiad in DNS after completing iBGP policy changes on cr1-eqiad and cr2-eqiad [[phab:T295672|T295672]]
* 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:29 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:750826{{!}}Add a logo for amiwiki (T298439)]] (3/3) (duration: 00m 57s)
* 10:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:01 moritzm: updating perf on buster hosts
* 12:28 taavi@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:750826{{!}}Add a logo for amiwiki (T298439)]] (2/3) (duration: 00m 57s)
* 10:00 topranks: Re-enabling Equinix IXP port on cr1-eqiad following iBGP changes to address [[phab:T295650|T295650]]
* 12:26 taavi@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:750826{{!}}Add a logo for amiwiki (T298439)]] (1/3) (duration: 00m 58s)
* 09:56 ema: cp4021: repool w/ single backend experiment enabled [[phab:T288106|T288106]]
* 12:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2002.codfw.wmnet with OS buster
* 12:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:22 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:750805{{!}}Add a logo for pwnwiki (T298438)]] (3/3) (duration: 00m 57s)
* 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:41 ema: cp4021: stop ats-be and clear its cache [[phab:T288106|T288106]]
* 12:21 taavi@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:750805{{!}}Add a logo for pwnwiki (T298438)]] (2/3) (duration: 00m 57s)
* 09:35 ema: cp4021: depool to enable single backend experiment [[phab:T288106|T288106]]
* 12:20 taavi@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:750805{{!}}Add a logo for pwnwiki (T298438)]] (1/3) (duration: 00m 58s)
* 09:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1090.eqiad.wmnet with OS buster
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:32 vgutierrez: pool cp1090 (upload) running HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 12:15 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747794{{!}}Set ContentTranslationContentImportForSectionTranslation for SX (T294642)]] (duration: 00m 59s)
* 09:18 jayme: systemctl start prune-production-images.service on deneb - [[phab:T287222|T287222]]
* 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1090.eqiad.wmnet with OS buster
* 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:46 vgutierrez: depool cp1090 to be reimaged as cache::upload_haproxy - [[phab:T290005|T290005]]
* 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 08:45 moritzm: installing mariadb-10.3 security updates on buster (as packaged in Debian, not the wmf-internal packages)
* 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 08:27 topranks: De-pool of Eqiad seems to be ok, transit/peering/transport links changed BW profile but nothing maxed, total LVS connections steady but have shifted to codfw. Proceeding to reconfigure iBGP policy on cr1-eqiad and cr2-eqiad manually.
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18285 and previous config saved to /var/cache/conftool/dbconfig/20220103-121131-marostegui.json
* 08:01 topranks: Depooling eqiad in authdns to allow for reconfiguration of CR routers on site ([[phab:T295672|T295672]])
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:01 moritzm: installing wireshark security updates on stretch
* 07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 07:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/maintenance/migrateRevisionActorTemp.php: Backport: [[gerrit:739636{{!}}maintenance: Add waitForReplication and sleep in migrateRevisionActorTemp (T275246)]] (duration: 01m 04s)
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17772 and previous config saved to /var/cache/conftool/dbconfig/20211118-073507-root.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18284 and previous config saved to /var/cache/conftool/dbconfig/20220103-120011-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17771 and previous config saved to /var/cache/conftool/dbconfig/20211118-072004-root.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18283 and previous config saved to /var/cache/conftool/dbconfig/20220103-115627-marostegui.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17770 and previous config saved to /var/cache/conftool/dbconfig/20211118-070620-marostegui.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist from s2 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18282 and previous config saved to /var/cache/conftool/dbconfig/20220103-115403-marostegui.json
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17769 and previous config saved to /var/cache/conftool/dbconfig/20211118-070559-root.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18281 and previous config saved to /var/cache/conftool/dbconfig/20220103-114507-marostegui.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17768 and previous config saved to /var/cache/conftool/dbconfig/20211118-070500-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18280 and previous config saved to /var/cache/conftool/dbconfig/20220103-114122-marostegui.json
* 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17767 and previous config saved to /var/cache/conftool/dbconfig/20211118-065055-root.json
* 11:37 moritzm: rebalance row_A ganeti group in codfw (to allow eventually freeing 2023 of instances)
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 40%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17766 and previous config saved to /var/cache/conftool/dbconfig/20211118-064957-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18279 and previous config saved to /var/cache/conftool/dbconfig/20220103-113002-marostegui.json
* 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17765 and previous config saved to /var/cache/conftool/dbconfig/20211118-063552-root.json
* 11:29 elukey: restart cassandra-b on aqs1010 and aqs1015 (instances stuck / thrashing, new cluster, not serving live traffic atm)
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17764 and previous config saved to /var/cache/conftool/dbconfig/20211118-063453-root.json
* 11:27 oblivian@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
* 06:31 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1102:3312 ([[phab:T249683|T249683]])
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18278 and previous config saved to /var/cache/conftool/dbconfig/20220103-112617-marostegui.json
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17763 and previous config saved to /var/cache/conftool/dbconfig/20211118-062048-root.json
* 11:19 oblivian@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17762 and previous config saved to /var/cache/conftool/dbconfig/20211118-061949-root.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18277 and previous config saved to /var/cache/conftool/dbconfig/20220103-111638-marostegui.json
* 06:17 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1156 ([[phab:T249683|T249683]])
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17761 and previous config saved to /var/cache/conftool/dbconfig/20211118-060446-root.json
* 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17760 and previous config saved to /var/cache/conftool/dbconfig/20211118-054942-root.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18276 and previous config saved to /var/cache/conftool/dbconfig/20220103-111631-marostegui.json
* 05:47 marostegui: Upgrade clouddb1014
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18275 and previous config saved to /var/cache/conftool/dbconfig/20220103-111457-marostegui.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17759 and previous config saved to /var/cache/conftool/dbconfig/20211118-053438-root.json
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18274 and previous config saved to /var/cache/conftool/dbconfig/20220103-110126-marostegui.json
* 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 due to network issues ([[phab:T295952|T295952]])', diff saved to https://phabricator.wikimedia.org/P17758 and previous config saved to /var/cache/conftool/dbconfig/20211118-050802-ladsgroup.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18273 and previous config saved to /var/cache/conftool/dbconfig/20220103-104621-marostegui.json
* 04:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 10:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
* 02:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2006.codfw.wmnet
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked from s2 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18272 and previous config saved to /var/cache/conftool/dbconfig/20220103-103909-marostegui.json
* 02:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2005.codfw.wmnet
* 10:32 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
* 01:56 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2006.codfw.wmnet
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18271 and previous config saved to /var/cache/conftool/dbconfig/20220103-103116-marostegui.json
* 01:48 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2006.codfw.wmnet
* 10:22 elukey: powercycle an-worker1114 (CPU soft lockup errors in mgmt console)
* 01:47 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2005.codfw.wmnet
* 10:20 elukey: powercycle an-worker1120 (CPU soft lockup errors in mgmt console)
* 01:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:19 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host build2001.codfw.wmnet
* 01:42 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2005.codfw.wmnet
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T297094|T297094]])', diff saved to https://phabricator.wikimedia.org/P18270 and previous config saved to /var/cache/conftool/dbconfig/20220103-101116-marostegui.json
* 01:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 01:35 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP - Config: [[gerrit:739633{{!}}Revert "Stop setting wgActorTableSchemaMigrationStage, no longer read in core" (T275246)]] (duration: 01m 04s)
* 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 00:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thumbor2006.codfw.wmnet with OS stretch
* 09:59 moritzm: installing ruby2.3 security updates
* 00:28 legoktm@cumin1001: START - Cookbook sre.hosts.reimage for host thumbor2006.codfw.wmnet with OS stretch
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 00:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thumbor2005.codfw.wmnet with OS stretch
* 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18269 and previous config saved to /var/cache/conftool/dbconfig/20220103-093003-marostegui.json
* 00:20 ryankemper: [[phab:T290902|T290902]] Test host looks good, proceeding to rest of fleet `ryankemper@cumin1001:~$ sudo cumin -b 4 '*elastic*' 'sudo run-puppet-agent --force'`
* 09:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 00:18 urbanecm: UTC late B&C finished
* 09:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:24 moritzm: installing djvulibre security updates on buster
* 00:18 ryankemper: [[phab:T290902|T290902]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/739379; running puppet agent on arbitrary elastic host: `ryankemper@elastic1051:~$ sudo run-puppet-agent --force`
* 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 00:17 ryankemper: [[phab:T290902|T290902]] Disabling puppet across all elastic*: `ryankemper@cumin1001:~$ sudo cumin '*elastic*' 'sudo disable-puppet "Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/739379"'`
* 09:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 00:16 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5110fe77bb982cca82c8d474339a2b73d02c8024}}: Migrate wmfHostnames to wmgHostnames ([[phab:T45956|T45956]]) (duration: 01m 03s)
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions and logpager from s2 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18268 and previous config saved to /var/cache/conftool/dbconfig/20220103-085824-marostegui.json
* 00:12 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/brwikimedia.png and respective HD variants
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove special slaves from s2 codfw [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18267 and previous config saved to /var/cache/conftool/dbconfig/20220103-085428-marostegui.json
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:49 moritzm: installing libpcap security updates
* 00:08 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|59c3fe66a0d140ae21f7269150a256a5e9786b24}}: Lossless optimization of the brwikimedia logo (duration: 01m 04s)
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 00:00 legoktm@cumin1001: START - Cookbook sre.hosts.reimage for host thumbor2005.codfw.wmnet with OS stretch
* 08:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:28 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:25 moritzm: installing zziplib security updates
* 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 07:51 moritzm: draining primary and secondary instances off ganeti2023
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 07:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2087.codfw.wmnet with reason: Maintenance
* 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2087.codfw.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2086.codfw.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2086.codfw.wmnet with reason: Maintenance
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:02 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:750831{{!}}Full roll out of wgMaxExecutionTimeForExpensiveQueries (T297708)]], Part I (duration: 01m 20s)
* 07:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:00 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:750831{{!}}Full roll out of wgMaxExecutionTimeForExpensiveQueries (T297708)]], Part I (duration: 00m 58s)
* 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db[2077,2095].codfw.wmnet with reason: Maintenance
* 06:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db[2077,2095].codfw.wmnet with reason: Maintenance
* 04:21 Amir1: start of running populating actor in revision table on rest of sections. It will take two months to finish ([[phab:T275246|T275246]])


== 2021-11-17 ==
* 23:53 eileen: * revision {{Gerrit|8054869b}} -> {{Gerrit|b3e2a122}} (latest)
* 23:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1003.eqiad.wmnet
* 23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
* 23:45 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
* 23:45 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor1006.eqiad.wmnet
* 23:44 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1006.eqiad.wmnet
* 23:43 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1006.eqiad.wmnet
* 23:35 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor1005.eqiad.wmnet
* 23:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:42 mutante: miscweb1002/2002 - moved /srv/deployment/scholarships to /root/ ([[phab:T243037|T243037]])
* 21:42 ayounsi@deploy1002: Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 27s)
* 21:41 ayounsi@deploy1002: Started deploy [homer/deploy@dc007aa]: Homer CR738905
* 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:33 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.7"
* 20:23 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] (duration: 01m 03s)
* 20:22 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:42 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/export/WikiExporter.php: Backport: [[gerrit:739491{{!}}export: Ignore rev_page_id index (T285149)]] (duration: 01m 04s)
* 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8e167a53cec3c3b216100bab686f28e09c424435}}: Disable local file upload on the Chinese Wikisource ([[phab:T295265|T295265]]) (duration: 01m 05s)
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b3a1d976cb1ef931c809b3670fb8c8b3f3a56e7}}: Make reply tool available as opt-out on commonswiki ([[phab:T295838|T295838]]) (duration: 01m 05s)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2042.codfw.wmnet with OS buster
* 18:57 ejegg: updated fundraising CiviCRM from {{Gerrit|9c5f0b69}} -> {{Gerrit|8054869b}}
* 18:56 vgutierrez: pool cp2042 (upload) running HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 18:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2042.codfw.wmnet with OS buster
* 18:05 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:01 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 17:59 vgutierrez: depool cp2042 to be reimaged as an HAProxy cache upload node - [[phab:T290005|T290005]]
* 17:41 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 17:25 cmooney@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki2002.codfw.wmnet
* 17:11 XioNoX: repool Telia eqiad-codfw transport
* 17:10 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2002.codfw.wmnet
* 16:34 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts rpki2001.codfw.wmnet
* 16:32 mutante: LDAP - added jkieserman to wmf ([[phab:T295693|T295693]])
* 16:28 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
* 16:28 XioNoX: drain Telia eqiad-codfw link
* 16:27 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts rpki2001.codfw.wmnet
* 16:21 XioNoX: move cr1-codfw<->cr2-eqdfw link to BO cable
* 16:19 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
* 16:06 XioNoX: move cr1-codfw:xe-5/3/0 to BO cable
* 16:04 XioNoX: re-enable Telia BGP on cr1-codfw
* 16:01 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 15:59 bblack: netbox: added ganeti01 and ganeti02 cluster definitions for drmrs
* 15:58 XioNoX: disable Telia BGP on cr1-codfw
* 15:55 XioNoX: move codfw-ulsfo link to break-out cable
* 15:46 mutante: restarting pybal on lvs1015
* 15:43 _joe_: restarting pybal on lvs2009
* 15:42 mutante: restarting pybal on lvs1016
* 15:39 _joe_: restarting pybal on lvs2010
* 15:35 XioNoX: drain ulsfo-codfw link
* 14:47 moritzm: installing perl bugfix updates from Bullseye point release
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights on s5 special slaves in eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17755 and previous config saved to /var/cache/conftool/dbconfig/20211117-134942-marostegui.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17754 and previous config saved to /var/cache/conftool/dbconfig/20211117-134835-marostegui.json
* 13:20 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1001-dev.eqiad.wmnet
* 13:10 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1001-dev.eqiad.wmnet
* 13:02 Lucas_WMDE: UTC morning backport+config window done
* 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 12:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:24 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739467{{!}}Enable disambiguator notifications on 6 Wikipedias (T293319)]] (duration: 01m 04s)
* 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
* 12:17 topranks: Re-pooling ulsfo after completing routing changes on cr3-ulsfo and cr4-ulsfo ([[phab:T295672|T295672]])
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:11 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
* 12:11 moritzm: failover ganeti master in test cluster to ganeti-test2003
* 12:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739391{{!}}Enable more languages for Section Translation in testwiki (T294223)]] (duration: 01m 52s)
* 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 moritzm: installing testvm2002
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17753 and previous config saved to /var/cache/conftool/dbconfig/20211117-105120-marostegui.json
* 10:45 dcausse: restarting blazegraph on wdqs1013 (jvm stuck)
* 10:45 topranks: Commencing manual config on cr3-ulsfo and cr4-ulsfo (site depooled) to reconfigure iBGP ([[phab:T295672|T295672]])
* 10:42 hnowlan: replaced all references to deploy1001 with deploy1002 in all .git/DEPLOY_HEAD directories on deploy1002:/srv/deployment
* 10:41 ema: A:cp re-enable puppet after testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/738949/ [[phab:T293879|T293879]]
* 10:37 jayme: imported wmf-certificates 0~20211110-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 10:31 ema: A:cp disable-puppet to merge and test https://gerrit.wikimedia.org/r/c/operations/puppet/+/738949/ [[phab:T293879|T293879]]
* 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 10:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
* 10:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 10:14 topranks: De-pool ulsfo in DNS to allow safe reconfiguration / test of changes to CR routers iBGP ([[phab:T295672|T295672]])
* 10:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:00 moritzm: running "gnt-cluster upgrade --to 2.16" on ganeti test cluster
* 09:59 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:59 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:53 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
* 09:48 moritzm: running "gnt-cluster renew-crypto --new-cluster-certificate" on ganeti test cluster
* 09:39 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
* 09:35 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
* 09:19 _joe_: removing php 7.3 images from docker-registry.wikimedia.org
* 09:13 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
* 09:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
* 09:03 moritzm: installing ffmpeg security updates on stretch
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17752 and previous config saved to /var/cache/conftool/dbconfig/20211117-090124-root.json
* 08:56 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
* 08:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17751 and previous config saved to /var/cache/conftool/dbconfig/20211117-084621-root.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17750 and previous config saved to /var/cache/conftool/dbconfig/20211117-083117-root.json
* 08:30 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
* 08:24 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17749 and previous config saved to /var/cache/conftool/dbconfig/20211117-081613-root.json
* 08:14 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
* 08:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS buster
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17748 and previous config saved to /var/cache/conftool/dbconfig/20211117-080110-root.json
* 07:49 elukey: restart coal, navtiming, statsv (refreshed by puppet) after https://gerrit.wikimedia.org/r/737970
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17747 and previous config saved to /var/cache/conftool/dbconfig/20211117-074606-root.json
* 07:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
* 07:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17746 and previous config saved to /var/cache/conftool/dbconfig/20211117-073102-root.json
* 07:31 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS buster
* 07:29 elukey: `apt-get clean` on an-tool1005 to free space in the root partition
* 07:28 elukey: `sudo pkill -U jmixter` on stat100[5,8] to allow puppet to run and remove the offboarded user
* 07:22 mm